Integrating the ‘Chinese way of thinking’ with AI

By Xu Wensheng, Wan Ju, Han Caihong / 12-14-2023 / Chinese Social Sciences Today

Recurrent neural network (RNN) and large language model (LLM) Photo: TUCHONG

The development and application of generative AI products such as ChatGPT herald the era of artificial intelligence-generated content (AIGC). AIGC is underpinned by multidimensional training data, abundant training resources, extensive training time, and high-performance algorithmic models. China’s AI industry needs to focus on the “Chinese way of thinking,” which is rooted in the structure of the Chinese language and its cultural history, because doing so opens up a wide range of scenarios for the development and application of AI.

With the increasing integration of AI into various industries, the demand for intelligent products and services is growing. Large-scale Chinese language corpora benefit China’s AI industry in terms of resource acquisition and data mining. They can not only facilitate the development of natural language processing (NLP) models and improve learning efficiency and outcomes, but also help make machine learning more transparent, supporting the construction of explainable AI.

R&D activities and innovation in AI technology can also benefit from the “Chinese way of thinking.” The organic integration of the “Chinese way of thinking” with AI will give fresh impetus to China’s AI industry, enabling it to deliver more AI products with Chinese characteristics and international competitiveness, and offer practical, intelligent services that meet users’ daily needs.

NLP typically involves natural language understanding and natural language generation, and both can draw on the “Chinese way of thinking.” Doing so requires a deep understanding of the linguistic and cultural background of the Chinese language, particularly the habits and psychological needs of its users. Chinese corpora built by collecting and organizing Chinese texts from different fields and contexts can provide massive amounts of language data for training models used in machine translation, speech recognition, and other areas.

Machine translation essentially involves tokenizing the source text so that a computer can process it, and then using machine learning algorithms to learn, from the tokenized input, how to convert the source language into the target language. Yet translation involves not only the text and its grammatical structure but also the culture, history, and traditions behind the language. Chinese contains a wealth of collocations, idioms, and slang that in most cases cannot be translated literally into other languages.
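Because written Chinese has no spaces between words, tokenizing it requires a segmentation step. One classic dictionary-based approach is forward maximum matching: scan left to right and, at each position, take the longest string found in a vocabulary. The sketch below is illustrative only; the vocabulary `VOCAB` is a hypothetical toy set, and production translation systems generally rely on large lexicons or learned subword units rather than this exact method. It shows how an idiom such as 画蛇添足 (“drawing legs on a snake,” i.e. overdoing something) can be kept as a single unit, so that it can be translated as a whole rather than character by character.

```python
def forward_max_match(text: str, vocab: set[str], max_len: int) -> list[str]:
    """Segment Chinese text with greedy forward maximum matching."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in vocab:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary, including one four-character idiom.
VOCAB = {"画蛇添足", "我们", "不要", "机器", "翻译"}
MAX_LEN = max(len(word) for word in VOCAB)

print(forward_max_match("我们不要画蛇添足", VOCAB, MAX_LEN))
# ['我们', '不要', '画蛇添足']
```

The greedy longest-match rule is simple and fast, but it can mis-segment ambiguous strings; this is one reason larger, context-aware models are used in practice.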

At present, machine translation still falls short in several respects, such as higher-order thinking, precise understanding and expression, reproducing the style and linguistic beauty of the source text, and creative inspiration. High-quality machine translation of Chinese requires thorough consideration of how its users think, as well as rich background knowledge and cultural literacy.

Speech recognition technology is now widely used in smart homes, automotive navigation systems, voice assistants, and other products. These products support Chinese speech recognition and voice interaction, and the technology is likely to serve more purposes in the future.

Building a Chinese speech corpus will allow speech recognition systems to better understand and analyze Chinese speech signals, and including speech samples of fixed expressions such as idioms and proverbs will make those expressions easier for the systems to identify. Constructing such a corpus mainly involves formulating production standards, pre-collection and pre-assessment, formal collection, speech annotation, compiling an electronic pronunciation dictionary, and corpus evaluation and distribution.
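The electronic pronunciation dictionary step can be pictured as a lookup table that maps each word, including whole idioms, to its pronunciation, so that transcripts can be expanded into the sound units a recognizer is trained on. The sketch below is a minimal illustration under stated assumptions: `LEXICON` and `transcribe` are hypothetical names, pronunciations are written as numbered pinyin syllables (citation tones) for readability, and real systems often use finer units such as initials and finals. Words missing from the dictionary are collected so they can be added before the corpus is evaluated.

```python
# Hypothetical lexicon entries in numbered pinyin (tone 5 = neutral tone).
LEXICON: dict[str, list[str]] = {
    "画蛇添足": ["hua4", "she2", "tian1", "zu2"],  # idiom stored as one entry
    "我们": ["wo3", "men5"],
    "不要": ["bu4", "yao4"],
}


def transcribe(tokens: list[str], lexicon: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Expand segmented words into syllables and collect out-of-lexicon words."""
    syllables: list[str] = []
    missing: list[str] = []
    for word in tokens:
        if word in lexicon:
            syllables.extend(lexicon[word])
        else:
            # Flag the word so the dictionary can be extended.
            missing.append(word)
    return syllables, missing


print(transcribe(["我们", "不要", "画蛇添足"], LEXICON))
print(transcribe(["守株待兔"], LEXICON))  # an idiom not yet in the lexicon
```

Storing an idiom as a single entry keeps its syllables together, which mirrors the article's point that including fixed expressions helps recognition systems identify them.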

To summarize, the high-quality development of AI in China should take into consideration the characteristics and needs of the “Chinese way of thinking.” Localized large language models can better serve the Chinese context and market, while also boosting the development of related technologies.

Xu Wensheng (professor) and Wan Ju are from Tongji University. Han Caihong (professor) is from Zhengzhou University of Science and Technology.

Edited by WANG YOURAN