Corpus linguistics offers new hot spot for academia

By Hao Rihong / 04-23-2015 / (Chinese Social Sciences Today)

The emergence of buzzwords in recent years is closely linked to the development of corpus linguistics.

 

In recent years, numerous buzzwords have emerged in society and attracted great attention from Chinese linguists. This phenomenon has been closely linked with corpus linguistics, or the study of language through “real world” text. As a new research paradigm based on facts and data, corpus linguistics has strong practical value and broad prospects.


Sheng Yuqi, a professor from the School of Literature and Journalism at Shandong University, said the rise of corpus linguistics has been driven by “one turn” and “two revolutions.”


“The ‘one turn’ refers to major developments in Western philosophy during the early 20th century, the most important of which was the focus on the relationship between philosophy and language. The ‘two revolutions’ refer to the revolutions in information and in linguistics. The former converted the function of computers from data computing to information processing. The latter aims to find a research method distinct from formal linguistics. Under the joint influence of these factors, corpus linguistics has emerged,” said Sheng.


Liu Hua, a professor from the College of Chinese Language and Literature at Jinan University in Guangzhou, attributed the emergence of corpus linguistics to changing academic trends and methodologies.


Since the mid-20th century, empirical research has become more dominant than rationalist research. Corpus linguistic studies can “make up” research shortfalls by widening study coverage and providing more accurate representation.
 

However, academia has debated the disciplinary status of corpus linguistics ever since it emerged. Some scholars argue that it is merely a linguistic research method, while others claim that it critiques existing linguistic theories and proposes new ideas.
 

“In view of disciplinary orientation, the main characteristics of corpus linguistics are reflected in methodology. Nevertheless, it cannot be concluded that it is a discipline with special theories and philosophical foundation,” said Sheng.
 

Yao Shuangyun, a professor from the Research Center for Language and Linguistic Education at Central China Normal University in Wuhan, Hubei Province, said that corpus linguistics advocates research based on large-scale data in order to further explore the internal rules and operating mechanisms of language. “Corpus linguistics not only makes methodological breakthroughs, but also verifies, modifies, supplements and perfects current theories,” said Yao.
 

Traditional linguistic research devotes much of its attention to collecting, classifying and ordering material and to indexing it word by word. Corpus linguistics uses computer technology to index every character.
 

Sheng said corpus linguistics explores the correlations between characters and spares researchers tedious manual work, allowing them to focus more on exploring little-known internal rules.
 

Apart from core research on the language itself, covering word frequency, vocabulary, syntax, discourse and register, interdisciplinary studies further reflect the practical value and broad prospects of corpus linguistics.
 

“These interdisciplinary studies include research into language life based on multimedia, social computing and online public opinion studies based on big data, and sociological and anthropological studies of materials from overseas Chinese. All of them combine corpus linguistics and traditional disciplines including sociology, anthropology, pedagogy and journalism,” said Liu.
 

Yao added that corpus linguistics also benefits natural language processing, providing strong support for word segmentation and tagging, knowledge mining, machine translation, and speech recognition and synthesis.
 

While acknowledging the development of corpus linguistics over the last 50 years, scholars note that it still has room for improvement.
 

“Corpus linguistics endeavors to verify current conclusions through various linguistic phenomena and data, reaching new conclusions based on empirical analysis. However, researchers are still accustomed to using the traditional linguistic classification system, which hinders the processing of natural language,” said Yao, stressing the need for a more reasonable linguistic classification system.


Sheng noted that sharing linguistic materials is directly linked to national discourse power. “We should advance corpus linguistics for the purpose of enhancing national soft power, making full use of the government's advantage in planning, integrating and optimizing various corpora,” he said.
 

Liu said it is also necessary to develop more supporting tools for corpus linguistics so that researchers can apply the relevant statistical methods more easily.

 

Hao Rihong is a reporter at Chinese Social Sciences Today.