Generative AI sheds new light on historical studies
The “Twenty-Four Histories” display at the Chinese Archaeological Museum in Beijing. Photo: Fang Ke/CSST
Generative artificial intelligence (AI) technology has progressed rapidly in recent years, profoundly impacting both technological and commercial domains. It offers researchers new capabilities such as large language models, pre-training, and interactive responses. Historians are now exploring these technologies, which are poised to significantly enhance the development of historical databases and upgrade the ways they are used. While these technologies open up new avenues for research, they also pose various risks and challenges that scholars should examine from a theoretical perspective.
Construction and application of historical databases
Historians have been using digital databases for decades. For modern researchers, keywords are the primary method used to retrieve historical materials. In China, the development of such databases is booming. Despite the diverse nature of materials and the targeted research communities covered by these databases, most share similar traits. They first integrate historical materials such as ancient texts, inscriptions, and folk documents in text or image form, establish catalogs, and make them searchable through browsing or keyword retrieval. Researchers manually select results and proceed with subsequent research. Due to the relatively limited reading and retrieval methods, the quality of database applications largely depends on whether researchers use appropriate keywords in their search.
As a result, databases play a passive role in providing text and basic retrieval functions, and most information and logical relationships that cannot be seen through keyword retrieval remain “dormant.” Critics of retrieval-based research point out its shortcomings: it prioritizes quantity over quality, accumulation over understanding, and separates keywords from context and historical background.
When generative AI products are incorporated into databases and pre-trained on research literature relevant to the field before the databases are made public, the analytical capabilities of those databases are greatly enhanced. The AI can then provide logical answers and increasingly intelligent analysis as usage increases. This gives databases a proactive role in historical research, allowing them to better discover data truly relevant to researchers’ needs from vast datasets, understand the relationships between researcher instructions and data content, and provide researchers with more personalized and in-depth analysis results.
Scholars can use the new type of database not only as a general keyword search tool but also as an advanced research assistant capable of engaging in question-and-answer interactions, a groundbreaking development for historical research. One major feature of generative AI is that it significantly enhances a model’s ability to analyze and interpret natural language.
Text within a database can now be understood by the AI model, and going forward, researchers can issue instructions to the database in natural language rather than programming languages, significantly lowering the barrier to using new technologies. During interactive Q&A sessions with databases, researchers need to describe their needs as accurately as possible to obtain hierarchical and systematic answers. The Q&A process also helps the AI learn users’ thought processes, improve analytical conclusions, and provide better feedback. Interactive Q&A transforms the process of retrieving historical information from databases into a comprehensive research scenario that combines information extraction, academic discussion, and logical verification into “three-in-one” human-machine interactive academic research.
Promoting positive transformation in research topics
Generative AI can quickly collect and analyze vast amounts of textual data and generate answers in natural language. Once models have been sufficiently trained, many tasks such as textual emendation and interpretation may be performed more swiftly and accurately by AI than by human scholars. For instance, research teams both in China and abroad have begun using machine learning techniques to collate and interpret oracle bone inscriptions and papyrus documents. AI will eliminate the barriers of narrow specialization for such work, making the analytical work of human scholars no longer indispensable to it. This does not mean scholars can abandon their interpretative authority and “lie flat.” However, the weight given to source authentication will inevitably diminish, and the “scissors and paste” compilation and summary of historical materials will disappear. Scholars urgently need to transform their research topics, driving historical studies to ascend to higher levels of thought under technological incentives.
Research on historical text generation can fully demonstrate scholars’ fundamental skills and critical thinking. At present, generative AI products not only depend heavily on the size of their training corpus but also show weakness in distinguishing authentic historical narratives from imitation texts, treating texts in databases almost indiscriminately as “genuine.” However, researchers in fields such as medieval history and the history of the Liao and Jin dynasties have made breakthrough progress with limited historical records, using approaches such as the study of historical writing and source analysis to refresh their understanding of how historical records were formed. Scholars no longer rashly assert that historical materials are authentic, or focus merely on the question of authenticity, and can instead concentrate on analyzing the complex factors behind textual formation.
As technology advances, it will reinforce the importance of setting research topics. While databases integrated with generative AI take on a more proactive role, the questions and assumptions that drive research must still come from researchers. New technology can liberate humans from basic work, and the rapid growth of computing power also significantly reduces the time cost of theoretical trial and error. The value of scholars will increasingly be linked to launching studies of high academic significance in more meticulous, comprehensive, and logically consistent ways.
Digital humanities will become a robust growth point in the interdisciplinary development of historical research, potentially leading to breakthroughs in the presentation and utilization of historical materials. It is worth noting that historical databases will not be upgraded simply by integrating generative AI. Due to the enormous differences between historical texts and contemporary texts, database developers must transform historical texts into materials suitable for AI analysis, necessitating the introduction of advanced digital humanities tools.
Significant progress has been made in developing such tools. One example is the “Shi Dian Gu Ji” platform, developed jointly by ByteDance and Peking University, which uses optical character recognition (OCR) and automated algorithms to convert images of ancient books into text and punctuate it automatically. Professor Melissa Dell’s team at Harvard University focuses on extracting complex, irregularly formatted historical texts, a process which is likely to have extensive applications in automatically identifying archives and folk documents. The Center for Open Data in the Humanities (CODH) in Japan uses machine learning to recognize character forms in ancient Japanese texts and has developed smartphone applications with character recognition capabilities. It is clear that the value of employing similar technologies to handle historical texts extends beyond database construction.
Technical risks and challenges
The ethics of introducing AI into academic research has been widely discussed in academic circles. However, solutions cannot rely solely on efforts within academia; they demand a multifaceted approach involving legal development, government regulation, and collaboration among market entities. In contrast, the primary concern for historians regarding new technologies lies in the technical risks and challenges they may pose. After all, technical issues directly affect the use of technology, while ethical dilemmas typically arise only after its application.
Results generated by AI aren’t always objective and neutral; they are influenced by factors like data volume, training frequency, and training methods, potentially leading to misguided use if not critically analyzed. Even AI products employing the same model, but using differing datasets and training methods, can provide logically coherent yet divergent answers to the same question.
Furthermore, researchers’ biases, research paths, and subjective motivations can consciously or inadvertently impart an artificial hue to output results, leading to misleading “new historical narratives.” A recent paper by Mrinank Sharma of Oxford University and researchers from Anthropic, titled “Towards Understanding Sycophancy in Language Models,” demonstrates the prevalence of sycophancy in AI models trained with human feedback, as users tend to prefer sycophantic responses.
Sharma and his colleagues employed sophisticated methods to interact with various AI models, arriving at the aforementioned conclusion. This points to another complication of generative AI: the black-box nature of its information processing makes AI-generated results difficult to verify. This not only clashes with existing academic norms but also complicates the validation process for researchers.
While human verification remains useful for simple logical deductions or limited historical data compilation, it falls short when dealing with AI’s black-box processing of massive datasets. Even if authors submit code or AI-assisted verification processes as supplementary materials to journal editors or publishers, reviewing such lengthy and unfamiliar digital material poses significant challenges for reviewers and editors in the field of history. Therefore, the extent to which generative AI’s involvement in historical research might be academically accepted remains unclear.
In conclusion, while we can opt to apply new technologies to retrieve historical materials using traditional methods, or reject their use in academic research to avoid risks and challenges, these approaches cost us an opportunity to propel the ancient discipline of history forward, robbing us of a chance to enhance dialogue and collaboration with emerging disciplines. Progress made by similar technologies in other fields has shown immense potential. Therefore, appropriately applying this technology in historical research demands deeper exploration within academia. Each major technological innovation disrupts existing relationships between humans, machines, and algorithms. However, as Norbert Wiener, the father of cybernetics, posited in his work, there will always be “the human use of human beings.”
Wang Shen is an associate research fellow at the Institute of Ancient History under the Chinese Academy of Social Sciences.
Edited by YANG XUE