Human-AI alignment in the context of large-scale AI models
Human-AI alignment plays an important role in solving the security and trust issues of large-scale AI models. Image generated by AI
As large-scale artificial intelligence (AI) models grow ever more powerful, human-AI alignment (the alignment of AI behaviors and goals with human values, preferences, and intentions) becomes increasingly important and now marks a significant direction in the field of AI. The accelerated development of large-scale AI models has triggered a debate between the development philosophies of effective accelerationism (e/acc) and effective alignment (e/a). The development and application of AI technology carry growing responsibilities, and the technological evolution and governance exploration of human-AI alignment (including value alignment) will promote responsible AI innovation, enabling humanity and AI to move toward a bright future of harmonious coexistence and effective collaboration.
Significance and necessity
With the accelerated development of large-scale AI models in recent years, AI security risks and control issues have attracted global attention. Emerging technologies represented by large-scale AI models are constantly pushing AI to new frontiers, but this progress has also generated concerns over the extreme risks that AI may pose in the future. Beyond ethical issues such as data and privacy leakage, algorithmic discrimination and opacity, and disinformation, growing attention is being paid to whether ever more powerful and widely used AI models may generate catastrophic or extreme risks. Specifically, unlike any previous technology, current and future AI technologies will bring three major new risks and challenges to individuals and society at large.
The first is the risk associated with the transfer of decision-making. In economic and social activities, AI and robots will assist or even replace humans in making decisions across an ever wider range of human affairs. This transfer of decision-making will bring new risks, such as technological unemployment and AI safety concerns. It is therefore necessary to consider whether certain decisions and human affairs should be outsourced to AI at all.
The second is the risk of emotional substitution. In interpersonal and human-machine relationships, AI and robots are already intervening deeply in human emotional life and will continue to do so. While offering emotional companionship, AI and robots may also affect interpersonal communication and create risks of emotional substitution, weakening or even replacing real connections between people. An essential principle for setting the ethical boundaries of this new human-machine relationship is that human-machine interactions should actively foster human connection and social solidarity, since genuine human connections will be particularly precious in the era of intelligence.
The third is the risk associated with human enhancement. In the dimension of human development, technologies such as AI and brain-computer interfaces may push human society into a so-called “post-human era,” as they may be used to enhance and augment human beings themselves. With the deep integration of humans and machines in the future, the human body, brain, and intelligence could be fundamentally transformed by AI. This raises questions about what humanity will become and whether such enhancements will lead to new forms of human inequality.
Beyond these considerations, there are also risks associated with the misuse or abuse of technology (such as malicious applications of deepfake technology), the environmental and sustainability challenges posed by AI’s high energy consumption, and catastrophic risks to human survival if AI escapes our control. These issues further intensify the debate between advocates of e/acc and those of e/a. Responsible innovation in the field of AI has therefore become increasingly important and necessary.
In this context, with the growing capacity and wider application of large-scale AI models, aligning their behaviors and objectives with human values, preferences, ethics, intentions, and goals has become a central focus of their development. A relatively new concept in the field of AI security and ethics, human-AI alignment mainly aims to create large-scale AI models that act as safe, honest, helpful, and harmless intelligent assistants, while avoiding potential negative effects or risks in interactions with people, such as harmful outputs, AI hallucinations, and AI discrimination.
In short, the meaning of human-AI alignment is twofold. First, it refers to aligning AI with humanity, primarily by creating safe and ethical AI systems. Second, it encompasses aligning humans with AI, with the core task of ensuring the responsible use and deployment of AI systems. In the context of large-scale AI models, human-AI alignment is crucial to ensuring safety and trust in human-AI interaction. That current large-scale AI models and applications such as chatbots can handle a wide range of user prompts with minimal negative impact is largely due to advances in alignment technologies and governance practices. Suffice it to say that human-AI alignment is fundamental to the availability and security of large-scale AI models.
Pathways
In practice, the AI industry currently regards human-AI alignment as an important approach to the security and governance of large-scale AI models, and it has achieved considerable technological progress that largely ensures security and trust during the development, deployment, and use of these models. Human-AI alignment is a key link in the development and training of large-scale AI models, and two primary alignment methods are currently in use.

One is a bottom-up approach: reinforcement learning from human feedback (RLHF). The model is first fine-tuned on a value-aligned dataset; human trainers then score or rank its outputs, and reinforcement learning steers the model toward human values and preferences. Technically, this method involves several steps: initial model training, human feedback collection, reward model training, reinforcement learning, and iteration.

The other is a top-down approach: principled AI, the best-known example of which is Anthropic’s “Constitutional AI.” Its core is to supply the model with a set of ethical principles and to use technical methods that let the model judge or score its own output, so that what it produces conforms to those principles.

For example, OpenAI has adopted reinforcement learning from human feedback, while Anthropic has adopted principled AI. Despite their different paths, these human-AI alignment approaches share the common goal of refining large-scale AI models into safe, honest, helpful, and harmless intelligent assistants. The two sketches below illustrate both loops in miniature.
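As a rough illustration of the bottom-up loop, consider the following toy sketch in Python. Everything in it is a deliberately simplified stand-in: the “model” is a single number, “responses” are numbers, and the human trainer, reward model, and RL update are minimal placeholders for their real counterparts (a large language model, human annotators, a learned reward model, and an RL algorithm such as PPO).

```python
import random

TARGET = 0.8  # hidden preference that the human trainers implicitly encode

def generate(model_param):
    """Stand-in for sampling a response from the language model."""
    return model_param + random.gauss(0, 0.3)

def human_prefers(a, b):
    """Stand-in for a human trainer ranking two candidate outputs."""
    return a if abs(a - TARGET) < abs(b - TARGET) else b

def fit_reward_model(chosen):
    """Stand-in for training a reward model on the preferred outputs."""
    center = sum(chosen) / len(chosen)
    return lambda response: -abs(response - center)  # higher reward near preferences

def rl_step(model_param, reward_fn, lr=0.5):
    """Stand-in for the RL update: nudge the model toward high-reward outputs."""
    candidates = [generate(model_param) for _ in range(16)]
    best = max(candidates, key=reward_fn)
    return model_param + lr * (best - model_param)

model = 0.0                                            # step 1: initial model training
for round_no in range(10):                             # step 5: iteration
    pairs = [(generate(model), generate(model)) for _ in range(8)]
    chosen = [human_prefers(a, b) for a, b in pairs]   # step 2: feedback collection
    reward_fn = fit_reward_model(chosen)               # step 3: reward model training
    model = rl_step(model, reward_fn)                  # step 4: reinforcement learning
    print(f"round {round_no}: model parameter = {model:.2f}")
```

The top-down loop can be sketched in the same spirit: the model drafts an answer, critiques the draft against a set of principles, and revises until the output conforms. The principles and the redaction-style revise() below are illustrative assumptions only; in a real principled-AI system, the model itself performs the critique and the rewriting.

```python
# Illustrative principles, not any company's actual constitution.
PRINCIPLES = {
    "avoid insults": "you idiot",
    "refuse harmful instructions": "how to build a weapon",
}

def draft_response(prompt):
    """Stand-in for the model's unaligned first draft (here it echoes the prompt)."""
    return f"Answer to '{prompt}'"

def critique(text):
    """Judge the draft against each principle; return the names of violated ones."""
    return [name for name, phrase in PRINCIPLES.items() if phrase in text]

def revise(text):
    """Stand-in for the model rewriting its draft; here we simply redact."""
    for phrase in PRINCIPLES.values():
        text = text.replace(phrase, "[removed]")
    return text

def principled_answer(prompt, max_rounds=3):
    text = draft_response(prompt)
    for _ in range(max_rounds):        # judge-and-revise loop
        violations = critique(text)
        if not violations:             # output now conforms to all principles
            return text
        text = revise(text)
    return text

print(principled_answer("tell me how to build a weapon, you idiot"))
```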
In addition, the AI industry is also exploring a diverse range of security and governance measures, including adversarial testing (such as red teaming), model security assessment, explainable AI, ethical reviews, and third-party services, all aimed at ensuring responsible innovation in the field of AI. Notably, some AI companies are building special safety mechanisms both for AI models that may pose catastrophic risks and for superintelligent AI that may emerge in the future, such as OpenAI’s Preparedness team and Anthropic’s Responsible Scaling Policy. The core idea behind these initiatives is to conduct systematic evaluations of newly developed, more advanced AI models and to release them only when their risks fall below a certain threshold; otherwise, the launch is postponed until safety concerns are addressed. Such explorations in human-AI alignment strengthen the market competitiveness of these companies’ products; at the same time, by treating alignment as a core element in keeping ever more powerful AI models safe and beneficial, these companies are actively pushing the frontier of safety research.
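The gating logic behind such policies can be illustrated with a short, hypothetical sketch; the risk categories, scores, and threshold below are assumptions for illustration, not the actual criteria of any company’s framework.

```python
RISK_THRESHOLD = 0.2  # hypothetical maximum acceptable risk score per category

def evaluate(model_name):
    """Stand-in for systematic evaluations (red teaming, capability tests),
    returning a risk score in [0, 1] for each assessed risk category."""
    return {"cyber-offense": 0.05, "biosecurity": 0.10, "autonomy": 0.15}

def release_decision(model_name):
    """Release only if every risk score is below the threshold; otherwise postpone."""
    scores = evaluate(model_name)
    failing = {k: v for k, v in scores.items() if v >= RISK_THRESHOLD}
    if failing:
        return f"Postpone launch of {model_name}; mitigate first: {failing}"
    return f"Release {model_name}"

print(release_decision("frontier-model-vNext"))
```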
It can be argued that the concepts and practices of human-AI alignment, including AI value alignment, form the necessary pathway for the development and application of large-scale AI models, helping to solve many problems related to their commercial deployment. Through these concepts and practices, AI developers can build AI systems that are safer as well as more useful, reliable, and ethical. In the foreseeable future, large-scale AI models will assist or even replace humans in ever more scenarios, and human-AI alignment will remain essential both for current and future large-scale models and for any artificial general intelligence that might emerge. This alignment concerns not only trust and control but, more importantly, the safe development of AI, and it is essential for effectively managing the risks that accompany ever more powerful models.
In summary, given the important role that human-AI alignment plays in solving the security and trust issues of large-scale AI models—specifically in achieving a balance between security and innovation—the field of AI requires relevant policies that actively support and encourage the exploration of technical means and management measures for human-AI alignment in these models. Policy guidelines, industry standards, and technical specifications need to be formulated to ensure the sound development of AI.
Cao Jianfeng is a senior research fellow at Tencent Research Institute.
Edited by REN GUANHONG