GenAI governance: A case study of ChatGPT
Generative AI (hereinafter referred to as “GenAI”), represented by ChatGPT, has undoubtedly brought numerous social benefits while also carrying significant risks. The urgent task, therefore, is to clarify the relationship between GenAI’s application value and its potential risks in light of its development status in China, so as to mitigate those risks effectively without stifling its application.
The operational mechanism of GenAI mainly consists of three stages: the preparatory stage, involving machine learning and manual labeling; the computing stage, where algorithms process data to produce results; and the generation stage, where the results of data processing are released to society and take effect. Currently, the most prominent risks associated with GenAI include data compliance risks during preparation, algorithmic biases during computation, and intellectual property risks during generation.
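To make this three-stage mechanism concrete, the following minimal Python sketch separates the preparatory, computing, and generation stages. All names are illustrative, and the trivial lookup merely stands in for a trained model; nothing here is drawn from any actual GenAI system.

```python
# A minimal conceptual sketch of the three-stage mechanism described above.
# All class and function names are illustrative, not from any real system.

class GenAIPipeline:
    def __init__(self):
        self.labeled_corpus = []

    def prepare(self, raw_samples, annotate):
        """Preparatory stage: collect raw data and apply manual labels."""
        self.labeled_corpus = [(s, annotate(s)) for s in raw_samples]

    def compute(self, prompt):
        """Computing stage: the trained model processes input into a result.
        A trivial lookup stands in for a real trained model here."""
        for sample, label in self.labeled_corpus:
            if prompt in sample:
                return label
        return "unknown"

    def generate(self, prompt):
        """Generation stage: the result is released to users, where it has effect."""
        return f"Output: {self.compute(prompt)}"

pipeline = GenAIPipeline()
pipeline.prepare(["the sky is blue"], annotate=lambda s: "statement about nature")
print(pipeline.generate("sky"))  # Output: statement about nature
```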
Data compliance
The first risks we encounter arise in the preparation stage: data compliance risks. China’s current data compliance system is based on the “Cybersecurity Law,” the “Data Security Law,” and the “Personal Information Protection Law,” which mandate that data processors implement the necessary measures throughout the entire data processing lifecycle to ensure data security, cybersecurity, and personal information security. Within China’s legal framework, the data compliance risks associated with GenAI fall into three categories: data source compliance risks, data usage compliance risks, and data accuracy risks.
First, data source compliance risk emerges because GenAI requires vast amounts of training data in its initial stages. This raises questions such as whether users have consented to the collection of their personal information, whether the collection and use of publicly available information stays within reasonable limits, and whether copyrighted samples used in training can be considered “fair use.”
Next, data usage compliance poses two significant risks. The first is data leakage: users may inadvertently transmit personal information, business data, or even trade secrets to ChatGPT. ChatGPT’s operational mechanism relies on user inputs and interaction data during iterative training, which heightens the challenge of keeping such data secure. The second is that users find it difficult to exercise their right to delete personal information. Although OpenAI’s privacy policy stipulates that users hold rights over their personal information, the complexity of requiring a GenAI system to delete data casts uncertainty on developers’ ability to comply with regulatory requirements.
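One practical mitigation for the leakage risk, sketched below under assumed requirements, is to redact obvious personal identifiers from user text before it is transmitted to an external GenAI service. The patterns are illustrative; a production filter would need far more thorough detection.

```python
import re

# Hypothetical pre-submission filter: redact obvious personal identifiers
# before user text leaves the enterprise. Patterns are illustrative only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-\s]?\d{4}[-\s]?\d{4}\b"),  # mainland mobile format
    "ID":    re.compile(r"\b\d{17}[\dXx]\b"),                 # 18-digit resident ID
}

def redact(text: str) -> str:
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(f"[{tag} REDACTED]", text)
    return text

user_input = "Contact Li Wei at liwei@example.com or 138-1234-5678."
print(redact(user_input))
# Contact Li Wei at [EMAIL REDACTED] or [PHONE REDACTED].
```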
Finally, data accuracy risk arises in the early stages of ChatGPT’s training, when developers obtain and select content from the internet. Generated content may consequently be inaccurate owing to omissions or errors in the underlying data.
Algorithmic biases
The combination of “machine learning” with “manual labeling” enhances the intelligence and accuracy of GenAI, but it also dramatically increases the probability of algorithmic bias. Because annotators incorporate their preferred information into the model, this combined method reflects human subjective judgments and preferences more than traditional machine learning does, which renders the associated biases difficult to track and prevent. An analysis of ChatGPT’s operational methods reveals two primary sources of algorithmic bias: first, since the received data requires manual labeling, annotators’ interpretations inevitably introduce a degree of error; second, when ChatGPT’s computed results fail to align with public expectations, the very process of correcting them can introduce further bias.
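The labeling error described above can be made visible with a simple measurement: how often independent annotators disagree on the same items. The sketch below uses invented labels purely for illustration; the items on which annotators split are precisely where subjective judgment, and hence potential bias, enters the training signal.

```python
from itertools import combinations

# Invented annotations for five items; labels are illustrative only.
annotations = {
    "annotator_a": ["safe", "biased", "safe", "biased", "safe"],
    "annotator_b": ["safe", "safe",   "safe", "biased", "biased"],
    "annotator_c": ["safe", "biased", "safe", "safe",   "safe"],
}

def pairwise_agreement(a, b):
    """Fraction of items on which two annotators assign the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

for (name1, lab1), (name2, lab2) in combinations(annotations.items(), 2):
    print(f"{name1} vs {name2}: {pairwise_agreement(lab1, lab2):.0%} agreement")
# Low agreement flags items where subjective preference shapes the labels.
```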
Intellectual property
The rise of GenAI poses new challenges to many industries, particularly to intellectual property dynamics during the generation stage. GenAI’s remarkable intelligence fundamentally alters the landscape of intellectual property ownership compared with previous AI systems. ChatGPT, as a GenAI system, is far superior to analytical AI in data processing and analysis. Its content generation process encompasses automated content compilation, intelligent editing and processing, multimodal transformation, and creative generation, directly influencing the production and distribution models of published content. While ChatGPT’s output incorporates some creative elements contributed by the natural persons who built it, bringing it somewhat closer to the criteria for constituting a work, debates persist over the attribution of rights to GenAI-generated works, and the specific criteria for determining such rights remain underdeveloped. Intellectual property risk therefore emerges as the third major risk inherent to GenAI.
Countermeasures
To address the three key risks associated with GenAI, the following countermeasures are recommended.
First, we should strengthen data compliance mechanisms within GenAI enterprises. The development of GenAI should prioritize not only capability and efficiency but also safety, and relevant enterprises should implement a robust data compliance system to ensure data security. This can be achieved through three key measures. The first is to establish data compliance principles, chiefly four: legality and compliance, informed consent, legitimate purpose, and minimum necessity.
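As a hedged illustration of how these four principles might be operationalized inside an enterprise, the sketch below gates each data collection request against a lawful basis, recorded consent, a recognized purpose, and a minimum-necessity whitelist. The field names and policy structure are assumptions, not drawn from any statute or real compliance system.

```python
from dataclasses import dataclass, field

@dataclass
class CollectionRequest:
    purpose: str
    consented: bool
    lawful_basis: str               # e.g. "consent", "contract" (assumed values)
    fields_requested: set = field(default_factory=set)

# Minimum-necessity whitelist per purpose; contents are assumptions.
NEEDED_FOR_PURPOSE = {"model_training": {"text"}}

def approve(req: CollectionRequest) -> set:
    if req.lawful_basis not in {"consent", "contract", "legal_obligation"}:
        raise PermissionError("legality and compliance: no lawful basis")
    if not req.consented:
        raise PermissionError("informed consent: user has not consented")
    needed = NEEDED_FOR_PURPOSE.get(req.purpose)
    if needed is None:
        raise PermissionError("legitimate purpose: purpose not recognized")
    # Minimum necessity: collect only what the stated purpose actually needs.
    return req.fields_requested & needed

req = CollectionRequest("model_training", True, "consent", {"text", "email", "location"})
print(approve(req))  # {'text'} -- extra fields are dropped, not collected
```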
Second, we must establish a diverse technical mechanism for data compliance. At the macro level, industry standards should be unified: authorities in each industry should take the lead in establishing a data equivalent of the “Xinhua Dictionary,” ensuring consistent data encoding and formats so that the source, content, and processing logic of data can be “counterfactually verified” (a provenance-tracking sketch follows this passage).
At the meso level, we should consider establishing both internal and external review systems. Internally, a specialized data compliance institution should be set up to handle daily data compliance within enterprises. Externally, a third-party review mechanism should be introduced to audit and ethically review corporate data compliance. Lastly, at the micro level, ethical standards need to be established. These standards and principles should be embedded into the behavioral logic of technology applications in a legally enforceable manner to ensure they adapt appropriately to changing circumstances.
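The macro-level requirement that the source, content, and processing logic of data be “counterfactually verified” can be illustrated with a simple provenance record: hash the original content and log the ordered processing steps, so that an auditor can re-run those steps and compare hashes. The record schema below is an assumption, not an existing industry standard.

```python
import datetime
import hashlib
import json

def fingerprint(data: bytes) -> str:
    """Content hash that ties a record to exactly one byte sequence."""
    return hashlib.sha256(data).hexdigest()

def provenance_record(source_url: str, raw: bytes, steps: list) -> dict:
    # Assumed schema: source, content hash, ordered processing log, timestamp.
    return {
        "source": source_url,
        "content_sha256": fingerprint(raw),
        "processing_steps": steps,
        "recorded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

raw = "Example training text.".encode("utf-8")
record = provenance_record(
    "https://example.com/corpus/1",  # illustrative URL
    raw,
    ["normalize_whitespace", "strip_html", "deduplicate"],
)
print(json.dumps(record, indent=2))
# An auditor can re-run the listed steps on the original source and compare
# hashes to verify that the data was processed as claimed.
```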
Third, data compliance laws need to be improved. Legislation should be enhanced by swiftly introducing fundamental laws related to data and AI to provide top-level guidance for corporate data compliance. Next, law enforcement should be strengthened by clearly defining the enforcement authority of various departments to avoid fragmented governance. Meanwhile, the judiciary should be improved by refining the electronic evidence system to protect relevant rights of stakeholders.
We must also integrate technology and management to address algorithmic bias in GenAI, through two primary measures. Inherent algorithmic biases, which arise during GenAI’s machine learning process, can be addressed by adjusting the relevant models to adhere to norms and technical standards and by subjecting them to substantive review before market release. Given the characteristics of GenAI, these correction efforts fall into two aspects: using algorithmic programming to prevent potential inherent biases in machine learning, and establishing standards for manual labeling while enhancing practitioners’ professional skills to tackle biases introduced by labeling.
Meanwhile, we need to address the acquired algorithmic biases that arise from the self-learning of GenAI. This requires the establishment of an agile, automated, end-to-end regulatory system to eliminate biases. The first step is to implement automated supervision of algorithmic technology, pausing output whenever an algorithmic bias is detected and tracing the source of the problem. Next, a multi-stakeholder regulatory model involving administrative bodies, platforms, industry associations, and enterprises should be established. Lastly, we should enforce an agile end-to-end regulatory mechanism to oversee the entire process of GenAI output, significantly reducing the probability of erroneous conclusions caused by algorithmic biases and effectively promoting the construction of a trustworthy algorithmic system.
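A minimal sketch of this “pause and trace” supervision step follows. The bias detector here is a deliberate stub (a blocklist); a real deployment would substitute audited classifiers, but the control flow it illustrates is the point: withhold the flagged output and log a trace for investigators rather than releasing it to users.

```python
import logging

logging.basicConfig(level=logging.INFO)

# Stand-in for a learned bias detector; the marker string is invented.
BLOCKLIST = {"group_x_is_inferior"}

def biased(output: str) -> bool:
    return any(marker in output for marker in BLOCKLIST)

def supervised_release(output: str, trace_id: str) -> str | None:
    if biased(output):
        # Pause output: hand the trace to investigators instead of users.
        logging.warning("output %s withheld for bias review", trace_id)
        return None
    return output

print(supervised_release("a neutral answer", trace_id="req-001"))            # released
print(supervised_release("claim: group_x_is_inferior", trace_id="req-002"))  # None
```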
Last but not least, it is essential to adopt a limited protection model to mitigate the intellectual property risks associated with GenAI creations. Unlike traditional AI technologies, GenAI’s innovation lies in its degree of autonomy in processing and creating output. If every product of this autonomous generation were protected, GenAI companies could come to hold a “monopoly on creation.” Yet from a commercial perspective, it would be unfair to deny protection to the “works” derived from GenAI, given the substantial financial and technological investments made to develop highly intelligent AI programs.
Hence, the intellectual property rights of ChatGPT-generated content should currently be evaluated based on the model’s technical operation mode, degree of involvement, and level of innovation, and a differentiated, limited protection model should be adopted for its output. As GenAI evolves and we gain a deeper understanding of its mechanisms, a more precise intellectual property protection model can be established.
GenAI, exemplified by ChatGPT, is in its early stages and brings various legal risks that should be addressed within the existing legal framework. It is crucial not to stifle the development of GenAI due to industry risks and theoretical controversies. A combination of “law and technology” in governance is needed to create a favorable market environment, ensuring the robust growth of the GenAI market.
Ma Yunan is from the School of Economic Law at Southwest University of Political Science and Law.
Edited by WENG RONG