China Releases New Draft Regulations for Generative AI
China is striving to regulate generative AI while promoting innovation and technological advancement. The NISSTC has issued draft regulations outlining security measures for generative AI service providers, underscoring China’s commitment to responsible AI development.
On May 23, 2024, the National Information Security Standardization Technical Committee (NISSTC) released new draft regulations titled Cybersecurity Technology – Basic Security Requirements for Generative Artificial Intelligence (AI) Service (hereinafter referred to as the “draft”).
The draft, open for public comments until July 22, 2024, outlines several security measures for generative AI services. It covers important areas such as securing training data, protecting AI models, and implementing overall security protocols. It also provides guidelines for conducting security assessments.
In this article, we provide an overview of the comprehensive security requirements for generative AI services as outlined in the draft.
What is the draft about?
The draft delineates critical security requirements for generative AI services, encompassing:
- Training data security: Ensuring the safety and integrity of data utilized to train AI models.
- Model security: Safeguarding AI models against potential threats and ensuring their integrity throughout their lifecycle.
- Security measures: Specifying the essential measures to be implemented to mitigate risks effectively.
Additionally, the draft defines the following key terms to ensure consistency:
- Generative AI service: Refers to services that utilize generative AI technology to produce various types of content like text, images, audio, and video for public consumption.
- Service provider: Refers to organizations or individuals that offer generative AI services through interactive or programmable interfaces.
- Training data: Includes all data directly used to train AI models, covering both pre-training data and optimized training data.
The draft serves as a reference for both service providers and regulatory authorities. It offers guidance for conducting security assessments and establishing pertinent regulations.
Security requirements for training data
Before gathering data from specific sources, service providers must conduct a comprehensive security assessment. If a source contains more than 5 percent illegal or “harmful” content (explained in the next section), data collection from that source should be avoided.
After data collection, it is essential to verify the collected data: if over 5 percent contains illegal or harmful information, it should not be used for training purposes.
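To make the 5 percent threshold concrete, here is a minimal sketch, assuming a provider samples records from a candidate source and screens them with some harmfulness check; the `classify_harmful` callable, the sampling parameters, and the function names are illustrative assumptions, not anything the draft prescribes.

```python
import random

HARMFUL_THRESHOLD = 0.05  # the draft's 5 percent ceiling for illegal/harmful content

def estimate_harmful_ratio(records, classify_harmful, sample_size=1000, seed=42):
    """Estimate the fraction of harmful records in a candidate data source.

    `classify_harmful` stands in for whatever screening the provider uses
    (keyword lists, a classification model, or manual review).
    """
    if not records:
        return 0.0
    rng = random.Random(seed)
    sample = records if len(records) <= sample_size else rng.sample(records, sample_size)
    flagged = sum(1 for record in sample if classify_harmful(record))
    return flagged / len(sample)

def source_is_usable(records, classify_harmful):
    """Reject the whole source if more than 5 percent of sampled content is harmful."""
    return estimate_harmful_ratio(records, classify_harmful) <= HARMFUL_THRESHOLD
```

The same check applies twice under the draft: once to the source before collection, and again to the collected corpus before training.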
Diversity in training data sources should be emphasized. Multiple training data sources should be utilized for each language (e.g., Chinese, English) and each data type (e.g., text, images, audio, video).
If training data from overseas sources is required, it should be reasonably combined with training data from domestic sources.
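As a rough illustration of the diversity requirement, the sketch below checks that each language and data-type combination draws on more than one source. The tuple layout of `sources` and the minimum of two sources per combination are illustrative assumptions; the draft does not specify a number.

```python
from collections import defaultdict

def check_source_diversity(sources, min_sources_per_combo=2):
    """Find (language, data type) combinations that rely on too few sources.

    `sources` is assumed to be an iterable of (source_name, language, data_type)
    tuples, e.g. ("corpus_a", "Chinese", "text").
    """
    by_combo = defaultdict(set)
    for name, language, data_type in sources:
        by_combo[(language, data_type)].add(name)
    # Return the combinations that still need additional sources.
    return {
        combo: names
        for combo, names in by_combo.items()
        if len(names) < min_sources_per_combo
    }
```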
Requirements vary based on data collection method and type, as illustrated in the table below.
Security Requirements Based on Data Source and Collection Method

| Training data source and collection method | Requirements |
| --- | --- |
| When using open-source training data | The open-source license agreement or relevant authorization documents for the data source should be obtained. |
| When using self-collected training data | |
| When using commercial training data | |
| When treating user input information as training data | Records of user authorization should be kept. |
What kind of data will be deemed ‘harmful’?
The draft categorizes “harmful data” into the following risk areas:
- Violations of core socialist values: Content that incites secession or undermines national unity and social stability is considered harmful. Content promoting terrorism, extremism, or ethnic hatred is particularly dangerous and must be excluded from training data.
- Obscenity and violence: Any data advocating violence, obscenity, or pornography is deemed harmful and should not be included.
- Illegal content: Any content that is prohibited by laws and regulations is inherently harmful and should be carefully screened out of AI training datasets.
- Discriminatory content: Content encompassing any form of discrimination, including discrimination based on ethnicity, religious belief, nationality, geographic region, gender, age, occupation, or health status.
- Commercial violations: Content that encompasses infringements of intellectual property rights, breaches of business ethics, and the disclosure of commercial secrets.
- Infringement of legal rights: Harmful data also includes content that infringes upon individuals' rights and well-being, such as harming the physical or mental health of others or violating portrait rights. Defamation, infringement of personal honor, and breaches of privacy are also significant concerns, and violations of personal information rights and other legal rights of individuals must be rigorously avoided to uphold ethical standards in AI development.
How can providers ensure training data is not harmful?
According to the draft, data content for all types of training data (e.g., text, images, audio, video) must be filtered before use. This can be done through keyword filtering, classification models, or manual inspection to remove illegal or harmful information.
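As an illustration of the layered filtering the draft describes, here is a minimal sketch combining a keyword blocklist with a pluggable classification model; the blocklist terms and the `model_flags_harmful` interface are assumptions for illustration, and records passing both automated checks could still be routed to manual inspection.

```python
from typing import Callable, Iterable

# Illustrative blocklist; real deployments maintain large, regularly updated
# keyword libraries per language and per risk category.
KEYWORD_BLOCKLIST = {"example_banned_term_1", "example_banned_term_2"}

def keyword_filter(text: str) -> bool:
    """First-pass screen: flag text containing any blocklisted keyword."""
    lowered = text.lower()
    return any(term in lowered for term in KEYWORD_BLOCKLIST)

def filter_training_data(
    records: Iterable[str],
    model_flags_harmful: Callable[[str], bool],
) -> list[str]:
    """Keep only records that pass both the keyword screen and the model.

    Records that survive both automated checks could still be sampled for
    manual inspection, the third method the draft mentions.
    """
    return [
        text
        for text in records
        if not keyword_filter(text) and not model_flags_harmful(text)
    ]
```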
With regard to intellectual property rights, service providers must implement a management strategy with a designated responsible party. They need to identify and address major intellectual property risks before using data for training, especially for literary, artistic, or scientific works. Providers should also establish channels for complaints and reports, inform users about potential risks through service agreements, and update their strategies based on policies and complaints.
For personal information, service providers must obtain consent from individuals before using training data that includes personal information, ensuring compliance with legal or regulatory requirements.
For sensitive personal information, explicit consent must be obtained, or the use must comply with relevant legal or regulatory standards.
Security requirements for AI models
The draft highlights the importance of robust security measures throughout the entire lifecycle of generative AI model development and deployment. The following guidelines are proposed for each phase:
- Model training: Security should be prioritized as a key metric during training, and regular security audits should be conducted to identify and fix vulnerabilities, particularly in open-source frameworks.
- Model output: Technical measures should be implemented to align generated content with user intentions and mainstream understanding.
- Model monitoring: Inputs should be continuously monitored to prevent malicious attacks, and ongoing evaluation methods and emergency management measures should be established to promptly address security issues (a minimal input-monitoring sketch follows this list).
- Model updates and upgrades: A comprehensive security management strategy should be developed for updates and upgrades, and security evaluations should be conducted after each update.
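As a concrete illustration of the model-monitoring point above, the following is a minimal sketch of continuous input monitoring, assuming a simple keyword screen plus per-user throttling of flagged prompts; the `InputMonitor` class, its thresholds, and the blocklist are all illustrative assumptions rather than anything the draft mandates.

```python
import time
from collections import defaultdict, deque

class InputMonitor:
    """Screen each prompt against a blocklist and throttle users who
    repeatedly submit flagged inputs within a time window."""

    def __init__(self, blocklist, max_flags=3, window_seconds=3600):
        # All thresholds here are illustrative, not values from the draft.
        self.blocklist = {term.lower() for term in blocklist}
        self.max_flags = max_flags
        self.window = window_seconds
        self.flag_log = defaultdict(deque)  # user_id -> timestamps of flagged inputs

    def _is_flagged(self, prompt: str) -> bool:
        lowered = prompt.lower()
        return any(term in lowered for term in self.blocklist)

    def allow(self, user_id: str, prompt: str) -> bool:
        """Return False for a flagged prompt, or for any prompt from a user
        who has exceeded the allowed number of recent flags."""
        now = time.time()
        log = self.flag_log[user_id]
        # Discard flag records that have aged out of the window.
        while log and now - log[0] > self.window:
            log.popleft()
        if self._is_flagged(prompt):
            log.append(now)
            return False  # reject the malicious input itself
        return len(log) < self.max_flags  # throttle repeat offenders
```

In a real deployment the keyword screen would be one layer among several, feeding the ongoing evaluation and emergency-management processes the draft calls for.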
Security measures proposed in the draft
As generative AI evolves, the draft urges service providers to prioritize safety and security throughout the user experience. The following table outlines key measures proposed in the draft aimed at safeguarding users across diverse scenarios and demographics.
Security Measures Proposed for Generative AI Service Providers

| Domain | Safety measures |
| --- | --- |
| Understanding the applicability and scope of generative AI services | Service providers must justify the necessity and safety of employing generative AI within their service scope, particularly in critical areas. When catering to minors, specific protective measures are imperative; for services not intended for minors, proactive steps should be taken to prevent their access, either through technical solutions or effective management practices. |
| Ensuring transparent operations | Services delivered through interactive interfaces should prominently display key service information, and users should be kept informed of relevant details. For services accessible via programmable interfaces, documentation should comprehensively cover the same information. |
| Handling user input responsibly | When gathering user input for training purposes, providers must ensure users have control over their data. |
| Addressing complaints and reports | Channels for receiving and addressing complaints and reports should be established. |
| Content monitoring | Providers should employ measures like keyword filtering and classification models to detect harmful content. Adequate monitoring personnel should be in place to ensure compliance and address issues promptly. |
| Ensuring service continuity | Backup mechanisms and recovery strategies for critical data, models, and tools should be established to ensure seamless service delivery and minimize disruptions. |
Implications for generative AI service providers
The introduction of the draft regulations marks a key moment for generative AI service providers in China, signaling a shift towards more stringent security standards and regulatory oversight.
One of the primary impacts of these regulations is likely to be felt in the operational and compliance costs for AI service providers. Compliance with the outlined security measures, such as data filtering, consent acquisition, and ongoing monitoring, will require significant investments in technology, personnel, and process development. Small and medium-sized providers, in particular, may face challenges in meeting these requirements, potentially leading to market consolidation as larger players with greater resources absorb compliance costs more easily.
Another significant impact of these regulations is likely to be on user trust and confidence in generative AI services. By establishing clear security standards and compliance requirements, the regulations aim to enhance transparency and accountability within the industry. Providers that demonstrate adherence to these standards may benefit from increased user trust and loyalty, as consumers prioritize security and privacy in their interactions with AI-powered platforms.
Overall, while the new security requirements may pose initial challenges for generative AI service providers, they present greater opportunities for differentiation, innovation, and enhanced user trust in the long run. By embracing these regulations as a framework for responsible AI development, providers can position themselves for long-term success in an increasingly regulated market.