
What China’s Algorithm Registry Reveals about AI Governance

The registry's user manual reveals China's approach to regulating the latest frontier of technology: the algorithms that power the country's biggest internet companies.

Published on December 9, 2022

For the past year, the Chinese government has been conducting some of the earliest experiments in building regulatory tools to govern artificial intelligence (AI). In that process, China is trying to tackle a problem that will soon face governments around the world: Can regulators gain meaningful insight into the functioning of algorithms, and ensure they perform within acceptable bounds?

One particular tool deserves attention both for its impact within China, and for the lessons technologists and policymakers in other countries can draw from it: a mandatory registration system created by China’s internet regulator for recommendation algorithms.

Although the full details of the registry are not public, by digging into its online instruction manual, we can reveal new insights into China’s emerging regulatory architecture for algorithms.

The Registry’s Founding

The algorithm registry was created by China's 2022 regulation on recommendation algorithms (English translation), which came into effect in March of this year and was spearheaded by the Cyberspace Administration of China (CAC). China's algorithm regulation has largely focused on the role recommendation algorithms play in disseminating information, requiring providers to ensure that they don't "endanger national security or the social public interest" and to "give an explanation" when they harm the legitimate interests of users. Other provisions sought to address monopolistic behavior by platforms and hot-button social issues, such as the role that dispatching algorithms play in creating dangerous labor conditions for Chinese delivery drivers.

The regulation also requires recommendation algorithms with “public opinion characteristics” and “social mobilization capabilities” to complete a filing with the mysteriously named Internet Information Service Algorithm Filing System. But the provisions didn’t elaborate on the specifics, and the filing requirement went largely unremarked upon at the time.

Analysts got a first look at those filings in August 2022, when the CAC released the first batch of thirty algorithm registrations. Accessible via the registry’s web page, the filings included algorithms from some of China’s biggest internet platform companies, including Tencent, Alibaba, and Bytedance. These publicly available filings usually consisted of a single page with six different short-response categories, including “Algorithm Fundamentals” and “Algorithm Operating Mechanism.”

But the actual descriptions in these filings were pitched at such a high level as to be almost completely devoid of meaningful detail. For example, the filing for Weibo’s “hot search” feature describes the algorithm as adding together “search popularity, discussion popularity, and dissemination popularity,” multiplied by an “interaction rate coefficient.” That may be an accurate description, but it is also so high level that an observer with no knowledge of this specific algorithm could essentially guess it. If this were the full extent of information given to Chinese regulators, it would provide them with no meaningful insights into the algorithms, how they were trained, or how they might perform.
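
Indeed, the filing's description reduces to a one-line formula. A minimal sketch, assuming hypothetical names for the function and its inputs (the filing specifies none of these, nor how each "popularity" input is itself computed):

```python
# Hypothetical rendering of Weibo's public "hot search" filing. The filing
# discloses only the arithmetic shape; every name here is an illustrative
# assumption, and the inputs' own definitions are not disclosed at all.
def hot_search_score(search_pop: float,
                     discussion_pop: float,
                     dissemination_pop: float,
                     interaction_rate_coeff: float) -> float:
    return (search_pop + discussion_pop + dissemination_pop) * interaction_rate_coeff
```

That a plausible reimplementation fits in a few lines underscores how little the public filing actually discloses.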

Insights from the User Manual

But a closer look at the algorithm registry’s landing page, which contains a downloadable user manual for entities registering their algorithms, provides a wider window into what information the CAC is actually gathering. A close read of that manual and examination of the screenshots within it show that only a portion of the information filed in the registry has been revealed.

The most detailed requirements for disclosure in the manual come via a screenshot showing the page for disclosing “Detailed Algorithm Attribute Information.” Here it asks that algorithm providers list the name of each open-source and self-built data set that was used to train the model, as well as the specific source of that data. In addition, it requires the provider to state whether algorithm inputs involve biometric or other personal information.

Figure 1. Annotated screenshot of user manual for algorithm registry.
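
To give a sense of the shape of that disclosure, here is a minimal sketch of the fields the screenshot describes, rendered as a hypothetical data structure; the field names are our assumptions, not the registry's:

```python
from dataclasses import dataclass, field

@dataclass
class TrainingDataset:
    # One entry per data set used to train the model, per the manual.
    name: str
    source: str        # the specific source of the data
    open_source: bool  # open-source versus self-built

@dataclass
class AlgorithmAttributeFiling:
    # Hypothetical rendering of "Detailed Algorithm Attribute Information."
    datasets: list[TrainingDataset] = field(default_factory=list)
    inputs_include_biometric_info: bool = False
    inputs_include_other_personal_info: bool = False
```

Even at this level of abstraction, a data-set-by-data-set inventory goes well beyond anything visible in the public filings.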

In a separate screenshot for a page titled "Algorithm Basic Properties Information," companies are required to upload PDF files for their "Algorithm Security Self-Assessment." As these uploads are unavailable to the public, we do not know exactly what information they must contain or how security is defined. Given this regulation's emphasis on controlling the dissemination of information, the security self-assessment could involve control over public expression, which would square with the types of algorithms that have been registered, including "security risk identification" algorithms that "dispose of illegal information and content." But it could also encompass other issues, such as misuse of generative AI models and adversarial attacks.

Figure 2. Annotated screenshot of user manual for algorithm registry.

Other screenshots preview the titles of further sections that likely require more information—“Algorithm Strategy,” “Algorithm Risk and Prevention Mechanism”—but the user manual does not provide screenshots of those pages, so the required disclosures remain a mystery.

By exploring the user manual, we see that what the registry requires from Chinese companies is both more and less than previously understood. More, because the manual reveals significant new disclosure requirements that do not show up in the public versions of the filings. The requirement to enumerate data sets is self-explanatory, while the algorithm security self-assessments could be anything from cursory to comprehensive. Less, because some had taken the registry filing requirements to mean that the Chinese government could now gain direct access to the algorithms or the underlying code. This does not appear to be the case, and further reporting supports that conclusion.

Comparisons Outside China

The most direct regulatory parallel is found in the European Union's Digital Services Act (DSA), a brand-new law that requires greater transparency and audits of recommendation algorithms, though we have yet to see what form those disclosures will take. When they are made public, they will constitute another experiment in algorithmic transparency worth following.

One other possible analogue for China’s experiment is the movement to promote model cards among the AI ethics community. Originally introduced in a paper by researchers at Google and the University of Toronto, model cards were pitched as “a step toward the responsible democratization of machine learning.” Most model cards provide simple overviews of a model’s ideal form of input, visualize some potential limitations, and present basic performance metrics to reflect real-world impacts. Like the Chinese registry’s filings, many model cards provide information on architecture, application scenarios, training data, and use of sensitive data.
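
For comparison, a model card typically looks something like the following sketch, loosely patterned on the structure proposed in the original paper; every value is invented for illustration:

```python
# An illustrative model card, loosely following "Model Cards for Model
# Reporting" (Mitchell et al., 2019). All values are made up for this example.
model_card = {
    "model_details": {"name": "comment-toxicity-classifier", "version": "1.0"},
    "intended_use": "flag abusive comments for human review",
    "training_data": {"source": "public forum comments", "size": 500_000},
    "limitations": ["performance degrades on code-switched text"],
    "metrics": {"accuracy": 0.91, "false_positive_rate": 0.04},
}
```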

However, model cards tend to emphasize performance assessment, while China’s algorithm registry emphasizes security assessment. In some cases, emphases may overlap; model cards might have a security component, and some filings focus on mitigating negative impacts on marginalized groups, such as migrants working in food delivery. But generally, model cards address concerns regarding algorithmic bias by comparing a model’s performance when used across different demographic groups. They also address both expert and non-expert audiences. By contrast, China’s algorithm registry targets the government and the public with different filings, and Chinese citizens are not invited to evaluate bias. Rather, that remains the purview of the government, which can define what constitutes security and risk to it. The current registry reflects and reinforces the Chinese Communist Party’s role as the ultimate arbiter of these questions.
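
To make the contrast concrete: the bias evaluation that model cards report typically amounts to computing the same metric separately for each demographic group, as in this illustrative sketch (the group labels and data are invented):

```python
from collections import defaultdict

def accuracy_by_group(predictions, labels, groups):
    """Accuracy disaggregated by demographic group (illustrative only;
    real model cards also report richer metrics, such as false positive
    rates per group)."""
    correct, total = defaultdict(int), defaultdict(int)
    for pred, label, group in zip(predictions, labels, groups):
        total[group] += 1
        correct[group] += int(pred == label)
    return {g: correct[g] / total[g] for g in total}

# A gap like this is exactly what a model card would surface:
print(accuracy_by_group([1, 0, 1, 1], [1, 0, 0, 1], ["a", "a", "b", "b"]))
# -> {'a': 1.0, 'b': 0.5}
```

Whether the CAC's security self-assessments ask for anything comparable is, as noted above, unknown.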

Lurking behind all of these experiments is a deeper question: Is it even technically possible to gain meaningful transparency into how complex algorithms function? Deep learning is often described as a “black box” technology, because the predictions, decisions, and recommendations it makes are not easily explained, even by its developers. Algorithmic interpretability is an active area of machine learning research, but major questions remain as to when—if ever—it will reach a level that is useful in a regulatory context.

The CAC might also lack the in-house technical expertise to understand the information companies are submitting to the algorithm registry. One report described a meeting between representatives from Bytedance and the CAC, in which Bytedance employees "had to rely on a mix of metaphors and simplified language" to communicate with the officials.

China’s Next Steps

That may not sound like an auspicious beginning to such an ambitious project. But when building out new regulatory architecture, you have to start somewhere. Prior to the algorithm registry, Chinese officials didn't know what they didn't know. By providing a skeletal understanding of influential algorithms in the country, the registry can alert the CAC to where it lacks useful information and provide scaffolding for further demands on companies for disclosure.

In that sense, the current push to build tools to regulate algorithms echoes China's early attempts to control internet access and freewheeling social media platforms. In both cases, the government's efforts appeared ham-fisted and were often described as futile. But by slowly building its understanding of the technology and the industry, the government was able to tighten the screws until it reached an equilibrium that served its ends. Now we will see if it can pull off the same trick with algorithms.

Carnegie does not take institutional positions on public policy issues; the views represented herein are those of the author(s) and do not necessarily reflect the views of Carnegie, its staff, or its trustees.