People around the world understandably greeted the seemingly sudden arrival of generative AI models like ChatGPT with a mix of enormous interest and more than a little confusion. Reacting to the remarkable new capabilities of these models, the White House convened an urgent meeting with the CEOs of the tech companies developing the technology, while the U.S. Congress debated potential options and G7 countries scrambled to consult about next steps.
Some observers warn about the enormous power of these models and the catastrophic or even existential risk they may pose. Others say this is Silicon Valley hype distracting from real problems like inequality and conflict. More tempered responses train attention on how this technology can help solve those very problems while also voicing concern about how generative AI could empower oppressive regimes or destabilize society. What if current or future models are used to produce dangerous biological or chemical weapons, for example, or if they become widely available to North Korea or Russia?
These questions—and just about any other serious queries about the future of AI—make one thing perfectly clear: the public and government leaders lack the visibility needed to judge this moment in history or to know who might be responsible for the benefits and risks that generative AI will bring. What policymakers know about existing generative AI models is entirely a function of what the relevant companies have chosen to disclose. Only the companies building this technology know what they are building and what safety tests they are performing, and even they can only guess at what their peers are developing. The public has no way to know who is building even more powerful models, or to whom those models may be made available.
That’s why the first step countries should take is to gather basic information about who is training the most sophisticated large generative models. A simple registration process can accomplish this goal, furnishing governments with basic insight into who is developing models and whether there is substantial risk that their use might violate export control limits or other laws.
Registration is a familiar feature of modern legal systems. Corporations are subject to registration so people and other businesses can have confidence that a company with which they are transacting is not a sham or a front for illegal activity. Broker-dealers of securities register with the government. So do companies handling nuclear materials for civilian purposes, or labs handling dangerous pathogens or toxins.
The proposed registration system should be straightforward and make it easy for responsible companies to achieve compliance. It would be tailored to protect intellectual property while enabling countries to forge a better understanding of how the technological frontier is moving.
Registration allows governments to ensure that markets and innovation are driven by people and businesses who are following the rules. It draws a line between the more scrupulous actors—the “good guys” willing to comply with a targeted, narrowly tailored rule—and those who might be less inclined to comply and thus merit more careful scrutiny.
Here’s how this might work in practice.
First, governments should establish national registries for large generative AI models over a threshold defined by size (number of parameters or amount of compute used for training, for example) and capabilities. Given the dramatic shift in capabilities demonstrated by OpenAI’s GPT-4, the threshold should be set at or slightly above the capabilities of that model. In the United States, the registry could initially be operated by a secure office within an agency that has relevant responsibilities, such as the Department of Commerce (which handles technology standards as well as export controls), the Department of Energy (which monitors AI safety and advises on technology safety issues relevant to federal procurement), or the Department of Homeland Security (responsible for critical infrastructure). Existing federal laws governing export controls, sensitive information, and related matters may allow initial progress toward a prudent registration scheme even in the absence of new legislation.
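To make the threshold idea concrete, here is a minimal sketch in Python of how a size-based trigger might be expressed. The cutoff values are hypothetical placeholders for illustration, not figures proposed in this article; regulators would set and revise the actual numbers.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only. Real values would be set
# by regulators and revised as the technological frontier moves.
PARAM_THRESHOLD = 1_000_000_000_000   # e.g., one trillion parameters
COMPUTE_THRESHOLD_FLOPS = 1e25        # e.g., total training compute in FLOPs

@dataclass
class ModelProfile:
    name: str
    parameter_count: int
    training_compute_flops: float

def requires_registration(model: ModelProfile) -> bool:
    """Return True if the model crosses either size-based threshold."""
    return (
        model.parameter_count >= PARAM_THRESHOLD
        or model.training_compute_flops >= COMPUTE_THRESHOLD_FLOPS
    )
```

A capabilities-based trigger, which the proposal also contemplates, would rest on evaluation results rather than a simple numeric comparison; the sketch covers only the size-based component.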
Second, developers should be legally required to participate in this registry, and to disclose confidentially to the registry descriptions of the size, training methods, training data, and known capabilities of these models. The models (containing billions or trillions of parameters) and the data files used for training the models wouldn’t be transferred to the registry. Inadequate or deceptive disclosures should bring substantial penalties including, at a minimum, de-registration. The registry should be highly secure to protect against adversarial efforts to hack into the information shared by developers.
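The disclosure itself could take the form of a structured record along the following lines. This is a hedged sketch: the field names are assumptions about what "size, training methods, training data, and known capabilities" might look like in practice, and, as the proposal specifies, the model weights and training data files themselves would never be transmitted.

```python
from dataclasses import dataclass, field

@dataclass
class RegistryDisclosure:
    # Identity of the responsible developer and the model being registered.
    developer: str
    model_name: str
    # Size of the model; the weights themselves are not transferred.
    parameter_count: int
    training_compute_flops: float
    # Descriptions only; the underlying data files stay with the developer.
    training_methods: str        # e.g., "pretraining plus RLHF fine-tuning"
    training_data_summary: str   # narrative description of data sources
    # Known capabilities and safety tests performed, as reported.
    known_capabilities: list[str] = field(default_factory=list)
    safety_evaluations: list[str] = field(default_factory=list)
```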
Third, governments should make it unlawful to deploy or use the services of an unregistered model. While only developers of large models need to register, this obligation not to use unregistered models is aimed both at the developers and the entities—corporate, individual, or government—that purchase the services of models. If the registration requirement applies, for example, to the next iteration of GPT-4, then it would be illegal to use that model unless it is registered. Developers can then be asked for evidence of registration by their users, customers, or service providers.
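In practice, a customer or service provider could verify a developer’s evidence of registration with a simple lookup, sketched below. The registry endpoint and response format are hypothetical assumptions; no such API exists today.

```python
import json
import urllib.error
import urllib.request

# Hypothetical endpoint for a national registry; illustrative only.
REGISTRY_URL = "https://registry.example.gov/api/v1/models/"

def is_registered(registration_id: str) -> bool:
    """Query the (hypothetical) registry for an active registration."""
    try:
        with urllib.request.urlopen(REGISTRY_URL + registration_id) as resp:
            record = json.load(resp)
        return record.get("status") == "active"
    except urllib.error.URLError:
        # Unknown ID or unreachable registry: treat as unregistered.
        return False
```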
A registry would allow governments to understand who is developing these models and who is responsible for their use. Civil servants would be better able to enforce prudent export control limits, restrictions on the development of biological weapons, and other existing laws. Public officials would also build a nucleus of capacity and expertise to inform further policy on generative AI and its close cousins, particularly as systems get closer to acquiring the capacity to improve themselves.
In the shorter term, registration for large generative AI models will bring an immediate benefit: helping to sort scrupulous actors from those with troubling motivations in this space. Most if not all compliance will occur among people or entities that mean no harm to society. By leaving plenty of room for innovation while establishing at least a minimal requirement to disclose basic information, a registration requirement can help reveal the companies and individuals motivated to evade even these threshold obligations, actors who may therefore merit further scrutiny from law enforcement.
This proposal is critically different from the current draft of the EU’s AI Act, which would require foundation models like those underlying ChatGPT to register details of the model in a public database. While sharing certain information with the public is an important goal that may merit new policies in the future, it is not unreasonable for some commercial details to be protected from disclosure to encourage innovation. Under the proposal here, required disclosures would be made only to governments, under duties of confidentiality that would protect trade secrets as appropriate while permitting other information to be shared with the public.
Also unlike the EU’s AI Act, the proposed registration would not necessarily entail any other substantive requirements at this point. Existing laws already prohibit many of the most troubling activities that these models could facilitate. Registration will create public entities with eyes on global developments (at least the good-faith ones) and will form a nucleus of public expertise about model capabilities. Building the infrastructure to administer this policy will generate useful knowledge about how best to target the problematic uses of AI while encouraging its continuing refinement.
And where policymakers and the public see a need for more fully developed licensing regimes that impose specific requirements on some categories of economic actors or activities—such as models that may be used to manage the power grid—registration is an important foundational step. Modern economies already depend on many such licensing requirements: lawyers must be licensed to provide legal advice, insurance companies to sell insurance, hospitals to admit patients, brokers to buy and sell securities on stock exchanges.
In fact, large language models and similar iterations of generative AI will likely become, in the language of financial regulation, systemically important: they are capable of rewriting the foundations of economic and social interactions. Developing an AI registry will allow governments to act if and when systemic effects are discovered. It’s possible that disclosure and auditing will be enough. Or new regulatory technologies and approaches may become necessary. The challenge is that it remains unclear what these models can do or enable people to do. They have emergent capabilities: capabilities that are more than the sum of their parts and that depend on tremendously complex, impossible-to-predict interactions between the models and all the people and organizations interacting with them.
Yet today, the only people who have full information about the scale, training methods, training data, and capabilities of large language models are those inside the technology companies that are building them. Even though these companies are by and large mindful of and careful about the risks, it is not democratically legitimate for this visibility to be exclusively within their purview. Decisions about hugely consequential technologies—how fast they roll out, how much they disrupt economies and societies, what is considered a good tradeoff between benefit and harm, what kinds of tests should be required prior to deployment—should not be solely under corporate governance, under the exclusive control of even well-intentioned business executives who are legally obligated to act only in the interests of their shareholders. Precisely because society will benefit from further innovation and development of large language models and similar technologies, regulation should start with basic registration schemes to enable visibility into the development of these technologies and to ensure that prudently designed policies can be carefully targeted.
The wise and legitimate governance of AI begins with registration. Countries should move in that direction immediately.
About the Authors
Gillian Hadfield is the Schwartz Reisman Chair in Technology and Society at the University of Toronto, a CIFAR AI Chair at the Vector Institute for Artificial Intelligence, and author of Rules for a Flat World: Why Humans Invented Law and How to Reinvent It for a Complex Global Economy; she has been an independent policy adviser to OpenAI since 2018.
Mariano‑Florentino (Tino) Cuéllar is the president of the Carnegie Endowment for International Peace and a former justice of the California Supreme Court. He served on the National Academy of Sciences Committee on Ethics and Governance of Computing Research and its Applications, and is coauthor of Government by Algorithm.
Tim O’Reilly is founder, CEO, and chairman of O’Reilly Media and a visiting professor of practice at University College London's Institute for Innovation and Public Purpose. He is the author most recently of WTF? What’s The Future and Why It’s Up To Us.