Artificial intelligence (AI) capabilities are advancing rapidly, and there are serious concerns that AI could reach a point where it poses risks to international security.1 At the same time, AI risk management is “still in its infancy.”2 This creates a dilemma for policymakers. Key AI risks are poorly understood and speculative, and premature regulation—or even premature pressure to follow voluntary safety standards—could be ill-conceived and obstruct progress. But moving too slowly could mean tolerating high levels of risk.
A partial solution to this dilemma is to invest heavily in research on the risks of AI and how to mitigate them, with the goal of achieving a mature understanding of these topics as quickly as possible. However, given the challenges of such research, reaching maturity could easily take decades.
For the best hope of moving faster on risk management, research could be complemented by another approach to developing risk management practices: early release and iteration. This approach can be seen in AI companies’ if-then commitments,3 which are often relatively vague, lack extensive justification, and are explicitly marked as early, exploratory, or preliminary.4 Commitments like these are a sort of minimum viable product. Rather than polished commitments grounded in extensive and unassailable research, they are initial attempts at risk management that a company can try, notice problems with, iterate on, and continually improve as more information and research come in.
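To make this concrete, here is a minimal, purely illustrative sketch of the structure such a commitment takes: a capability of concern, a trigger threshold, and a required mitigation, all expected to be revised over time. The capability names, scores, and mitigations below are hypothetical and do not reproduce any company's actual framework; the point is only that the structure is simple enough to publish early and refine often.

```python
# Illustrative sketch only: a hypothetical, simplified encoding of an
# "if-then" commitment. The capabilities, thresholds, and mitigations
# are invented for illustration, not taken from any real framework.
from dataclasses import dataclass


@dataclass
class IfThenCommitment:
    capability: str       # capability of concern being tested for
    trigger_score: float  # evaluation score at which the "then" clause applies
    mitigation: str       # required response once the trigger is reached
    version: str          # commitments are expected to be revised over time


def required_mitigations(commitments, eval_scores):
    """Return the mitigations triggered by a set of evaluation results."""
    return [
        c.mitigation
        for c in commitments
        if eval_scores.get(c.capability, 0.0) >= c.trigger_score
    ]


# Example usage with made-up capability names and scores.
commitments_v1 = [
    IfThenCommitment("cyber_autonomy", 0.5, "restrict deployment; harden model security", "v1.0"),
    IfThenCommitment("bio_uplift", 0.3, "pause release pending further evaluation", "v1.0"),
]
print(required_mitigations(commitments_v1, {"cyber_autonomy": 0.7, "bio_uplift": 0.1}))
# -> ['restrict deployment; harden model security']
```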
This early-release-and-iteration approach is unlike how risk management tends to look in other, more mature industries. Rather, it is more similar to how AI companies develop and deploy their products. For a fast-moving industry where pragmatism and dynamism are key, this approach may be the best hope of developing workable risk reduction practices fast enough to reduce the bulk of the risk.
For this approach to work, it will be important for its practitioners not to confuse it with traditional risk management in mature industries, nor with intensive research isolated from practice. Risk management practices that come from an early-release-and-iteration approach will frequently be under-explained and under-justified and will later be revised to accommodate new developments or improved understanding. Scholars and other critics will be tempted to focus their critiques on the lack of rigor, but it might be more productive to focus on other matters, such as how frequently companies revise their frameworks and whether they list and eventually resolve key open questions.
Policymakers, rather than choosing between imposing detailed regulations and waiting for risk management to mature, can aim to accommodate and encourage both the fast development of risk management practices and their continuous revision.
The Challenge of Rigorous Risk Assessment for AI
In some industries, it is common for operators to perform regular, extensive risk assessments. One example is nuclear power; the U.S. Nuclear Regulatory Commission uses probabilistic risk assessment to put numbers on potential risks.5 Risk assessment for nuclear plants focuses on a specific, limited set of risks: those that could cause damage to the nuclear reactor core, resulting in the release of radioactivity.6
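To give a sense of what putting numbers on potential risks looks like in practice, the sketch below shows the basic arithmetic of a probabilistic risk assessment in highly simplified form: an initiating-event frequency multiplied by the conditional probabilities that successive safety systems fail. The scenario and the numbers are invented for illustration and are not drawn from any actual NRC analysis.

```python
# Highly simplified illustration of probabilistic risk assessment arithmetic:
# an initiating-event frequency is multiplied by the conditional failure
# probabilities of the safety systems that would otherwise prevent core
# damage. All numbers are invented for illustration.

initiating_event_freq = 1e-2   # e.g., loss of offsite power, per reactor-year
p_backup_power_fails = 1e-3    # conditional probability the backup power system fails
p_cooling_fails = 5e-2         # conditional probability emergency cooling then fails

core_damage_freq = initiating_event_freq * p_backup_power_fails * p_cooling_fails
print(f"Estimated core damage frequency: {core_damage_freq:.1e} per reactor-year")
# -> Estimated core damage frequency: 5.0e-07 per reactor-year
```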
Risk management in other industries tends to have a similar quality. For example, approval from the U.S. Food and Drug Administration generally requires empirical studies of a drug’s effects on predefined indicators of health, both positive (efficacy) and negative (side effects).7
By contrast, AI risk, as it is understood today, presents both a broader and a vaguer surface area of potential risks. AI is a technology that could potentially automate anything a human mind can do, and it is advancing rapidly. AI has been the subject of a vast set of concerns, including but far from limited to the manipulation of public opinion, automation of cyber operations, invasions of privacy, proliferation of the capacity to produce and deploy biological and chemical weapons, labor market impacts because of AI’s economic competition with much of the population, amplification of bias, and “loss of control,” which refers to the possibility that AI agents could autonomously work to disempower humans.8 Discussions of these risks tend to emphasize that some of them are speculative, poorly understood, and/or the subject of vigorous disagreement among experts.9
For many of these risks, attempts at risk management face hard questions. Consider one example risk: that AI could assist in chemical and biological weapons production.10 To assess and manage this risk, one would ideally like well-grounded answers to questions including: What aspects of weapons production (and/or acquisition) can AI systems enhance? For what types of weapons, and for what types of actors? How much could AI systems help each type of actor of concern with each type of weapon of concern? How can one know which AI systems are capable of such enhancement? What technological measures can be used to ensure that actors of concern can neither elicit assistance with weapons production from AI nor steal AI model weights and fine-tune them for their own purposes?
It is especially hard to get reasonable answers to these questions given that the concern is about hypothetical future AI systems rather than present ones. There are no empirical examples of such AI systems to study, no case studies for such AI-assisted incidents, no statistics that can be used to straightforwardly estimate frequency, and no relevant high-assurance AI safety programs that can be studied to produce standards.
The fact that we do not have such things does not mean that concerns about the risks are unfounded. AI systems have recently been approaching human-expert-level performance on many fronts at once,11 and if they were to reach parity with top experts in chemistry and biology, they could quickly and dramatically expand the set of people able to produce weapons of mass destruction.12 Policymakers have expressed significant concern about these risks.13
But getting empirically grounded, thoroughly quantified answers to the questions above may prove intractable until after AI systems that clearly pose the risks in question exist, at which point the risks may be significant.
More generally, understanding the full scope of risks, achieving some level of consensus on it, and developing solid and widely used risk management practices around this understanding could take decades. This would be in line with the history of risk management in other industries.14
Some have advocated that AI development should be delayed until (and unless) risk management becomes mature enough to provide high assurance against risk.15 Others have pointed to the immature state of risk management as reason to delay regulation while AI development moves forward with no restrictions.16 Either approach might sound reasonable at first blush but looks less appealing (and less realistic) when keeping in mind how long the road could be to achieve mature risk management practices.
Risk Assessment With Ambition, Urgency, and Even Impatience
The companies developing cutting-edge AI systems are not delaying production or release of products while they work to assemble rigorous, comprehensive analysis about their systems’ capabilities, internal workings, and revenue potential. They are, rather, building and releasing AI products with ambition and urgency.
Indeed, the culture of tech companies in general tends to prioritize an ethos of rapidly releasing products and iterating on them, rather than aiming to perfect them—an approach that means products are often limited at a given point in time but that results in fast feedback and improvement.17
The idea of prioritizing rapid iteration over up-front analysis is central to how some of the key players in AI operate. Beyond informing their approach to products, it has featured prominently in some of the leading AI companies’ statements of their philosophy for navigating the risks of AI.18
Can this ethos be applied to the development of risk management, as well as to the development of AI itself?
To a significant degree, this is exactly what has been happening with if-then commitments released by major AI companies over the past year or so, although the pace of iteration could be faster, and the number of companies participating could be greater.19
For example, in May 2024, Google DeepMind released its “Frontier Safety Framework,”20 which lists AI capabilities the company intends to test for and enhanced risk mitigations that could be required depending on the results of testing. In its announcement, Google DeepMind explicitly highlighted that the framework is preliminary and a starting point for iteration:
The Framework is exploratory and we expect it to evolve significantly as we learn from its implementation, deepen our understanding of AI risks and evaluations, and collaborate with industry, academia, and government. Even though these risks are beyond the reach of present-day models, we hope that implementing and improving the Framework will help us prepare to address them. We aim to have this initial framework fully implemented by early 2025.
The framework itself contains significant ambiguities and areas that will need further refinement over time. The “critical capability levels” it aims to test for are described at a high level and are based on what is explicitly called “preliminary analysis.” For example, one such level covers “[AI systems] capable of fully automating opportunistic cyberattacks on organizations with a limited security posture.” A “future work” section of the document openly acknowledges its preliminary nature and lists a number of hopes for future versions of the framework.
Other if-then commitments have similar properties. OpenAI’s “Preparedness Framework” is marked “Beta” and described as a “living document.”21 When discussing its capabilities of concern, it states:
As mentioned, the empirical study of catastrophic risk from frontier AI models is nascent. Our current estimates of levels and thresholds for ‘medium’ through ‘critical’ risk are therefore speculative and will keep being refined as informed by future research.
Anthropic’s initial announcement of its “Responsible Scaling Policy” stated, “We want to emphasize that these commitments are our current best guess, and an early iteration that we will build on. The fast pace and many uncertainties of AI as a field imply that, unlike the relatively stable BSL system, rapid iteration and course correction will almost certainly be necessary.”22 It has since put out a revised version of its policy, noting many changes that were made to achieve more flexibility after getting experience with implementation.23
One could complain—and some have—that these if-then commitments are overly vague and lack many helpful features of risk management in more mature industries.24 But as of today, the alternative to preliminary, exploratory commitments is not rigorous, reliable commitments—it is more likely to be essentially holding off on risk management until there is a lot more clarity on the risks.
These companies are taking the same approach to risk management that they take to AI systems themselves: build something, try it out, and improve it over time. But there is room for them to do more, and to iterate faster. Early if-then commitments alluded to the need for further work and called out multiple areas for improvement, including aspirations to add oversight from independent third parties.25 But there have been few public updates or revisions to these policies as of today.26 And many more companies have not yet released if-then commitments at all.27 Calls for such if-then commitments to meet some absolute standard of rigor may be less productive than calls for consistent, publicly visible progress and iteration.
Can Regulation Take a Similar Approach?
A company can put out a voluntary if-then commitment, then publish any number of revisions and refinements as it learns from feedback and implementation. It is much harder for a government to take an approach of putting out regulations early and revising them over time. Every change to legislation presents a new political battle and new set of compromises and complexities, perhaps with a changed balance of power among coalitions since the last time a relevant bill was passed. Assigning an agency to make and revise regulations is itself a hard-to-reverse action, giving a particular set of people discretion and powers that could require a political battle to remove.
Still, it is worth considering how policymakers can balance urgency and uncertainty when it comes to AI regulation. Some options include:
- Proposing laws that nudge or compel companies to develop and publish risk management practices with certain high-level qualities, such as requirements for third-party audits, while leaving the details up to the companies themselves.
- Creating agencies and/or government institutes, including AI safety institutes, with the budget and ability to hire qualified people, but whose powers are constrained. (For example, such an institute could publish reports and advisories that are not subject to hard enforcement.) These organizations might be empowered or disempowered later, depending partly on how much credibility they end up with.
- Drafting laws that shape, at a high level, which lawsuits can be brought in cases of actual or imminent harm and how courts should approach them, while still leaving ultimate judgments up to courts that can consider the details of each specific case.
None of these approaches is foolproof, but done well, they could help push forward the development of both private risk management practices and the state capacity to eventually enforce them, while avoiding getting stuck with requirements based on immature ideas about risk management.
Aiming for Unprecedented Progress
ChatGPT may have set the record for the fastest-growing user base of all time.28 Indeed, a defining feature of today’s progress in AI is how fast it has been—a source of both excitement and concern about the technology.
If AI continues to progress with unprecedented speed, AI risk management ideally will too. Making that happen could require a messy, iterative process, with if-then commitments and/or unpolished regulations that are not initially grounded in thorough, rigorous research (and require many revisions). Implementing imperfect risk management practices could indeed be the fastest way to gather data and get to the point where thorough, rigorous research is possible.
Concerns that “the science surrounding AI safety is still in its infancy” are valid.29 But if these concerns lead to holding off on any risk management practices until the underlying science is settled, they could mean that the science remains in its infancy too long. Pushing forward the maturation of AI risk management should be treated as an urgent priority—on par with pushing forward the development of AI itself.
The author is married to the president of Anthropic, an AI company, and has financial exposure to both Anthropic and OpenAI via his spouse.
Acknowledgments
This piece has benefited from a large number of discussions over the years, particularly with people from METR, the UK AI Safety Institute, Open Philanthropy, Google DeepMind, OpenAI, and Anthropic. For this piece in particular, I’d like to thank Chris Painter and Luca Righetti for comments on a draft.
Notes
1. For example, see the Statement on AI Risk open letter; declarations from international attendees of the UK AI Safety Summit and AI Seoul Summit; and the White House’s Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.
2. Zoe Lofgren, “Letter to Scott Wiener,” U.S. House of Representatives, August 7, 2024, https://democrats-science.house.gov/imo/media/doc/8.7.24%20to%20Senator%20Wiener.pdf.
3. Holden Karnofsky, “If-Then Commitments for AI Risk Reduction,” Carnegie Endowment for International Peace, September 13, 2024, https://carnegieendowment.org/research/2024/09/if-then-commitments-for-ai-risk-reduction?lang=en.
4. For example, Google DeepMind introduces its “Frontier Safety Framework” with the following: “The Framework is exploratory and we expect it to evolve significantly as we learn from its implementation, deepen our understanding of AI risks and evaluations, and collaborate with industry, academia, and government. Even though these risks are beyond the reach of present-day models, we hope that implementing and improving the Framework will help us prepare to address them. We aim to have this initial framework fully implemented by early 2025.”
OpenAI’s Preparedness Framework is marked “beta” and describes itself as a “living document” (page 1).
Anthropic’s blog post introducing its “Responsible Scaling Policy” states, “However, we want to emphasize that these commitments are our current best guess, and an early iteration that we will build on. The fast pace and many uncertainties of AI as a field imply that, unlike the relatively stable BSL system, rapid iteration and course correction will almost certainly be necessary.”
5. “Probabilistic Risk Assessment (PRA),” U.S. Nuclear Regulatory Commission, July 7, 2020, https://www.nrc.gov/about-nrc/regulatory/risk-informed/pra.html.
6. See the definitions of “Level 1” and “Level 2” risk assessment via “Probabilistic Risk Assessment (PRA),” U.S. Nuclear Regulatory Commission.
7. For an overview of the process, see “Development & Approval Process | Drugs,” U.S. Food and Drug Administration, August 8, 2022, https://www.fda.gov/drugs/development-approval-process-drugs. Many clinical trials are pre-registered at clinicaltrials.gov.
8. “International Scientific Report on the Safety of Advanced AI,” AI Seoul Summit, May 2024, https://assets.publishing.service.gov.uk/media/66f5311f080bdf716392e922/international_scientific_report_on_the_safety_of_advanced_ai_interim_report.pdf.
9. The following examples are from “International Scientific Report on the Safety of Advanced AI,” AI Seoul Summit:
Page 42: “The overall impact of disinformation campaigns in general as well as the impact of widespread dissemination of general-purpose AI-generated media are still not well understood. Despite indications of potentially serious risks to public discourse, and the integrity of the information ecosystem posed by general-purpose AI, there are caveats.”
Page 47: “The degree to which current state-of-the-art general-purpose AI systems enhance the capabilities of malicious actors to use the life sciences over existing resources, such as the internet, remains unclear. Though some empirical work has assessed this uplift with respect to information access and biological threats, additional studies evaluating a broader range of tasks and scientific domains are needed to provide greater insight into this question.”
Page 51: “Some experts believe that loss of control scenarios are implausible, while others believe they are likely, and some consider them as low-likelihood risks that deserve consideration due to their high severity. This expert disagreement is difficult to resolve, since there is not yet an agreed-upon methodology for assessing the likelihood of loss of control, or when the relevant AI capabilities might be developed.”
10. For previous discussion of this risk, see Karnofsky, “If-Then Commitments for AI Risk Reduction.”
11. For example, see “Test Scores of AI Systems on Various Capabilities Relative to Human Performance,” Our World in Data, April 2, 2024, https://ourworldindata.org/grapher/test-scores-ai-capabilities-relative-human-performance.
12. For more discussion, see Karnofsky, “If-Then Commitments for AI Risk Reduction.”
13. See section 1.3 of the Seoul Ministerial Statement for advancing AI safety, innovation and inclusivity and Section 4.4 of the White House’s Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.
14. For example, see the histories of the U.S. Food and Drug Administration and the U.S. Nuclear Regulatory Commission.
15. From “Policymaking in the Pause,” Future of Life Institute, April 12, 2023, https://futureoflife.org/document/policymaking-in-the-pause: “We recommend third-party auditing of such systems across a range of benchmarks for the assessment of risks, including possible weaponization and unethical behaviors and mandatory certification by accredited third-party auditors before these high-risk systems can be deployed. Certification should only be granted if the developer of the system can demonstrate that appropriate measures have been taken to mitigate risk, and that any residual risks deemed tolerable are disclosed and are subject to established protocols for minimizing harm.”
16. Lofgren, “Letter to Scott Wiener.”
17. For example, see the first section of Paul Graham, “The Hardest Startup Lessons to Learn,” PaulGraham.com, April 2006, https://paulgraham.com/startuplessons.html. Similar ideas appear in Eric Ries, The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses (Crown Currency, September 2011).
18. From OpenAI: “We currently believe the best way to successfully navigate AI deployment challenges is with a tight feedback loop of rapid learning and careful iteration. Society will face major questions about what AI systems are allowed to do, how to combat bias, how to deal with job displacement, and more. The optimal decisions will depend on the path the technology takes, and like any new field, most expert predictions have been wrong so far. This makes planning in a vacuum very difficult.”
From Anthropic: “We believe that methods for detecting and mitigating safety problems may be extremely hard to plan out in advance, and will require iterative development. Given this, we tend to believe ‘planning is indispensable, but plans are useless.’ At any given time we might have a plan in mind for the next steps in our research, but we have little attachment to these plans, which are more like short-term bets that we are prepared to alter as we learn more.”
19. Karnofsky, “If-Then Commitments for AI Risk Reduction.”
20. Anca Dragan, Helen King, and Allan Dafoe, “Introducing the Frontier Safety Framework,” Google DeepMind, May 17, 2024, https://deepmind.google/discover/blog/introducing-the-frontier-safety-framework.
21. “Preparedness Framework (Beta),” OpenAI, December 18, 2023, https://cdn.openai.com/openai-preparedness-framework-beta.pdf.
22. “Anthropic’s Responsible Scaling Policy,” Anthropic, September 19, 2023, https://www.anthropic.com/news/anthropics-responsible-scaling-policy.
23. “We have learned a lot in our first year with the previous RSP in effect, and are using this update as an opportunity to reflect on what has worked well and what makes sense to update in the policy. . . . We learned two valuable lessons to incorporate into our updated framework: we needed to incorporate more flexibility into our policies, and we needed to improve our process for tracking compliance with the RSP.” From “Announcing Our Updated Responsible Scaling Policy,” Anthropic, October 15, 2024, https://www.anthropic.com/news/announcing-our-updated-responsible-scaling-policy.
24. Simeon Campos, “Responsible Scaling Policies Are Risk Management Done Wrong,” Less Wrong, October 25, 2023, https://www.lesswrong.com/posts/9nEBWxjAHSu3ncr6v/responsible-scaling-policies-are-risk-management-done-wrong; and Atoosa Kasirzadeh, “Measurement Challenges in AI Catastrophic Risk Governance and Safety Frameworks,” Tech Policy Press, September 30, 2024.
25. See page 6 of Dragan, King, and Dafoe, “Introducing the Frontier Safety Framework”: “We are exploring internal policies around alerting relevant stakeholder bodies when, for example, evaluation thresholds are met, and in some cases mitigation plans as well as post-mitigation outcomes. We will also explore how to appropriately involve independent third parties in our risk assessment and mitigation processes.”
See page 15 of “Anthropic’s Responsible Scaling Policy”: “Due to the large potential negative externalities of operating an ASL-4 [a future AI Safety Level] lab, verifiability of the above measures should be supported by external audits.”
And see page 25 of “Preparedness Framework (Beta),” OpenAI: “Scorecard evaluations (and corresponding mitigations) will be audited by qualified, independent third-parties to ensure accurate reporting of results, either by reproducing findings or by reviewing methodology to ensure soundness, at a cadence specified by the SAG and/or upon the request of OpenAI Leadership or the BoD.”
26. Anthropic released an updated Responsible Scaling Policy a little over a year after it published its original. OpenAI and Google DeepMind have not yet published updates to their initial policies.
27. But sixteen companies have committed to do so. See “Frontier AI Safety Commitments, AI Seoul Summit 2024,” GOV.UK, May 21, 2024, https://www.gov.uk/government/publications/frontier-ai-safety-commitments-ai-seoul-summit-2024/frontier-ai-safety-commitments-ai-seoul-summit-2024.
28. Krystal Hu, “ChatGPT sets record for fastest-growing user base - analyst note,” Reuters, February 2, 2023, https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01.
29. Lofgren, “Letter to Scott Wiener.”