Cloud Reassurance Project: Interim Report

By Ariel (Eli) Levite and John Pendleton

Published on Jun 12, 2023

Introduction

Cloud computing services play an integral role in enabling a range of commercial activities, public services, and critical infrastructure. As more organizations choose to access IT services from cloud providers, rather than relying solely on their own on-premises infrastructure, cloud services have become woven into the societal fabric and increasingly essential to the services we use: the screens that inform and entertain us; the food we buy or have delivered; the planes and trains we ride in; the cars we drive; and the healthcare we seek, along with countless other aspects of our daily lives. Cloud computing is a central part of the economy, with companies projected to spend $576 billion on cloud computing in 2023.¹ And by the end of 2026, cloud services are projected to account for over two-thirds of all computing and storage infrastructure.² The fundamental design of evolving technologies, such as self-driving vehicles and artificial intelligence, will rely on the power of the cloud to access, communicate, collect, store, process, analyze, and share information and knowledge. Given the cloud’s vast and growing benefits, its importance to our collective future is limited primarily by imagination.

The benefits of our broad shift to cloud services are many and diverse.³ They have allowed greater efficiency in information technology (IT) operations and have empowered individuals and organizations to undertake assignments that were previously costly, difficult, and in some cases entirely out of reach. Technical architecture and physical resources that twenty years ago would have taken months to provision and assemble can today be accessed from cloud providers within minutes. In addition to offering organizations new possibilities for greatly expanding the breadth of their services and the reach of critical functions, cloud services have also given societies and economies a much greater capacity to handle adversity, as seen during the coronavirus pandemic and related disruptions. There are also significant security gains from delegating the security of cloud infrastructure to experienced cloud service providers that can bring scope, reach, and economies of scale to bear on the management of risk, although providers and their customers share overall responsibility for risk mitigation.

As more organizations entrust the management of their IT services to cloud providers, prudence would advise cloud-dependent societies to examine and understand the potential for, and likelihood of, a major disruption or distortion of cloud services, and how it might affect the range of organizations that rely on these services. Notwithstanding the massive investments that cloud service providers have made to build their operational security and resilience, disruptions of cloud services do occur. Thankfully, most of these disruptions have so far been resolved with relatively little impact on users. A large-scale disruption of a major cloud provider (or providers) with cascading impacts has not occurred, but the possibility of such an event cannot be ruled out. Given the stakes involved, it is reasonable to examine the probability of, consequences from, and means to address a potential large-scale disruption, regardless of its likelihood. (Indeed, there is no material harm in doing so.) Such an event could, in principle, be caused by a wide range of hazards, including technical failures, malicious actions by insiders or external actors, or natural disasters such as earthquakes, tsunamis, and hurricanes.

The overall imperative for interdependent societies is to foresee potential problems as best they can in hopes of preventing or managing them effectively and smoothly recovering from them. Because cloud services are an increasingly indispensable part of the interconnected mesh now holding commerce and society together, it is wise to regularly assess how to minimize their susceptibility to disruption.

This is why stakeholders, including cloud users, government regulators, and insurers, grapple with the question of risk associated with the use of cloud services. They seek enhanced confidence that our widespread cloud dependency can be maintained and expanded in a reliable and sustained way, and that adequate measures can be put in place to address major setbacks that might nonetheless occur, no matter how unlikely the scenario. Toward this goal of increasing confidence and trust and enhancing societal resilience to withstand technological and other shocks, the Carnegie Endowment for International Peace established this Cloud Reassurance Project, which brings together a wide range of stakeholders to explore the potential for systemic risks in cloud dependency, review measures that are deployed to mitigate them, identify shortfalls that might exist, and suggest ways to address them. Participants include prominent cloud service providers, leading reinsurers (providers of insurance to insurers) operating in the domain, major providers of technological services for cloud utilization, and a company specializing in assessing cyber risks. In addition, the project includes senior independent cyber experts in the technical and policy realms, including former government officials. The group is advised by a leading law firm with expertise in multi-stakeholder initiatives; this firm also guides and monitors the working group’s proceedings to ensure that these conform to established antitrust practices.

This interim report provides a midway update framing the key issues being examined in the project; a final report is due in the fall of 2023. The interim report identifies five areas that project participants considered significant to cloud reassurance:

Cloud risk. Describes the increased adoption of and reliance on cloud services and the potential for this reliance to create systemic risks, in which a localized shock or stress could instigate a ripple effect through connected entities, potentially creating widespread impacts.
Risk scenarios. Analyzes potential scenarios that could challenge the reliability, robustness, and resilience of cloud services at scale, with an eye toward identifying potential mitigation and remediation measures by providers, consumers, governments, and societies writ large, as well as gaps in those arrangements.
Insurability. Explores the myriad challenges of making significant cyber risks more insurable, as well as the role that governments could play in unlocking the potential of insurance to enhance risk management.
Policy considerations. Considers factors that must guide any regulatory efforts to enhance trust in cloud services, as well as alternatives to regulation that might enhance such trust.
Resilience. Discusses the importance of being able to remediate and quickly recover from disruptions and explores such factors as transparency, communications, cooperation, and contingency planning that are key to achieving resilience and enhancing trust in cloud service provisions.

Taken as a whole, this project explores whether there are any scenarios that might create cascading, systemic risks, and examines the role that can be played individually and collectively by pertinent private sector entities (technology service providers, consumer groups, (re)insurers, and even capital markets) to bound these risks and mitigate their effects in the unlikely event that a major disruption or distortion occurs. Along the way, the project also explores scenarios and boundary conditions beyond which private sector enterprises by themselves cannot realistically be expected to handle such adversity and where governmental intervention might be needed. And it examines the various levers that might be used to enhance trust in the capacity of cloud services to manage and remediate risk, including voluntary commitments and arrangements to enhance resilience, (re)insurance, and regulation.

1: Systemic Cloud Risks

Although the term is usually associated with finance, systemic risk can be created or magnified in all complex, interconnected systems where a localized shock or stress could instigate a ripple effect through connected entities, potentially creating widespread impacts or even a contagion of problems.

In a highly interdependent world, problems may cascade temporally, geographically, or across sectors. In an example that did not involve the cloud, a 2021 ransomware attack on the Colonial Pipeline Company’s billing system led the company to shut down its pipeline for nearly a week, resulting in fuel shortages on the East Coast of the United States, public anxiety, and a declaration of national emergency. Tellingly, uncertainty about the scope of the intrusion and its effects led Colonial to institute precautionary measures, thereby expanding or at least prolonging the effects of the intrusion itself. In another case, a router malfunction in 2022 caused a nineteen-hour outage at Rogers Communications, one of the largest telecommunications and internet service providers in Canada. Across the country, critical services that depended on the network, including banking, cell phone and landline connections, and medical and emergency response networks, were unavailable to millions of people. Around 25 percent of Canada lost internet connectivity, costing the Canadian economy an estimated $142 million (USD).⁴
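To make the ripple-effect mechanism concrete, the Python sketch below models a toy dependency network and traces how a failure at one node spreads to everything that relies on it, directly or indirectly. It is a hypothetical illustration, not the project's analytical method: every entity name is invented, and the sketch assumes, pessimistically, that an entity fails whenever any one of its dependencies fails, with no backups or failover.

```python
from collections import deque

# Hypothetical dependency map: each entity lists what it depends on.
# Names are illustrative only and do not describe any real provider or firm.
DEPENDS_ON = {
    "cloud_provider_a": [],
    "cloud_provider_b": [],
    "payment_processor": ["cloud_provider_a"],
    "bank": ["payment_processor"],
    "retailer": ["payment_processor", "cloud_provider_b"],
    "hospital_records": ["cloud_provider_a"],
    "logistics_firm": ["cloud_provider_b"],
}


def cascade(shocked: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every entity disrupted when `shocked` fails.

    Assumption (pessimistic): an entity fails if ANY of its dependencies
    fails, because the toy model includes no fallbacks.
    """
    # Invert the map so we can walk from a failed entity to those relying on it.
    dependents: dict[str, list[str]] = {name: [] for name in depends_on}
    for entity, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(entity)

    failed = {shocked}
    queue = deque([shocked])
    while queue:  # breadth-first spread of the disruption
        current = queue.popleft()
        for entity in dependents.get(current, []):
            if entity not in failed:
                failed.add(entity)
                queue.append(entity)
    return failed


print(sorted(cascade("cloud_provider_a", DEPENDS_ON)))
# → ['bank', 'cloud_provider_a', 'hospital_records', 'payment_processor', 'retailer']
```

A localized shock at one provider reaches the bank even though the bank never contracted with that provider directly. Relaxing the all-or-nothing assumption (for instance, letting an entity survive while a backup provider remains available) is where mitigation and resilience measures would enter such a model.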

Growing cloud dependence is driven by multiple economic and technical factors. More and more users are finding that it is financially and technologically more efficient to use cloud services for many of their information technology needs than to continue investing primarily in their legacy on-premises IT capabilities. As a result, users become critically dependent on the availability, integrity, and confidentiality of cloud services (sometimes from a single provider), and they often cannot work effectively offline if they cannot access information and software hosted remotely. In addition, the massive investments required to establish, maintain, secure, and enhance cloud services have led organizations to gravitate toward a few providers that have the capacity and technical capabilities to meet their needs. A handful of these providers, called “hyperscalers,” have a dominant share of the worldwide cloud market. Figure 1 below provides a simplified portrayal of the transformations brought about by systemic dependence on cloud services.

A thorough evaluation of the potential for systemic risk requires consideration of the full range of possible hazards, from natural disasters to technical failures, human errors, and malicious attacks such as intentional sabotage of undersea communications cables. Disruptions of various types are possible at multiple levels, both within and outside the control of cloud providers. For example, an unintentional software coding error or hardware problem in a single cloud provider’s system could affect many of its users; a code problem common across multiple providers would likely have an even more dramatic impact. Threats can also come from outside the cloud service provider, and past experience has shown that supply chains can be particularly vulnerable: in 2020, state-backed hackers compromised software updates for SolarWinds’ Orion IT management platform and gained access to the information of thousands of customers, reportedly including sensitive government systems.⁵ Providers rely on energy and telecommunications infrastructure, so interruptions to power or internet connectivity could cripple their ability to provide services. Natural disasters could also strike large data centers or key communications facilities, much as the 2011 tsunami disabled the Fukushima nuclear power plant.

Reassuringly, many of the potential concerns related to cloud services may not materialize because service providers have already taken measures to address them. Others may still occur but would likely produce no more than transient and modest effects, in part because of these same advance preparations. Yet it is crucially important to identify residual risk scenarios that could, at least theoretically, rise to the level of systemic risk; assess their possible impact; contrast them with the measures in place to reduce their likelihood and recover from them if they occur; and highlight any shortfalls. This effort is underway, and its findings will be presented in the project's final report. For completeness, this effort will also consider other risks associated with cloud service provision that are more modest in scale but could nonetheless have disturbing consequences, such as an outage that lasts only a short period but affects a critical sector like healthcare. The ultimate aim is to justifiably raise overall confidence in cloud-based services.

Importantly, risk should not be thought of as contained to a single organization’s operation; it must be framed in terms of risks to the interconnected system. This project strives to identify such systemic risks and seeks to generate meaningful and viable risk mitigation strategies that commercial owners, operators, and customers could take, as well as to identify any catastrophic risks that may require assistance from governments.

2: Risk Scenarios

To explore systemic risks arising from cloud reliance, the project is undertaking a scenario-based technical analysis. The technical analysis focuses on scenarios involving logical (rather than physical) disruption or distortion of service that could produce significant impact but that are amenable to risk mitigation. Beyond this purely technical analysis, the project is also considering additional scenarios that involve disruption to physical infrastructure or to post-event recovery and resilience and that could have significant impacts. We do this because, from society’s perspective, what happens after a disruptive event may be as important as (or even more important than) the event itself.

Analytical Considerations

Weighing cloud risk in context. Focusing on cloud risks that could cascade into systemic problems, disruptions, or disasters should not blind us to the benefits cloud adoption has brought to global cyber resilience. The innovation and massive investments made to build, sustain, upgrade, and secure cloud services have improved cyber resilience well beyond what most businesses, governments, and individuals would experience without the cloud. Going forward, cloud services will use even more advanced and resilient system architectures and enable wider adoption of more secure architectures like “zero trust” that should further boost confidence in reliability.⁶ In short, there are good reasons why businesses, governments, and individuals continue to increase their dependence on the cloud.
Adopting an all-hazards approach. Trust in cloud services depends on their capacity to withstand and, if need be, recover from and remediate the adverse effects of all types of perils. Thus, beyond the dangers posed by malevolent intent, it is important to recognize vulnerabilities arising from human error, design flaws, technical failures, and natural disasters. As previously discussed, all these triggers, in conjunction with increasing cloud reliance, have the potential to create systemic risks.
One size does not fit all. Cloud services benefit a range of commercial, public, and governmental customers. Services are delivered to those customers in different ways, based on diverse technical, commercial, and operational choices. For example, some services are offered through managed service providers who use cloud services but typically have on-site personnel and infrastructure, and others through cloud service providers who use remote hosting to provide services such as storage infrastructure, platforms, and software. The diversity and complexity of cloud services mean that there are few common modes of failure, but also that few uniform solutions can enhance all cloud services across the board.
Comprehensive approach to risk assessment. To assess the full impact of risk scenarios and gauge their potential to produce systemic risks, we must go beyond considering the impact of service unavailability and reflect also on compromises of confidentiality and integrity (regardless of origin) as well as on the cumulative effects of cloud disruptions, not least on the trust in these services. Moreover, for each risk scenario, multiple factors need to be considered, such as duration, “blast” radius, difficulty of recovery, and financial impacts, as well as the implications for the performance of critical infrastructure.
Cloud services resilience requires a shared-fate approach. Although attention is naturally focused on the role of cloud service providers, other players also affect resilience—and can set it back if not effectively integrated into resilience culture and schemes. Therefore, this project aims to shed some light on the intricate relationships among players in this realm and to formulate expectations around the roles all the pertinent stakeholders could play in sustaining dependable and resilient cloud services.

Selected Risk Scenarios

After consideration of these factors, four potential logical disruption risk scenarios have been identified as the focus of the project’s technical analysis. In theory, all four could trigger systemic cloud risks. These risk scenarios are:

Vulnerability in common code. Systemic risk could be triggered by malevolent intent (as in a supply chain attack), but could also stem from error or failure of automation.
Compromise of credentials. Systemic risk could result if privileged access is obtained by non-authorized parties.
Availability loss. Systemic risk could result if services or information become unavailable because of either malicious action or unintended configuration failure.
Connectivity problems. Systemic risk could be triggered by lasting loss of telecommunications, a power outage, or an architecture issue.

Building on current work cataloguing existing mitigations for these risks at the scale of individual risks, the project aims going forward to review and prioritize technological strategies that can mitigate the risks identified in these scenarios at an accumulated, catastrophic scale. This will be complemented by consideration of other potential setbacks, whether triggered by exogenous events or by developments within the cloud environment. Such events may or may not represent systemic risks but could nevertheless arouse concern among customers and policymakers.

3: Insurability of Cloud Risks

Insurance is a valuable tool for managing risks and reducing uncertainty, and since its emergence in modern form in seventeenth-century London, it has continually evolved to meet the changing risks confronting businesses and individuals. Insurers have recognized that they must provide cover for digital risks if they are to remain relevant to their clients, and cyber insurance has been among the fastest-growing segments of the global insurance market in recent years. However, the amount of insurance available for cyber risks remains small compared to many of the more established types of insurance.⁷ Globally, most businesses and individuals are uninsured or significantly underinsured against cyber risks.⁸ This is concerning because insurance can deliver a range of significant economic and social benefits:

Incentivizing risk management by pricing risk, applying conditions for insurability, and informing clients of their relative risk exposure;
Mitigating losses, through financial compensation and practical assistance after an event;
Promoting investment and innovation by improving confidence in long-term planning and collaboration with external parties;
Sharing risk-reduction practices that emerge from the claims process;
Allowing more efficient capital allocation by reducing the reserves needed to cover losses; and,
Highlighting risks that cannot be addressed by the private sector alone—thereby incentivizing consideration of complementary arrangements that might be necessary to address those risks.

Historically, the cyber insurance market has experienced growing pains for several reasons beyond simply the price or even the perceived value of the product.⁹ The explosion of ransomware has resulted in significant unanticipated losses. Hostile acts attributed to nation-states, like NotPetya and the SolarWinds hack, have blurred the lines around what constitutes an act of war or terrorism, which insurers typically do not cover. As a result, insurers are developing frameworks to put boundaries around their exposure in the event of an extreme cyber catastrophe, clarifying whether coverage applies in a number of specified major cyber catastrophe scenarios. Policy exclusions provide greater clarity about coverage terms for cyber war events, as well as for cyber events that trigger losses under policies that do not explicitly address cyber risk, often referred to as “silent cyber” risk.¹⁰

Cloud risk is challenging to insure because of the cloud’s cross-sectoral and global scope, the depth of users’ dependence on cloud services, and the concentration of users among a small number of providers. These features combine to drive up accumulation risk significantly (the risk that a single event triggers many claims at once) and make it difficult to diversify exposure. Limited visibility into rapidly evolving cloud operations, and into the extent to which entities depend on those operations, further complicates risk assessment. Addressing this requires insurers to work with both customers and cloud providers to obtain the information needed to understand and insure the risks. Moreover, insurers and reinsurers have traditionally been reluctant to extend coverage for infrastructure failures and business interruption caused by outages in power, utilities, and internet service providers, so any dependence of cloud services on such utilities complicates the provision of insurance. Additionally, credit rating agencies, which generate benchmarks that investors use to judge the creditworthiness of businesses (including insurance companies), have voiced concerns about the uncertainty and potential magnitude of business interruption or other losses connected to cyber risks.
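The accumulation problem can be illustrated with a simple calculation. The Python sketch below compares two hypothetical insurance books with identical total policy limits: one in which every insured relies on a single cloud provider, and one spread evenly across four providers. It assumes, for illustration only, that a provider outage causes each dependent insured to claim the same fraction of its policy limit; the provider names, the $1 million limits, and the 50 percent loss ratio are invented and do not reflect any real portfolio or pricing model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    insured: str
    provider: str  # the (hypothetical) cloud provider this insured depends on
    limit: float   # maximum payout under the policy, in dollars


def outage_loss(portfolio: list[Policy], failed_provider: str, loss_ratio: float) -> float:
    """Aggregate claims if one provider fails, assuming every insured that
    depends on it claims the same fraction (`loss_ratio`) of its limit."""
    return sum(p.limit * loss_ratio for p in portfolio if p.provider == failed_provider)


def worst_single_outage(portfolio: list[Policy], loss_ratio: float) -> float:
    """Largest aggregate loss across all single-provider outage scenarios."""
    providers = {p.provider for p in portfolio}
    return max(outage_loss(portfolio, name, loss_ratio) for name in providers)


# Two hypothetical books, each with $100 million in total limits.
concentrated = [Policy(f"firm_{i}", "provider_a", 1_000_000) for i in range(100)]
diversified = [
    Policy(f"firm_{i}", f"provider_{'abcd'[i % 4]}", 1_000_000) for i in range(100)
]

print(worst_single_outage(concentrated, loss_ratio=0.5))  # 50000000.0
print(worst_single_outage(diversified, loss_ratio=0.5))   # 12500000.0
```

Total exposure is the same in both books, but the worst single outage costs the concentrated book four times as much. The more insureds rely on the same handful of providers, the closer a real book drifts toward the concentrated case, which is why such dependence makes exposure hard to diversify.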

Recognizing all this, insurers and cloud providers are nevertheless striving to narrow the gap between insurance demand and supply for cloud-dependent businesses and have taken steps to systematically identify and quantify cloud risks. In 2017, Lloyd’s of London estimated that a cloud disruption lasting three days could generate economic losses in the United States, the United Kingdom, Canada, and the European Union of as much as $53 billion.¹¹ And in 2018, Lloyd’s published a scenario-based estimate that a week-long outage of a single cloud provider would generate up to $19 billion in economic losses in the United States.¹² Lloyd’s has since added a cloud outage to its roster of stress test scenarios against which insurers must estimate their losses.¹³ Reinsurers have developed their own scenarios and have made use of third-party analytic tools to improve their ability to quantify and model cyber risks in general, and cloud risks in particular.¹⁴ Major cloud providers have also begun to partner with (re)insurers to develop joint solutions for cyber risks. However, cybersecurity scholar Josephine Wolff, who has studied efforts by policymakers and insurers to create joint initiatives, concludes that “these discussions have ultimately accomplished very little beyond highlighting the disconnect between what insurers view as their role in the cybersecurity ecosystem and what policymakers view as the role of cyberinsurance.”¹⁵

This project aims to further narrow the disconnect between insurance and the major technological platforms by systematically examining and quantifying the impact of scenarios that could produce systemic risks, as well as highlighting mechanisms that could help bound these risks. The hope is to unlock at least some of the potential of (re)insurance to play a role in this space like the one it has historically played in other sectors and against other risks. In addition, the project seeks to foster much-needed understanding among stakeholders about the role of governments in managing risk. In 2022, a study requested by the U.S. Congress recommended that the U.S. Treasury Department consider a federal insurance response for “catastrophic cyber incidents,”¹⁶ noting that answers were needed for critical questions like how such a mechanism would be funded and how much it would cost taxpayers; what loss level and conditions would trigger the backstop; and how a program would avoid creating a moral hazard or underwriting inappropriate risk-taking. This prompted the Treasury Department to issue a public request for comments in late 2022, but it is unclear if or when further action is likely to be taken. This project seeks to feed into this line of inquiry by considering the maximal potential of (re)insurance to cover cloud risk, as well as discussing any quid pro quo (such as improvements to cybersecurity and cyber resilience) that would be required to make some form of backstopping arrangement viable and politically palatable.

4: The Policy Challenge 

Given the centrality of cloud services in critical societal functions, pressure for government to require security and resilience from cloud computing enterprises is mounting. Some sectoral regulators have tried to apply their authority to cloud services that form the supply chain for the sectors falling within their remit—banking and other financial services, for example. However, policymakers face challenges in assessing and addressing the potential risks in part because rapidly evolving cloud technology is difficult to understand and they have limited levers at their disposal.

In the United States, the recently released National Cybersecurity Strategy emphasized the need to “identify gaps in authorities to drive better cybersecurity practices in the cloud computing industry . . . and work with industry, Congress, and regulators to close them.”¹⁷ In March 2023, the U.S. Federal Trade Commission published a request for information seeking input about, among other things, the extent to which certain segments of the economy are reliant on cloud service providers, and the security risks associated with the industry’s business practices.¹⁸

The United States is not acting alone. In April 2023, an international group including Australia, Canada, Germany, the Netherlands, New Zealand, the United Kingdom, and the United States jointly published guidance intended to urge manufacturers and software developers to design products that are “secure-by-design/secure-by-default” to address vulnerabilities as products are developed and deployed.¹⁹

Carnegie’s previous work has found that “given how tough some of the policy and regulatory challenges are likely to be, many issues associated with cloud governance will likely be addressed only partially, slowly, and suboptimally.”²⁰ This section therefore explores challenges in designing an effective regulatory framework, as well as considerations for nonregulatory, alternative approaches to enhance trust in cloud services.

Challenges in Considering Regulation

According to internationally agreed-upon best practices, regulatory policy is defined as the process by which government decides whether to use regulation as a policy instrument to achieve a policy objective and proceeds to draft and adopt a regulation through evidence-based decisionmaking. Setting norms among the cloud community (that is, collective expectations for the proper behavior of actors with a given identity) is a key step in reassuring stakeholders and potentially lessening pressure for regulation.²¹ If regulation is pursued, would-be designers of an effective framework for cloud services would face several challenges, as described below.

Defining desired regulatory outcomes. National governments’ core interest in cloud computing (beyond safeguarding the competitiveness of this marketplace) is to reduce the likelihood of a systemic failure that has a cascading impact on critical functions or national security, essentially addressing “significant adverse effects.” Assuring the resilience of cloud infrastructure, therefore, is a shared responsibility and an imperative for national and international interests. A frequent concern in this regard is the need for greater transparency into cloud service providers’ operations. This issue was central to a February 2023 Treasury Department report, which expressed concern about the opacity of cloud operations amid growing concentration among a small number of providers; the report concluded that cloud services had the potential to improve resilience and security but also represented a significant risk to the financial system.²²

Overlaps, gaps, and seams of relevant authorities. Existing regulatory approaches are often tailored to their respective sectors and may not fully encompass cloud services, which now underpin most critical infrastructure sectors.²³ The security requirements that do exist are often tied to other objectives, such as privacy (as in the European Union’s General Data Protection Regulation), consumer protection, and contract requirements for doing business with national governments.²⁴ Moreover, expanding regulatory authority over cloud computing risk could incentivize industries that benefit from cloud services to push requirements onto cloud providers. This could prove difficult to implement effectively, and cloud providers could reasonably choose not to accept those requirements and instead exit parts of the market.

Harmonizing national and international efforts. The tension between national and international requirements creates a potential regulatory labyrinth. The United States has focused primarily on cybersecurity, especially in protecting critical national infrastructure. For its part, the European Commission has actively promoted a harmonized approach across Europe for a wide range of digital services, such as via the European Alliance for Industrial Data, Edge and Cloud, but individual nations have their own country-specific regulations. Europe has also been active in promoting data privacy. As a result, cloud providers face vastly differing requirements in different nations. Failure to harmonize regulations increases compliance costs for companies that operate across national borders and could limit their ability to compete in some markets.

Designing effective regulations. In principle, regulation should balance costs and benefits and be sufficiently rigorous to reassure regulators, consumers, and other stakeholders. Regulation should not create barriers to innovation or impose an undue oversight burden, and it should adhere to the principles of open government, notably transparency and broad-based participation in the regulatory process. Once adopted, regulation should rely on risk-based approaches to compliance and enforcement in order to minimize enforcement (as well as compliance) costs while maximizing compliance benefits. To judge whether regulation is working, relevant metrics are needed to gauge progress, make adjustments, and ensure that desired outcomes are achieved.

Given the challenges in pursuing regulation, it seems prudent to also explore the extent to which a nonregulatory approach might reassure stakeholders about cloud services—including whether and how cloud services might be provided in consultation and cooperation with pertinent government agencies. At minimum, exploring nonregulatory options could yield insights for any contemplated regulatory actions by identifying what the private sector can voluntarily commit to on its own.

Considerations for a Nonregulatory Approach

Nonregulatory approaches are often voluntary and involve a combination of information sharing, strategic and operational collaboration, and targeted incentives for risk mitigation. Enhanced voluntary collaboration can improve resilience and reassure stakeholders. More clearly defining how cloud services enable critical infrastructure across sectors would likely be a necessary policy aim. Below are some initial ideas that the study is exploring.

Opportunities to share information. In the United States, hyperscale cloud service providers are currently active in industry-led Information Sharing and Analysis Centers and public-private collaborations, particularly the Joint Cyber Defense Collaborative operated by the Cybersecurity and Infrastructure Security Agency. Nevertheless, there is no cloud-specific information-sharing venue, nor is there standardized information that customers, insurers, and governments could use to assess resilience at the individual, sectoral, or societal level, build more accurate risk models, or identify options for redressing deficiencies. Better sharing (which would require appropriate legal and other protections) could help facilitate risk management across the cloud ecosystem.

Partnering with insurance and industry groups. Steps to improve cloud reassurance could be enhanced if they are taken with the active participation of the insurance industry, which historically has been a key driver of effective risk management markets. Another consideration could be to expand on the private-sector shared responsibility model developed in concert with the Cloud Security Alliance²⁵ to demonstrate the steps cloud providers are taking toward building cloud resilience and developing auditable standards that could help markets evaluate resilience without active government intervention.

Looking to other sectors for ideas and lessons learned. The energy and transportation sectors have proactively and voluntarily addressed stakeholder concerns through industry-led trust building, cooperation, and mutual assistance arrangements, some of which might offer useful insights for this project. Although significant differences exist between cloud services and these sectors, the project will consider whether such arrangements offer lessons that could help reassure stakeholders, including governments, about the robustness and resilience of cloud services.

5: Resilience

To date, concerns about the risks associated with cloud dependency have tended to center on the security of cloud services. Although cybersecurity is important to the trustworthiness of cloud services, most cloud-related security incidents that have taken place have been traced back to non-malicious origins, including natural events, design choices and flaws, and operator errors. Furthermore, attention has focused predominantly on the role of cloud service providers in preventing such events (and their presumed culpability when they do not). The role of other stakeholders, including customers, in maintaining and enhancing the robustness of their cloud dependency has been discounted, as has the importance of mitigating the risks associated with recovery and remediation if the cloud is disrupted.

With these concerns in mind, this project aims to shed light on the systemic resilience of the cloud. Given the scope and scale of economic and public dependency on cloud services, this effort would ultimately need to involve not merely major providers but all stakeholders with a vested interest in systemic stability, as well as users of cloud services. Hence the final report of this project will offer some reflections on a shared-fate-and-expectation framework to reassure all those who depend on the reliability and credibility of cloud services.

A good point of departure is to reach a common understanding of resilience in the cloud context. Generally speaking, resilience is the ability of an entity or unit—for example, an asset, organization, community, or region—to anticipate, resist, absorb, respond to, adapt to, and recover from a disturbance.²⁶ This includes “the ability to withstand and recover from deliberate attacks, accidents, or naturally occurring threats or incidents.”²⁷ Resilience requires understanding the criticality of a business process, the capability of the underlying technology, the business impact if the technology fails, and the organization’s risk tolerance.²⁸ See figure 2 below for a depiction of a resilience model over time.

The ultimate objectives of resilience are to withstand and recover rapidly from disruptions, but strategic system design and advance preparations for incidents enable resilience as well. This preparatory work must do more than simply design in and mandate controls to protect the security of services and the supporting infrastructure (in this case, the cloud). It must also anticipate how the cloud-enabled system could fail and affect different cloud-dependent customers and services, then develop robustness, recovery, and redundancy capabilities that can be deployed if necessary. The responsibility for resilience does not fall only on the providers of services; customers who make use of cloud services share the responsibility for ensuring resilience in how they design and operate their own software applications and services that connect with or run on cloud infrastructure.

Accepting that adverse events can happen is foundational. Designing technology and processes at both the provider level and the customer level to remediate and quickly recover should be the goal. Perhaps the most important aspect of cloud reassurance is making sure that any scenario that does occur does not have significant enduring impact, recognizing that recovery from disruptions of some cloud-based services could take far longer than merely restoring cloud services or even rebuilding trust in them. An outside-in view of resilience can be useful because systems can be resilient even when some parts are vulnerable. Even so, for broad systemic resilience, capabilities and contingency plans need to be maintained and regularly updated not only by the cloud service providers but also by those who provide cloud-based services. Given the reliance on cloud services, the responsibility for resilience must ultimately be shared.

Achieving resilience in a primarily cloud-based environment would require:

Transparency about operational risks such as outage scenarios that may impact customers. The complexity of technology and related business practices and the pace at which they are evolving put an onus on those who deploy the services to take the lead in informing and educating others about risks and the options available to them for mitigating those risks. Given the sensitivity of some of this information, however, such information exchanges might require special arrangements to ensure that the information is used to enhance resilience but remains beyond the reach of nefarious actors.
Shared understanding among the various stakeholders on their respective roles and responsibilities in enhancing resilience of cloud services.
Communications between providers, customers, and policymakers about risks to services and potential incidents as well as measures that could help with mitigation and consequence remediation. Such information exchange should include lessons learned from previous cloud-related events and be conducted globally.
Planning for customers to build resilience into their operations and to maintain the continuity of their essential functions even in the face of degraded cloud infrastructure or incidents, as well as for cooperation among major providers of cloud services to mitigate such adversity.
Functioning insurance and reinsurance markets to offset economic risk associated with cloud service incidents.

As the Cloud Reassurance Project continues, we seek to identify concrete elements within each of these domains that can improve systemic resilience. To do this, we will focus on identifying where residual systemic risks might dwell, and how to mitigate or remediate them. Ultimately, the aim is to enhance trust in the resilience of cloud services to make it possible to fully reap their myriad benefits, while reassuring all stakeholders that our collective dependency can be carried out responsibly without endangering economic and social welfare.

Annex: The Cloud Reassurance Project

The universal adoption of cloud-centric operating models is bringing enormous benefits to every sector of the global economy, yet the ubiquity of dependence on common technologies and service providers also creates a new potential for systemic risk. Regulators and policymakers have expressed concern about the possibility of widespread, cascading effects if cloud services are disrupted, but the nature of the risks and the balance of responsibilities in addressing them remain highly uncertain.

In response, the Carnegie Endowment for International Peace launched the Cloud Reassurance Project—an initiative to create shared understanding among private sector stakeholders of the systemic risks associated with deepening global dependence on cloud technology.

The Cloud Reassurance Project brings together cloud service providers, enabling technology providers, and (re)insurers. The project is funded by the participants, who are represented on technical and policy working groups. These groups include independent experts who bring deep knowledge of cloud computing and technology governance. Legal advisors ensure that all activities are compliant with antitrust requirements and, when required, provide guidance regarding confidentiality.

The project is investigating questions such as:

Evaluating the risk. What are the key features of a systemic event resulting from a disruption of cloud services? What are the plausible yet severe scenarios of most concern? Which services create the most significant accumulations of risk?
Bounding the risk. How can technical mitigations limit potential systemic impacts from cloud disruptions? How can stakeholders identify concentrations of risk, quantify them, and incorporate them in risk governance mechanisms?
Managing the risk. What is the appropriate balance of responsibilities between technology providers, users, regulators, and (re)insurers? How can this project create shared understanding of the risk among these stakeholders?

The project is overseen by a steering group and relies on policy and technical experts from a range of cloud service providers including VMware, Microsoft, Google Cloud, and Amazon Web Services; reinsurance companies Swiss Re and Munich Re; and experts from Axio, Exiger, and the Center for Internet Security, among others.

Acknowledgments

Carnegie would like to acknowledge a range of contributors to this report.

The cloud service provider, insurer, and other expert participants have been instrumental in crafting and refining this report. Particular thanks to Bob Kolasky and Peter Armstrong for their contributions to the Risk Scenarios and Policy Challenges sections, as well as to Nick Beecroft, who lent his considerable expertise to the project, particularly in the Insurability section.

John Pendleton (john.pendleton@ceip.org) led the development of the report and served as our primary editor with assistance from Gabriella Mesce (gabriella.mesce@ceip.org). The Cloud Reassurance Project is led by Ariel (Eli) Levite. The team wishes to offer special thanks to George Perkovich. Isabella Furth and Anjuli Das reviewed the draft and offered many helpful suggestions. Amanda Branom and Jocelyn Soly provided excellent assistance with our graphics.

Notes

¹ Belle Lin, “Technology Chiefs Seek Help Wrangling Cloud Costs,” Wall Street Journal, March 3, 2023, https://www.wsj.com/articles/technology-chiefs-seek-help-wrangling-cloud-costs-61ba0b50.

² “Cloud Infrastructure Spending Closes Out the Fourth Quarter and 2021 with Strong Growth, According to IDC,” press release, International Data Corporation, March 31, 2022, https://www.idc.com/getdoc.jsp?containerId=prUS48998722.

³ For the purposes of this study, cloud services are defined as the delivery of different IT services through the internet, with a specific focus on public cloud services (a type of computing where resources are offered by a third-party provider via the internet and shared by many organizations and individuals who want to use or purchase them).

⁴ Divya Rajagopal and Ismail Shakil, “Rogers Network Resuming After Major Outage Hits Millions of Canadians,” Reuters, July 8, 2022, https://www.reuters.com/business/media-telecom/rogers-communications-services-down-thousands-users-downdetector-2022-07-08/.

⁵ Chris Jaikaran, “SolarWinds Attack—No Easy Fix,” Congressional Research Service, January 6, 2021, https://crsreports.congress.gov/product/pdf/IN/IN11559.

⁶ Zero trust architecture (ZTA) is a cybersecurity approach that authenticates and authorizes every interaction between a network and a user or device—in contrast to traditional cybersecurity models that allow users or devices to move freely within the network once they are granted access. ZTA works on the principle of "never trust, always verify" and assumes that attacks will come from within and outside of the network. “Science & Tech Spotlight: Zero Trust Architecture,” U.S. Government Accountability Office, November 18, 2022, https://www.gao.gov/products/gao-23-106065.

⁷ “Cyber Insurance: Risks and Trends 2022,” Munichre.com, May 16, 2022, https://www.munichre.com/topics-online/en/digitalisation/cyber/cyber-insurance-risks-and-trends-2022.html.

⁸ “Cybersecurity Insurance Reports,” Cybersecurity and Infrastructure Security Agency, December 17, 2020, https://www.cisa.gov/publication/cybersecurity-insurance-reports; “Incentives and Barriers of the Cyber Insurance Market in Europe,” European Union Agency for Cybersecurity, June 28, 2012, https://www.enisa.europa.eu/publications/incentives-and-barriers-of-the-cyber-insurance-market-in-europe; “UK Cyber Security: The Role of Insurance in Managing and Mitigating the Risk,” UK Government and Marsh Advisory, March 23, 2015, https://www.gov.uk/government/publications/uk-cyber-security-the-role-of-insurance.

⁹ Jon Bateman, “War, Terrorism, and Catastrophe in Cyber Insurance: Understanding and Reforming Exclusions,” Carnegie Endowment for International Peace, October 5, 2020, https://carnegieendowment.org/2020/10/05/war-terrorism-and-catastrophe-in-cyber-insurance-understanding-and-reforming-exclusions-pub-82819.

¹⁰ “Sustainable Cyber Insurance Markets,” Organisation for Economic Co-operation and Development, accessed April 27, 2023, https://www.oecd.org/daf/fin/insurance/building-a-sustainable-cyber-insurance-market.htm; Jamie MacColl, Jason R.C. Nurse, and James Sullivan, “Cyber Insurance and the Cyber Security Challenge,” Royal United Services Institute, June 28, 2021, https://rusi.org/explore-our-research/publications/occasional-papers/cyber-insurance-and-cyber-security-challenge.

¹¹ “Counting the Cost: Cyber Exposure Decoded,” Lloyds.com, 2017, https://assets.lloyds.com/media/09c41bb0-d73b-4ae6-a2f8-d5cc41429998/pdf-emerging-risk-report-2017-counting-the-cost.pdf.

¹² “Cloud Down: Impacts on the US Economy,” Lloyds.com, January 23, 2018, https://www.lloyds.com/clouddown.

¹³ “Realistic Disaster Scenarios (RDS),” Lloyds.com, accessed April 27, 2023, https://www.lloyds.com/conducting-business/underwriting/realistic-disaster-scenarios/.

¹⁴ Simon Cartagena, Visesh Gosrani, Jasvir Grewal, and Justyna Pikinska, “Silent Cyber Assessment Framework,” Institute and Faculty of Actuaries, 2019, https://www.actuaries.org.uk/system/files/field/document/FINAL%20Sessional%20paper%20-%20Silent%20Cyber%20Assessment%20Framework.pdf.

¹⁵ Josephine Wolff, Cyberinsurance Policy: Rethinking Risk in an Age of Ransomware, Computer Fraud, Data Breaches, and Cyberattacks (Cambridge: MIT Press, 2022), 182, https://doi.org/10.7551/mitpress/13665.001.0001.

¹⁶ “Cyber Insurance: Action Needed to Assess Potential Federal Response to Catastrophic Attacks,” U.S. Government Accountability Office, June 21, 2022, https://www.gao.gov/products/gao-22-104256.

¹⁷ “Biden Administration Releases Wide-ranging National Cybersecurity Strategy,” ABA Banking Journal, March 2, 2023, https://bankingjournal.aba.com/2023/03/biden-administration-releases-wide-ranging-national-cybersecurity-strategy/.

¹⁸ “FTC Seeks Comment on Business Practices of Cloud Computing Providers that Could Impact Competition and Data Security,” press release, U.S. Federal Trade Commission, March 22, 2023, https://www.ftc.gov/news-events/news/press-releases/2023/03/ftc-seeks-comment-business-practices-cloud-computing-providers-could-impact-competition-data.

¹⁹ “Shifting the Balance of Cybersecurity Risk: Principles and Approaches for Security-by-Design and -Default,” Cybersecurity and Infrastructure Security Agency, April 13, 2023, https://www.cisa.gov/sites/default/files/2023-04/principles_approaches_for_security-by-design-default_508_0.pdf.

²⁰ Ariel (Eli) Levite and Gaurav Kalwani, “Cloud Governance Challenges: A Survey of Policy and Regulatory Issues,” Carnegie Endowment for International Peace, November 9, 2020, https://carnegieendowment.org/2020/11/09/cloud-governance-challenges-survey-of-policy-and-regulatory-issues-pub-83124.

²¹ Martha Finnemore, “Cybersecurity and the Concept of Norms,” Carnegie Endowment for International Peace, November 30, 2017, https://carnegieendowment.org/2017/11/30/cybersecurity-and-concept-of-norms-pub-74870.

²² “New Treasury Report Assesses Opportunities, Challenges Facing Financial Sector Cloud-based Technology Adoption,” press release, U.S. Department of the Treasury, February 8, 2023, https://home.treasury.gov/news/press-releases/jy1252.

²³ In the United States, the Federal Risk and Authorization Management Program provides a standardized approach to security assessment, authorization, and continuous monitoring for cloud products and services bought by the government.

²⁴ Ben Wolford, “What Is GDPR, the EU's New Data Protection Law?” GDPR.eu, accessed April 27, 2023, https://gdpr.eu/what-is-gdpr/.

²⁵ “Welcome to the Cloud Security Alliance,” Cloud Security Alliance, accessed April 27, 2023, https://cloudsecurityalliance.org.

²⁶ J. L. Carlson et al., “Resilience: Theory and Application,” Argonne National Laboratory, Report 10.2172/1044521 (January 2012), https://www.researchgate.net/publication/254992944_Resilience_Theory_and_Application.

²⁷ “Presidential Policy Directive: Critical Infrastructure Security and Resilience,” press release, White House, February 12, 2013, https://obamawhitehouse.archives.gov/the-press-office/2013/02/12/presidential-policy-directive-critical-infrastructure-security-and-resil.

²⁸ Jim Boehm, Wolfram Salmanian, and Daniel Wallance, “A Technology Survival Guide for Resilience,” McKinsey & Company, March 24, 2023, https://www.mckinsey.com/capabilities/risk-and-resilience/our-insights/a-technology-survival-guide-for-resilience.


Carnegie does not take institutional positions on public policy issues; the views represented herein are those of the author(s) and do not necessarily reflect the views of Carnegie, its staff, or its trustees.
