
commentary

Compute for India: A Measured Approach

Carnegie India recently organized a closed-door meeting in New Delhi to discuss India’s approach to compute in the context of its national artificial intelligence (AI) strategy. The meeting was attended by government officials, technology executives, startup founders, and academic researchers. Carnegie India’s experts also attended meetings organized by Bengaluru-based technologists about their efforts to promote open access to compute. Below is a summary and analysis of the key takeaways from these meetings.

Published on May 17, 2024

Introduction

Compute, as we explain in our primer, is used to refer to many things: the capacity to perform complex calculations, specific hardware such as semiconductors, or a unit of measurement, floating-point operations per second (FLOPS), that quantifies a computer’s ability to execute high-performance tasks like machine learning.

A more holistic view of compute positions it as a technology stack comprising three layers—a hardware, a software, and an infrastructure layer. Collectively, this forms what has come to be known as the “compute stack,” which may include:

  • Advanced chips (GPUs, TPUs)
  • Specialised software to run the chips (compute unified device architecture or CUDA)
  • Data centers and network infrastructure (Google, AWS, Azure)
  • Data storage and management software (Oracle, IBM, SAP)
  • Machine learning frameworks and programming languages (PyTorch)

Compute is central to the IndiaAI mission. In March 2024, the Cabinet allocated Rs. 10,372 crores ($1.3 billion) for the mission, nearly half of which, about Rs. 4,568 crores, has been earmarked to build compute capacity across the country. This demonstrates the importance of compute to India’s growing AI ambitions.
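The “nearly half” figure can be checked with simple arithmetic. The sketch below uses only the two rupee amounts quoted in the paragraph above and the standard conversion of one crore to 10^7 rupees; the variable names are illustrative. It suggests the compute allocation is roughly 44 percent of the total outlay.

```python
# Rough arithmetic check on the IndiaAI mission figures quoted in the text.
# Amounts are in crore rupees; 1 crore = 10,000,000 (10^7) rupees.
CRORE = 10**7

total_outlay_crore = 10_372    # total mission allocation (Rs. crore)
compute_outlay_crore = 4_568   # portion earmarked for compute capacity (Rs. crore)

# Fraction of the mission budget set aside for compute.
compute_share = compute_outlay_crore / total_outlay_crore

# Total outlay expressed in rupees rather than crores.
total_outlay_rupees = total_outlay_crore * CRORE

print(f"Compute share of the mission budget: {compute_share:.1%}")
print(f"Total outlay in rupees: {total_outlay_rupees:,}")
```

Running it prints a compute share of 44.0 percent, consistent with the text’s description of the allocation as “nearly half.”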

According to an internal document prepared by the Ministry of Electronics and Information Technology (MeitY), India’s compute strategy has two key objectives:

  1. Scale up India’s compute capacity to meet growing local demand.
  2. Democratise access to AI by providing affordable compute in key sectors.

While the details remain to be worked out, specific elements of the strategy include:

  • Building a national AI compute capacity of at least 10,000 graphics processing units (GPUs).
  • Public-private partnerships (PPP) to build and deploy compute capacity.
  • Subsidizing access to compute in priority sectors.

In “India’s Compute Conundrum,” we explained the current state of play in India and raised a question: in the context of India’s use-case-led AI strategy, should the approach be to optimize national compute capacity, to maximize it, or to do both? In this piece, we explore the meaning of compute in the Indian context, its current and proposed applications, and the different approaches to building and accessing compute in India.

Key Questions

National strategies on compute have multiple dimensions: technical (capacity, scale, efficiency); conceptual (goals, definitions); economic (markets, competition, business models, investments, access); and geopolitical (security, sovereignty, trade, sustainability).

For India, we believe there are three fundamental issues to consider at this stage:

  1. What is compute?
  2. Who is using the existing compute, for what, and through what means?
  3. How much compute capacity does India need? How should it be made available?

Current Sentiment

In this section, we capture the contours of the compute debate in India by presenting the different approaches and arguments of technologists, researchers, and policymakers. Below are a few points that were raised with regard to the current state of the national AI strategy.

  1. India needs more compute to meet its AI ambitions: A country’s compute capacity can be measured by the total number of GPUs available, or in terms of FLOPS. Under the AI mission, the government intends to build compute infrastructure of 10,000 or more GPUs. According to recent data from the Department of Science and Technology, the National Supercomputing Mission (NSM) has a total compute capacity of 24.83 petaflops (PF), where 1 petaflop/s = 10^15 floating-point operations per second, built at a cost of Rs. 1,218 crores. The official target for the NSM is 66 PF by 2025. Many technologists believe this capacity is insufficient to meet India’s strategic AI goals. The national target of 10,000 GPUs is conservative even compared to the capacity of individual companies (Meta will have compute capacity equivalent to 600,000 H100 chips by 2025). Moreover, building foundation models requires massive amounts of compute. It took OpenAI 3,640 petaflop/s-days to train GPT-3. In 2023, an expert committee set up by MeitY recommended that national capacity be increased to “3,000 PF, with an inference farm of 2,500 PF, and edge compute system of 500 PF.” Even this target seems inadequate to build large models. Put simply, India will need more compute to meet its strategic goals in AI.
  2. A balanced approach is required across the compute stack: The government’s plan to acquire over 10,000 GPUs suggests a bias towards hardware investment. Technology and policy experts expressed concerns about this approach and suggested that the government’s compute strategy should aim to widen the range of choices and the degree of autonomy available to Indian startups in designing their technical architecture. This requires public investment to be balanced across the compute stack: not just the acquisition of chips, but also investment in data centers, promotion of cloud adoption, and development of indigenous software required to build and maintain AI systems. A complementary objective could be to incentivize local innovation in both hardware and software to meet global demand for AI technologies.
  3. Hyperscalers are meeting current demand, but challenges remain: Global cloud service providers such as Microsoft, Google, and AWS make compute available to Indian customers through different models (public, private, hybrid) and pricing plans (pay-as-you-go, subscriptions). Hyperscalers also bundle compute with other services, such as data storage and security. These convenient plug-and-play models, combined with innovative services, customer support, and global presence, make hyperscalers the preferred option for startups and researchers in India. However, some stakeholders highlighted certain challenges. Startups find it difficult to migrate their data to a different cloud provider as their workloads change. Additionally, the use of discounts and “cloud credits” to acquire new customers is not always transparent.
  4. Access to compute can be enabled through digital public infrastructure (DPI): A group of technologists in Bengaluru has envisioned an innovative approach to increasing compute access: leveraging India’s experience in building DPI to create a DPI for compute. The goal of the “Open Cloud Compute” (OCC) project is to establish a network of micro-data centers based on interoperable standards. OCC would enable small businesses to discover various kinds of compute offerings and select from a bouquet of services to meet their unique requirements. Like other DPIs, the OCC’s discoverability features are enabled through open protocols rather than a centralized platform. As of May 2024, the OCC project has twenty-four partners, including Oracle, Dell, Tata, and E2E Networks.
  5. Local demand for compute remains unclear: The lack of reliable data on compute utilization has prompted some to speculate that GPUs introduced into the Indian market may not be met with the expected demand. Anecdotal evidence suggests that most Indian startups today use compute to deploy AI (also called “inference”) rather than to fine-tune or train foundation models, which is more compute-intensive. Some suspect that existing NSM capacity is being used sub-optimally. Academics, however, have made clear that dedicated compute is needed for scientific research. Without reliable utilization data, these claims are difficult to evaluate. Another concern is that rapid innovation could make today’s most advanced chips obsolete within a few years, which may force Indian companies to look for foreign alternatives.
  6. The government should drive demand for compute: The government could increase local demand by defining priority use cases in strategically important sectors. The National Strategy for AI highlights agriculture, healthcare, education, and e-governance as focus areas for AI intervention, but more can be done to spur demand. For example, the government could itself be a large customer of compute, especially for e-governance applications and non-commercial projects. Enabling regulation and financial incentives could also stimulate local demand and innovation.
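The first point above mixes two kinds of units: capacity as a rate (petaflop/s) and a training budget as a total amount of work (petaflop/s-days). The short sketch below converts the GPT-3 figure cited there into total floating-point operations and estimates how many days the NSM’s current and target capacity would need to perform that much work. It is a rough illustration under loudly stated assumptions: it assumes 100 percent sustained utilization and treats supercomputer petaflops and AI training throughput as interchangeable, although the two are commonly quoted at different numeric precisions. The function names are hypothetical.

```python
# Convert a training budget in petaflop/s-days into total floating-point
# operations, then estimate wall-clock days at a given sustained throughput.
SECONDS_PER_DAY = 86_400
PETA = 10**15  # 1 petaflop/s = 10^15 floating-point operations per second


def pfs_days_to_flops(pfs_days: float) -> float:
    """Total operations represented by a budget in petaflop/s-days."""
    return pfs_days * PETA * SECONDS_PER_DAY


def days_to_complete(total_flops: float, sustained_pflops: float) -> float:
    """Days needed at a constant sustained throughput (in petaflop/s).

    Assumes 100% utilization and ignores precision differences,
    so the result is an optimistic lower bound.
    """
    return total_flops / (sustained_pflops * PETA) / SECONDS_PER_DAY


# GPT-3 training compute as cited in the text: 3,640 petaflop/s-days.
gpt3_flops = pfs_days_to_flops(3_640)
print(f"GPT-3 training compute: {gpt3_flops:.2e} FLOPs")

# NSM capacity figures quoted in the text, in petaflops.
for label, capacity_pf in [("NSM today", 24.83), ("NSM 2025 target", 66.0)]:
    days = days_to_complete(gpt3_flops, capacity_pf)
    print(f"{label} ({capacity_pf} PF): about {days:.0f} days")
```

Under these optimistic assumptions, the 3,640 petaflop/s-days correspond to roughly 3.14 × 10^23 operations, or about 147 days on today’s NSM capacity and about 55 days at the 2025 target, with the entire national system devoted to a single training run.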

Possible Pathways

Based on key points of convergence that emerged from these discussions, below are five recommendations for Indian policymakers as they develop and implement a national compute strategy.

  1. Define clear goals: The government has identified two goals for compute—to build national AI compute infrastructure and make it widely accessible. However, a clear articulation of the national objectives would be useful. Is creating a positive social impact the primary goal? Is self-sufficiency a separate goal? Does the government expect a financial return on its investments? What does “sovereign compute facility” mean? Defining the goals more clearly would help the private sector build models that protect national interests and guide investments from businesses, public institutions, venture capital firms, and non-profits. This would also enable policymakers to measure the impact of the national strategy.
  2. Study different approaches to building compute infrastructure: The government has stated that active participation from the private sector is critical to the success of the AI mission. Given resource constraints, a PPP model is the only viable option to build national compute infrastructure at scale. Therefore, it is imperative that Indian policymakers study different approaches to building, maintaining, and servicing compute facilities. These could include:

    - Global approaches to building compute infrastructure.
    - PPP models in strategically important sectors (airports, power, defense, etc.)
    - New models to enable access to compute, such as “Open Cloud Compute.”
    - Evaluating local demand for compute and future needs.
    - Policy measures to reduce the cost of running data centers.
  3. Identify and support priority use cases: The government should highlight priority use cases to drive local demand for compute, such as applications to detect diseases, increase agricultural productivity, promote linguistic diversity, and predict floods. For example, the “Grand Challenge” run by the Defense Advanced Research Projects Agency (DARPA) under the United States Department of Defense has helped the U.S. government “accelerate the development of autonomous vehicle technologies that could be applied to military requirements.” The Indian government could run similar challenges in priority sectors and provide venture funding to winning contestants. Over time, these challenges could lead to the creation of a global repository of AI use cases. The government could also consider fast-tracking subsidies and vouchers to support startups and researchers.
  4. Build dedicated clusters for academic research and military applications: Separate compute clusters should be set up for special use cases. For example, academic institutions such as the Indian Institutes of Technology (IITs), the Indian Institute of Science (IISc), the Indian Institute of Science Education and Research (IISER), and the Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR) should have access to high-performance computing (HPC) systems to support pure scientific research and knowledge sharing in areas such as astrophysics, genomics, and environmental studies. The AI mission does not address potential applications of AI in the military. Dedicated compute facilities with additional access, control, and security measures should be set up for such sensitive use cases, including cybersecurity, defense, and national disaster management.
  5. Adopt a balanced and holistic approach to compute: The government should reorient its approach to compute infrastructure by looking beyond national capacity in absolute terms and focusing on India’s larger strategic priorities. Besides acquiring GPUs, public investments must be spread across different layers of the compute stack to promote competition, consumer choice, national security, and global trade. More effort should also go towards developing a shared understanding of compute across geographies to pave the way for greater multilateral cooperation. Lastly, India’s compute strategy must be combined with other interventions in areas such as data access, skilling, and regulation to help achieve India’s larger AI ambitions.

Conclusion

India’s compute strategy is still in its infancy. There is some clear direction—the government intends to build national compute capacity of over 10,000 GPUs in collaboration with the private sector and make compute available to eligible partners at affordable rates. But there is a lot that remains to be worked out.

To begin with, a clear articulation of objectives would help inform deployment models and investment decisions. Reliable data on how compute is currently used in the country would also promote optimal use of national capacity, while innovation challenges can spur local demand in priority sectors. Finally, because compute is an important geopolitical issue, technologists, researchers, and policymakers should study different global models and adopt a measured approach that reflects India’s strategic priorities.