
The Missing Pieces in India’s AI Puzzle: Talent, Data, and R&D

This paper explores the question of whether India specifically will be able to compete and lead in AI or whether it will remain relegated to a minor role in this global competition.

Published on February 24, 2025

The analyses presented in this paper are based on developments up to February 6, 2025.

Introduction

The world is at a critical moment in the race for artificial intelligence (AI) leadership. As the global competition for leadership in AI heats up, the current trend is toward the concentration of data, capital, talent, and cutting-edge research in the hands of a few firms and even fewer countries.

The United States and China, the world’s two “AI superpowers,” are locked in what is being called an “AI arms race” for the faster development and adoption of AI.1 Firms in these countries are building newer applications—commercial as well as military—for global adoption. The January 2025 release of DeepSeek-R1, an open-source model developed by a Chinese AI start-up, sparked panic in the United States’ AI sector, serving as yet another example of the AI race heating up.2

At the same time, other countries—notably, India, Japan, France, Germany, the United Kingdom, Singapore, and the United Arab Emirates (UAE), among others—want to prevent such concentration and are charting their own AI strategies to compete in this arena. These countries are attempting to find ways to avoid being relegated to observer status in the global AI race.

This paper explores the question of whether India specifically will be able to compete and lead in AI or whether it will remain relegated to a minor role in this global competition. The paper argues that if India is to meet its larger stated ambition of becoming a global leader in AI, it will need to fill significant gaps in at least three areas urgently: talent, data, and research. Putting these three missing pieces in place can help position India extremely well to compete in the global AI race.

India’s national AI mission (NAIM), also known as the IndiaAI Mission, was launched in 2024 and rightly notes that success in the AI race requires multiple pieces of the AI puzzle to be in place.3 Accordingly, it has laid out a plan across seven elements of the “AI stack”: computing/AI infrastructure, data, talent, research and development (R&D), capital, algorithms, and applications.4

In practice, however, the focus thus far has been on only two elements: ensuring the availability of AI-focused hardware/compute and, to some extent, building Indic language models. India has not paid enough attention to, acted on, or put significant resources behind three other key enabling elements of AI competitiveness, namely data, talent, and R&D.

Without plugging in these missing pieces, India is likely to fall short of its stated ambitions. Only by building key strengths in data, talent, and research will India be able to compete in building AI models and applications, which can in turn help Indian entrepreneurs build valuable AI companies. In each of the subsequent sections, this paper breaks down the constraints or problems concerning these three elements in detail and also lays out a clear set of recommendations for India to elevate its AI strategy across those three dimensions.

First, India must double down on boosting its AI talent and build up an optimal mix of top-, middle-, and low-tier AI talent. While Indian information technology (IT) services firms and global AI majors will naturally do their bit to create an AI-enabled workforce by upskilling India’s existing IT services talent, doing so will not be enough to meet the country’s ambitions. India will need to attract, nurture, and retain cutting-edge, top-tier AI research talent to ensure that AI innovations for the world emerge from India. This paper suggests ways for India to achieve this goal.

Second, India must immediately build up “digital public data” to provide the “oil” for India-specific AI models and research. Although India is one of the largest smartphone, internet, and digital transactions markets in the world, American and Chinese firms hold an advantage over it in this respect. The vast majority of the digital data footprint of Indians is locked within platforms owned by global tech firms. This paper shows that to accelerate its unique data advantage, India needs to identify ways to proliferate multilingual data as well as other India-specific datasets. This can provide a differentiating factor for Indian large language models (LLMs) and small language models (SLMs) vis-à-vis their global counterparts.

Third, this paper argues that India must aim to become a leader in both cutting-edge AI research and the development of India-specific applications of AI. By enabling sufficient AI infrastructure through various public and private initiatives; attracting, nurturing, and retaining top-tier AI research talent; and accelerating the availability of large volumes of India-specific datasets, India can truly become a leader in cutting-edge AI research. This, in turn, will cause a proliferation of applications built on top of these enabling layers of infrastructure, top-tier research talent, India-specific data, and cutting-edge AI research. 

Ultimately, India’s AI strategy has to be crafted in the context of a challenging global AI landscape. Therefore, its “AI for All” approach—as outlined by India’s government think tank, NITI Aayog5—must be complemented with a “Competitiveness in AI” strategy. This paper lays out the case and the way forward for this fundamental change in approach.

The Missing Pieces in India’s AI Strategy: Filling the Gaps

With the launch of the national AI mission in 2024, India has laid out the framework for its approach to AI competitiveness. In early March 2024, relying on many of the recommendations of the expert working group report of October 2023, the Indian cabinet approved a “comprehensive, national-level IndiaAI mission with a budget outlay of Rs. 10,371.92 crore” (approximately $1.3 billion) over a period of five years.6

To begin with, the Indian government has attempted to establish the basic foundations of the AI ecosystem. In the past year, it has therefore focused on filling gaps in the AI infrastructure. India’s Ministry of Electronics and Information Technology (MeitY) has facilitated procurement of AI chips and compute capacity, specifically, 10,000 graphics processing units (GPUs), to support India’s start-ups, researchers, and academics.7

Geopolitically, access to AI chips and compute was at the top of the agenda for many countries in 2024. This was mainly due to the heavy concentration of advanced chip capacity in Taiwan and the growing risk of a Chinese invasion of Taiwan.8 India, recognizing the geopolitical and geoeconomic risk, rightly placed emphasis on the AI compute dimension first.

But, as with many other countries, the emphasis on chips and compute has come at the expense of some of the other crucial elements of the AI stack. This has meant that the holistic strategy needed to build out the AI ecosystem has not been fully acted upon. India’s approach to AI competitiveness and its key pillars can currently best be characterized as a work in progress.

The following three sections argue why the three other elements of the AI stack—namely, talent, data, and research—need significantly greater focus from the various actors in India’s AI strategy. Each section delves deep into the specific element to identify the constraints or gaps India currently faces and suggests ways to overcome those gaps.

Talent

Talent in AI is touted as India’s key strength and should be one of the layers of the AI stack that India specializes in. Today, India has one of the largest pools of science, technology, engineering, and mathematics (STEM) talent. However, the talent dimension for AI and India’s talent gap in this area requires a more nuanced examination.

Breaking Down the Talent Gap

There are four problems that India needs to address on the talent front.

1. Overall Shortage: By most estimates, countries and regions around the world, including the United States, China, and Europe, as well as India, are facing a severe shortage of AI talent. This problem will only worsen as demand booms in the coming years while training and education lag behind.9 The shortage is even more critical for India, given that AI presents a huge opportunity for the country. Without sufficient talent, India will not be able to capitalize on the opportunity fully. As AI and macrotrends enable globalization to shift from goods to services, India will also face a shortage of talent, despite being “a digital talent nation.”10

2. Talent Migration: The second issue for India to tackle is the migration of some of its best talent. On the talent aspect, the Indian government’s strategy documents simply recommend increasing the number and type of AI courses at different levels.11 But they fail to address, for example, the reasons why India’s top-tier AI talent often migrates outside India. Nor do the MeitY working group’s recommendations offer ways to reverse that flow.

As Figure 1 shows, top-tier AI talent (that is, talent doing cutting-edge AI research) being trained in India until the undergraduate level, presumably at institutions such as the Indian Institutes of Technology (IITs), ends up working in the United States or Europe after completing their graduate (or postgraduate) work. Losing some of its high-potential AI research talent is a problem that India needs to rectify immediately.

3. The Quality of Talent and the Need for Upskilling: While India undoubtedly has a large pool of STEM talent, employers frequently complain about the job readiness of a large portion of Indian engineering graduates.12 Fixing the quality element will also be critical; otherwise, many of these engineering graduates will likely be unable to compete with AI for basic coding tasks.13 At the same time, developing India’s existing IT talent into an AI-enabled workforce through upskilling will be a critical challenge for India’s tech and IT services industry.14

4. A Suboptimal Mix of Talent: AI talent can be sorted into three categories: top-tier (those conducting cutting-edge research, data scientists, and AI researchers), mid-tier (domain experts and application developers), and low-tier (project managers and implementers).15 For any country to become an AI superpower, it will, depending on its overall AI strategy, inevitably require an optimal mix of talent across all three tiers.

While some reports suggest that India is short on talent in all tiers, as per most estimates, India currently has predominantly low-tier AI talent.16 As Figure 2 suggests, India has a growing comparative advantage in medium- and low-expertise talent, given its fast-growing developer base in the AI/machine learning (ML) space, especially compared to the rest of the world. India is one of the top contributors to GitHub AI projects, a fairly reliable metric for the number of developers undertaking AI coding projects.17 The Indian developer community on GitHub is now the second-largest and fastest-growing one, and it is expected to overtake that of the United States by 2028.18

But clearly, as Figure 3 suggests, the top-tier talent is currently concentrated in the United States, China, and European nations. Given the deep research and innovation ecosystems that exist in these countries, without a major push, it will be an uphill task for India to build and retain top-tier AI talent. And without an optimal mix across all three tiers, India will struggle to become a truly cutting-edge global AI power. To meet this ambition, India must therefore pursue various approaches to nurture, attract, and retain top-tier or high-level AI talent to supplement its mid- and low-tier talent base.19

Key Recommendations

For India to fully leverage its “talent nation” tag in the great AI game, it will need to develop strategies to address the four problems highlighted above. Most importantly, it will need to grow the size and quality of its AI talent pool further, develop an optimal mix of top-, middle-, and bottom-tier talent, and continue exporting talent while also retaining and building up its domestic talent pool. Some recommendations in this regard are as follows:

1. Augment India’s Overall AI Talent Pool: India should leverage the role global tech companies can play in this regard. For example, Microsoft CEO Satya Nadella has committed to training and equipping 10 million people with essential AI skills over the next five years.20 India needs to bring at least twenty-five to thirty such tech companies on board to take on similar tasks and ensure that a variety of companies are training different types of AI talent within India.

2. Build an Optimal Mix of Talent: India will need a plan to build the various types of talent needed.21 India’s talent strategy must have three layers: one, AI-related R&D scientists and researchers (the top-tier talent layer); two, AI developers, builders, and architects who are able to create AI technologies, systems, and applications in various sectors; and three, AI integrators with sufficient understanding of AI tools and solutions to be able to integrate them into their company’s workflows.22

3. Fix India’s Top-Tier AI Research Talent Gap: High-level AI talent usually finds its way to high-tech sectors or emerging tech sectors and tends to concentrate in regions that have the most conducive and dynamic ecosystems (that is, other talented professionals, academics, researchers, companies, funders, and engineers, among others). As Yann LeCun, Meta’s chief AI scientist, urged during a visit to India in October 2024, Indian PhDs and scientists also need to focus on AI research, not just engineering and development.23

The high-performing talent problem for India can only be solved by evolving the overall AI ecosystem, which includes research, talent, industry, academia, and policy.24 While this is a difficult task, it is not impossible. India needs to identify at least twenty-five to thirty universities that will train the next generation of top-tier AI talent.

In addition, India will need to focus on investing deeply in four to six centers of excellence in AI research through a combination of public and private sector sources. The Anusandhan National Research Foundation (ANRF) should also include a substantial AI pillar under its ambit.25 India must also seek to bring some of its top-tier talent (currently working in the United States or Europe) to these centers to complement India’s local AI research talent. Funded through a combination of sources, these centers should seek to offer globally competitive salaries, world-class infrastructure, and appropriate incentive structures for the AI researchers. These labs, like the Facebook AI Research lab set up in Paris a decade ago, would also help retain AI research talent in India.26

The MacroPolo Global AI Talent Tracker’s 2023 update suggests that India’s ability to retain its top-tier AI talent has grown since 2019. In 2019, while nearly all Indian AI researchers ended up pursuing opportunities outside of India, by 2022, almost 20 percent had chosen to stay in India.27 This demonstrates that retention of top-tier AI talent, though difficult, is achievable.

4. Grow Medium- and Low-Level AI Talent: This will be a relatively easy task for India, as its IT services firms, global capability centers (GCCs), start-ups, and large corporations attract and train talent for the adoption and implementation of existing AI technologies, algorithms, and models. Indian IT services firms should set aggressive targets for rapidly upskilling their existing IT workers to ensure their continuing relevance for clients.

5. Integrate AI Into the Education Curriculum at All Levels: India’s current strategy does emphasize increasing the number of AI courses at the K–12, graduate, and postgraduate levels. As MeitY’s IndiaAI 2023 expert group report rightly suggests, given the immediate short-term demand, industry and academia will need to collaborate to make sure that industry gets the specific types of trained talent it needs.28 Planning of academic programs cannot be done in isolation. Academic institutions need to coordinate closely with industry so that the former can design the necessary programs to impart the requisite skills anticipated by the latter over the next three to five years.29 For example, facing such a shortage, semiconductor companies in India are forging partnerships with academia to bridge this gap.30

6. Develop a Broad Set of AI Skills Across Disciplines: The MeitY expert group has suggested career path mapping to supplement the AI curriculum so that India can produce not just AI engineers, but also AI entrepreneurs, product managers, designers, researchers, and ethicists.31 For example, management schools can train their students to become AI product and program managers. Science and research institutes should focus on ensuring promising AI research careers for their students and produce AI researchers and data scientists. Engineering schools would produce AI/ML engineers and DevOps professionals. Schools and online edtech platforms can teach fundamental AI skills at scale. Social science schools should encourage the study of sociopolitical, philosophical, and economic implications of AI for India.

7. Attract the Best STEM Talent From the Global South: India should develop a visa policy that can attract the best science and engineering talent from its neighboring regions, including AI talent from regions such as South Asia, Southeast Asia, Central Asia, Eastern Europe, Africa, and the Middle East. India currently does not attract many high-skilled immigrants, but such a visa regime will also allow India to benefit from the brain drain occurring in these countries.

India’s G20 talent visa, announced in December 2024 and expected to come into effect in 2025, aims to attract top research scholars and fellows from G20 nations as a way to boost innovation in India. Such initiatives could help establish India as the “Silicon Valley of the East.”32 This could be similar to the International Entrepreneur Rule (IER), which allows the United States to attract those who “would provide a significant public benefit through their business venture.”33 Many European countries have also adopted a similar rule. Given its market size and vibrant start-up and venture capital ecosystems, India will be able to attract a decent portion of the AI talent that might otherwise flow to Western Europe and/or the United States.

8. Tackle the Challenge of AI-Induced Structural Unemployment Head-On: Lastly, India must keep in mind the potential for AI to cause structural unemployment across various sectors and types of jobs. While Western nations face a chronic shortage of human capital and labor, India is in the opposite situation.34 It must ensure that AI is leveraged for enhancing the productivity of its labor, not for replacing said labor and jobs.

Importantly, India’s policies for job creation must also focus on training, reskilling, and upskilling Indian professionals and college graduates for the field of AI. There will be a considerable global shortage of AI professionals, much like what happened in the fields of software development and cybersecurity.35 If India’s training, education, and skills institutes can reorient themselves toward this emerging technology, the country can capture a bigger portion of the jobs that will be created globally.36

Data

Data ultimately lies at the core of AI and is the actual oil for AI algorithms and models. It is a necessary but not sufficient condition for the development of useful AI-based products, models, and services. Therefore, a clear, comprehensive strategy for having continued access to data is necessary for success. American companies such as OpenAI and Google already have access to vast amounts of data, both public and proprietary, that have been leveraged for training their AI models.37 Similarly, Chinese firms have access to a huge amount of data, which they consider a key advantage in the global AI race.38

On the data element of the AI stack, good-quality, India-specific data in the volumes needed to train LLMs has thus far not been made readily available to start-ups, researchers, and innovators. Even well-funded Indian AI start-ups such as Sarvam have pointed to this fundamental problem, due to which they have had to rely on synthetic data for training their models.39

Breaking Down the Data Gap

Indian companies, start-ups, and researchers find themselves at a disadvantage relative to their global peers on the data element of the AI stack for various reasons. For India, the data problem has at least four dimensions.

1. Lack of Access to Large Volumes of Existing Data: To start with, Indian start-ups and researchers do not have access to the massive volumes of data that Google, Meta, Microsoft, and others have access to by virtue of being Big Tech firms with consumer- and business-facing global platforms. It is either impossible for others to access this data (since these firms might not share this data) or it is very expensive (if one tries to purchase this data through data brokers).

2. Sparse Unique or Proprietary Data Within India: Indian firms currently do not have access to unique or proprietary data that could give them an edge over these global platforms. This unique or proprietary data could refer to data specific to Indian consumers or businesses, such as data from Unified Payments Interface (UPI) transactions. It could also refer to data in Indian languages that is not on the internet yet. (The Indian language data on the internet can be scraped by Big Tech firms anyway.)

3. Siloed, Unstructured, and Poor Data: Though large amounts of data are being generated in India, given its growing digital public infrastructure (DPI) and its large digital user base, the data is either lying in silos or has not been tapped yet. For example, data on trade through Indian ports or data on the usage of Indian toll booths is available yet lying in silos. Moreover, in addition to a dearth of easily accessible data or data-generating platforms, India also lacks well-annotated, regularly updated, feature-rich datasets.40

4. Overreliance on the Government to Solve the Data Gap: The strategy outlined in the national AI mission seems overly reliant on the government to build and manage a data platform. The IndiaAI Datasets Platform, expected to go live in January 2025, aims to let developers access and use datasets sourced from the private and public sectors.41 The vision, as outlined by India’s National e-Governance Division (NeGD), is for India to build a platform similar to Hugging Face, a private, venture-funded global repository of datasets and open-source models.42

However, a single government-managed platform is unlikely to solve the data gap. A cursory analysis of leading, cutting-edge AI research work, or applications or models being built by leading AI firms, will show that their data needs are exhaustive. A government-managed data platform, despite best efforts, is likely to be plagued with issues, such as non-exhaustiveness of data, unstructured or unannotated datasets, or just bad data.43

More thinking, therefore, will need to be done on whether the data element is best solved by the government or by private players or start-ups. The global AI datasets market is growing very rapidly,44 and Indian start-ups and researchers will need to find ways to plug into and access that data in cost-effective ways. Moreover, India needs a clear, long-term strategy to pool together massive volumes of Indic language data to allow Indian companies to build custom models and applications for Indian consumers and businesses.

Key Recommendations

Data for AI is probably the toughest problem to solve in the end. Some have argued that “those who solve the data dilemma will win the AI revolution.”45 The solution lies in developing a long-term, creative strategy toward overcoming the “data disadvantage” that currently plagues Indian researchers, start-ups, and companies. This holistic strategy must incorporate a plan for data generation, access to globally publicly available and licensed data, improving data quality, multilingual data, and multimodal (voice-, text-, image-, and video-based) data.

A long-term strategic approach, rather than a short-term, tactical approach, will be needed to solve this constraint for India. India’s strength in DPI and public data commons should be leveraged and incorporated for data across all sectors (public, private, and academic).

1. Leverage Indian Consumer/Transaction Data: India has to figure out ways to access, unlock, and leverage the vast amounts of data its large internet user base is creating. India’s technology firms, including prominently its telecom, e-commerce, logistics, and fintech firms, generate immense amounts of multimodal data that is currently not easily accessible or leveraged for AI research or innovation. Firms such as Jio, Airtel, Flipkart, Zomato, Blinkit, Swiggy, Delhivery, MakeMyTrip, PhonePe, and many others are home to substantial repositories of data. The right regulatory frameworks and market-based approaches need to be developed to unlock this data for driving another wave of innovation, the way UPI has allowed for innovation in the fintech space.

2. Develop Multiple Data Marketplaces: India could develop platforms and protocols for sharing non-personally identifiable data—under the ambit of the necessary privacy, anonymity, and rules-based access norms—with AI entrepreneurs, researchers, and innovators. The India Datasets Platform could serve as a repository of data, but the government could also encourage the evolution of a broader set of marketplaces to solve this problem. In the United States, for example, several companies, such as SAP, Amazon Web Services (AWS), Databricks, Snowflake, and many others, operate such data marketplaces.46 This model could be more scalable for India’s needs, especially given such diverse data across the country.

3. Unlock Government Department Datasets: Similarly, data that is currently locked up within government departments (such as agriculture, health, finance, education, railways, civil aviation, and others) or other sources within the public sector should be opened up. Rural agricultural surveys, consumption surveys, flight and train data, toll booth data, UPI transaction data, health and medical data, trade (import and export) data, and many other datasets reside in government departments. The focus should be on taking unstructured datasets and making them available for use in LLMs and other AI-based applications, especially in sectors such as education, finance, healthcare, travel and logistics, and agriculture, among others.

An interesting example that has emerged recently in India is the Integrated Geospatial Data-Sharing Interface (GDI) set up under the National Geospatial Policy of 2022 by the Department of Science and Technology, Government of India, through the Geospatial Data Promotion and Development Committee.47 The GDI has compiled and made easily accessible datasets from various public and private partners pertaining to the sectors of agriculture, livelihoods, transportation, and logistics. Such sector-focused data platforms could serve as an interesting model for unlocking government data.

4. Scale Up Current Efforts: The government has already moved a few steps in this direction. In addition to the IndiaAI Datasets Platform, which helps provide easy access to public sector datasets, India’s Open Government Data Platform also hosts and provides application programming interface access to various datasets.48 However, even though a decent number of datasets have been curated to facilitate research and innovation, these initiatives are still in very early stages of execution. Moreover, these data platforms currently suffer from various issues. For example, they are not updated periodically, nor do they provide standardized data in easily readable forms.49

Much like China’s National Data Bureau (NDB), India had also proposed setting up the India Data Management Office to serve as India’s data regulator, as part of its draft National Data Governance Framework Policy.50 In addition, the MeitY IndiaAI expert working group report of 2023 also focused on operationalizing the India Datasets Platform and envisioned the establishment of data management units within each ministry/department.51

These efforts, while commendable, will need to be executed and scaled up significantly to yield results that are truly impactful. The commitment to providing past as well as real-time, updated, and complete data from various arms of the government must be genuine and driven from the top down. A half-hearted approach of putting selective data up on the platform will not serve any real purpose.

Appointing a chief data officer for India would help identify useful datasets across the government, establish ways to ensure good quality data streams, and streamline the efforts across ministries. Moreover, standard operating protocols need to be adopted and adhered to for ensuring data quality. An interesting benchmark that has been suggested for adoption by India is the European Union’s Metadata Quality Dashboard, which can help assess the quality of uploaded data in terms of metrics such as accessibility, interoperability, and usability.52

5. Build Up Repositories of Multilingual Data: The government is building open-source datasets in various Indian languages through Bhashini, an AI-powered translation system, with the goal of enabling the development of AI applications using these datasets.53 Bhashini seems to be progressing well and reported having clocked over 100 million inferences in September 2024 across its multiple applications and use cases.54 However, despite efforts to crowdsource multilingual data, training data for multilingual AI models is still scarce for almost all Indian languages. India therefore needs to think of ways to collect, generate, and access multilingual data at scale.

A significant amount of multilingual data today is being generated on India’s telecommunications platforms (through voice, messaging, and the creation of digital entertainment content in local Indian languages).55 If India can set up processes to leverage this vast and continuously growing consumer-generated multilingual data for AI with the necessary regulatory guardrails, it could provide a big boost to India’s aim to build multilingual AI models.56

Of course, past regional radio, television, and newspaper records, once digitized and transcribed (potentially using AI itself), could also fill a big gap. This would be a more scalable way to build a deep pool of multilingual data than either the on-ground crowdsourcing strategy suggested by some or the use of synthetic data, as some other Indic-language models have done.57

6. Develop a DPI-Like Approach and Guardrails for Data Commons: Of course, the availability and sharing of data raise questions of privacy, consent, and security. Previous efforts at commercializing government datasets have been criticized on these grounds.58 In August 2023, after several years of deliberations, India put in place a “modest and pragmatic” Digital Personal Data Protection (DPDP) Act, 2023, to enable data usage for lawful purposes by data fiduciaries while providing necessary protections to individuals or data principals over their data.59 These are commendable steps to improve India’s data capabilities within an effective data governance framework.

India should explore building out a DPI-driven approach to data as well. Data should be available as part of a digital commons to entrepreneurs to build applications on top of and not remain monopolized by just a few large firms. Data exchanges—with the necessary rules and guardrails—could serve as a key advantage for India in the global AI race. Therefore, the philosophical guiding values adopted in DPI by India, such as a consent-based architecture, and the appropriate data governance frameworks must also be incorporated in the design of these data exchanges.

R&D

Countries seeking to lead in the global AI race cannot ignore the R&D element of the AI stack. The United States, arguably the leader in AI innovation today, has clearly articulated R&D as a top priority for maintaining its global leadership in AI according to its detailed national AI R&D strategic plan, which was issued in 2016 and updated in 2019 and 2023.60

Similarly, China, way back in 2017, laid out a detailed stage-wise plan to become an AI R&D powerhouse by 2030.61 It envisioned starting from R&D in AI technology and applications in the first stage, followed by a research focus on basic AI theories in stage two, and finally a focus on advanced, cutting-edge AI research in the final stage. While it may not have invested as heavily as the United States, estimates still suggested that China had begun investing billions of dollars into AI R&D back then.62

India has not conducted such a long-term strategic exercise on building its AI R&D capabilities, nor has it invested anywhere near as heavily in R&D as the United States and China have in the last decade. A country’s strength in AI-focused R&D can be measured not only in terms of AI articles, journal publications, and citations but also in terms of patent applications and patents granted, not to mention the quality and quantity of its research talent pool. Various international studies have produced detailed comparisons of nations ranked by R&D output.

Two key takeaways for India emerge from these reports. One, as Figure 4 shows, India is gaining ground vis-à-vis the United States and other countries (but not China) in terms of papers in AI and related fields published between 2014 and 2024. The growth has been exceptionally impressive since 2019.

The second key takeaway is possibly the more important one. On a more quality-based metric, that of patents granted in AI, India does not fare as well (see Figure 5). This reflects the real gap between the global AI leaders, the United States and China, and India.

As with the talent and data dimensions, on the R&D front, a clear identification of the constraints or gaps India faces in building a cutting-edge AI research ecosystem, as well as the ways in which those will be tackled in the short and long terms, is needed.63

Breaking Down the R&D Gap in AI

1. Low R&D Spending on AI (and Innovation in General) by India’s Private and Public Sectors: As per the data for 2020, the U.S. federal government funds roughly 20 percent of total national R&D activity, compared with industry’s 70 percent.64 In comparison, India’s private industry contributes only about 36.4 percent of gross expenditure on R&D.65 Globally, commercial industries are demonstrably better than government departments at converting the same R&D dollars into functional products. We cannot expect that the reality would be drastically different in India. But unfortunately, the private sector’s share of R&D spending in AI is negligible today.

Similarly, the Indian public sector currently has negligible spending on AI R&D compared to its global counterparts. As a share of GDP, India’s R&D spending overall is approximately 0.6 percent, compared to other innovation-focused countries that typically average 3 to 4 percent of GDP.66 While exact figures are not available, India’s R&D spending on AI is likely even lower as a percentage of GDP compared to the United States, China, and other leaders in AI.67

The national AI mission has also allocated the majority of its approved outlay toward AI infrastructure (approximately Rs. 4,500 crore or approximately $515 million) and financing start-ups (approximately Rs. 2,000 crore or approximately $230 million). The funds allocated toward establishing centers of excellence (CoEs), and hence R&D, are approximately Rs. 990 crore (approximately $110 million).68 This needs to be corrected, and R&D spending on AI needs to be substantially increased by both the public and private sectors.
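To make the relative weight of these allocations concrete, the following is a minimal arithmetic sketch in Python using only the three rupee figures quoted above. The shares are computed against the sum of these three listed items, not against the mission's full approved outlay, which is not stated here; the variable and function names are illustrative.

```python
# Approximate allocations quoted in the text, in Rs. crore.
allocations_crore = {
    "AI infrastructure": 4500,
    "Start-up financing": 2000,
    "Centers of excellence (R&D)": 990,
}


def shares(allocations):
    """Return each item's share of the combined total of the listed items."""
    total = sum(allocations.values())
    return {name: amount / total for name, amount in allocations.items()}


# Assumption: shares are relative to these three items only,
# not to the mission's entire approved outlay.
listed_shares = shares(allocations_crore)
for name, share in listed_shares.items():
    print(f"{name}: {share:.1%} of the three listed items")

# Ratio of the CoE (R&D) allocation to the infrastructure allocation.
coe_to_infra = (
    allocations_crore["Centers of excellence (R&D)"]
    / allocations_crore["AI infrastructure"]
)
print(f"CoE allocation is {coe_to_infra:.0%} of the infrastructure allocation")
```

On these figures, the CoE allocation is a little over one-fifth of the infrastructure allocation and roughly 13 percent of the three items combined.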

2. Dearth of Institutions Focused on AI R&D: The current problem on this front is that India is doing meager AI research compared to others. There are only a handful of well-endowed institutional platforms for research.69 This is leading researchers to migrate to the United States and Europe. None of India’s educational or research establishments ranks among the top AI research institutions. By comparison, the Stanford AI Index Report, 2023, lists nine Chinese universities and research institutions in the top ten when ranked by the number of AI publications in all fields during 2010–21.70

3. AI Patents Falling Behind AI Research: India also suffers from low quality of research (using globally established metrics of quality and citations). As the publications-to-patents ratio shows, India’s AI patents are not keeping up with the quantity of its AI publications. India’s share of AI-relevant research publications has grown substantially in the last ten to fifteen years—on this metric, it ranked fourth globally for 2010–2019. Yet, its share of global patents in AI has not increased proportionately, and it ranked eighth globally on this metric for 2002–2019.71

In terms of citations (again, a metric of research quality), India’s rank drops to fifteenth. Even though international collaboration generally benefits research quality and impact, only 16 percent of India’s AI research papers published during 2010–2019 had non-Indian co-authors. This was the lowest level of international collaboration among the top ten AI research-producing countries.72

While other contextual factors (such as high costs involved in patenting, insufficient IP protections, and protracted patent litigations) might also contribute to this dichotomy, it is clear that the quality of AI research and AI patent activity in India is not in line with its global peers.

4. Lack of Cutting-Edge AI Infrastructure: India has established a national-level AI Research Analytics and Knowledge Dissemination Platform, which is designed to act as “a common cloud platform for Big Data Analytics with large AI computing infrastructure connecting all COREs, ICTAIs and other academic institutions with National Knowledge Network.”73 The National Supercomputing Mission, as per government data, has a total compute capacity of 24.83 petaflops, with a target of reaching 66 petaflops by 2025.74

Further, in March 2024, the Indian government allocated Rs. 4,564 crore (approximately $544 million) under the IndiaAI mission to procure 10,000 GPUs through a public-private partnership model.75 In February 2025, Indian IT Minister Ashwini Vaishnaw, recognizing the demands for advanced computation capacity for AI research and development, announced that the Indian government had already procured 10,000 GPUs but would enhance the available capacity to over 18,000 GPUs.76 These GPUs, expected to comprise 12,896 Nvidia H100s and 1,480 Nvidia H200s, would be made available to start-ups as well as researchers at academic institutions.77

The constraints that the U.S. administration placed in January 2025 on advanced GPU imports by certain countries might, however, delay future acquisitions by India.78

Despite these acquisitions, Indian start-ups and the newly established CoEs will lag severely behind their global counterparts in the United States and China, especially since governments and tech firms elsewhere are also prioritizing such compute infrastructure and GPU access, including for research and development.79

5. Inadequate Resourcing of CoEs: So far, India has established three CoEs focused on AI. As per the recent budget for FY 2025–2026, it has also announced the establishment of a fourth CoE specifically for AI education.80 Appointing private sector–led committees to monitor these CoEs should allow them to remain deeply integrated with industry requirements. But these centers will also need to be resourced adequately. For the three initially announced CoEs, Rs. 990 crore (approximately $110 million) had been allocated over a period of five years, which comes to roughly $7 million a year per center.81 Although an additional Rs. 500 crore (approximately $57 million) has been set aside for the fourth CoE, inadequate resourcing will prevent these institutions from attracting the best AI talent, affording sufficient compute capacity, and buying the necessary datasets to conduct cutting-edge research. These fundamental issues will need to be solved in partnership with the private sector.
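The per-center figure above can be reproduced with a short calculation. This is a minimal Python sketch that assumes the $110 million is split evenly across the three centers and the five years, which is a simplification for illustration; the function name is hypothetical.

```python
def annual_funding_per_center(total_musd, years, centers):
    """Average funding per center per year, in millions of U.S. dollars.

    Assumes an even split across centers and years (illustrative only).
    """
    return total_musd / years / centers


# Figures quoted in the text: about $110 million over five years for three CoEs.
per_center = annual_funding_per_center(total_musd=110, years=5, centers=3)
print(f"Roughly ${per_center:.1f} million a year per center")
```

The result, a little over $7 million a year per center, matches the rounded figure cited above.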

Key Recommendations

1. Public-Private Push for Greater AI R&D Spending: The government must incentivize private sector firms (including IT services majors, conglomerates, and others) to invest in AI R&D, either through building in-house AI R&D centers or through the AI research parks at premier universities in India as well as abroad. Indian IT services giants, as well as the large conglomerates, with their healthy balance sheets, are well positioned both to do the R&D and to benefit from it.

2. Industry-Academia Partnerships: India should also encourage global tech firms to create industry-academia partnerships in India. For example, Nokia, through its new 6G Lab in Bengaluru, recently partnered with the Indian Institute of Science (IISc) to jointly conduct research in 6G radio, architecture, and AI/ML technologies that have a particular relevance to the Indian and global markets.82 Such a template of industry-academia partnership should be replicated across at least fifteen to twenty technology companies and research institutions, possibly mapped to India’s priority sectors and technological strengths. The government, for its part, should focus on defining functional needs, interface productively with industry, and create processes and policies to support research and development for innovation as well as fast adoption.

3. Identify Clear Focus Areas: Given comparatively low R&D spending, charting a clear research strategy with defined focus areas will help ensure efficient use of existing R&D expenditure. As per the commerce ministry’s AI Taskforce report and the NITI Aayog Strategy Paper, the specific five to ten domains where AI might find the most applicability and social benefit have already been identified.83 Working together with academia, researchers, and industry, the government must play a critical role in kick-starting the AI research ecosystem.

An analysis of the split of India’s AI patents and papers so far would also help identify existing strengths and weaknesses. To begin with, India must focus on research areas where it has a competitive advantage or specific needs, such as personalized and precision medicine, gene therapy, vaccine discovery, drug design, and cancer screening, or optimized crop management.

4. Building the Research Ecosystem: The government’s recent efforts, such as the “One Nation One Subscription” scheme, which provides Indian researchers with free access to the world’s top journals, and the Partnerships for Accelerated Innovation and Research (PAIR) initiative, which functions as a hub-and-spoke model to pair India’s top research institutions with others, are both commendable initiatives to boost the research ecosystem.84

Identifying talent, supporting that talent, and building ecosystems that become pockets or centers of excellence will be key. If needed, India must work on bringing some of the Indian-origin AI scientists working abroad to India and providing them with the necessary budgets and enabling ecosystems such that they can support India’s research objectives.

For this, India will also need to evolve better-structured incentives so that researchers can more easily commercialize their research. India must encourage researchers and faculty members at its premier research institutes, like their counterparts elsewhere in the world, to launch start-ups and be stakeholders in the valuable companies that come out of that research.

5. Empower AI Research Parks and CoEs to Become Globally Competitive: The AI parks that have been set up across the country, including at IIT Bombay and IISc, must be empowered with autonomy, talent, funding, and a strong intellectual property regime to really expedite the creation of dynamic AI research ecosystems. India will also need to pour much greater funding into cutting-edge, next-generation AI research in these institutions to be globally competitive. China, for example, had already set up more than sixty AI tech parks by 2018, which were providing financial incentives to attract AI companies. Beijing had also announced the setting up of a $2.1 billion AI tech park, while another municipality, Tianjin, announced plans to establish a $16 billion AI fund.85 This is, of course, in addition to the private investment being poured into AI by China’s tech and industrial firms.86

The ANRF set up by India is an excellent step forward and is launching various initiatives to make India’s researchers globally competitive. But giving India’s research institutions and parks autonomy, along with the ability to raise funds from other sources, easily commercialize their research, and launch start-ups based on that research, will be equally important.

6. Leverage the Existing R&D Centers and GCCs Set Up by Multinational Corporations in India: India has already emerged as a major hub of R&D activity for many global corporations that have set up R&D centers and GCCs in India.87 A key focus area for many of these centers is AI and ML. A recent Zinnov-NASSCOM report suggested that India has over 1,700 GCCs that employ over 1.9 million people and generated over $64 billion in revenues in FY 2024. Out of this, their revenue for engineering R&D stood at $36 billion.88

India must systematically work on leveraging the top AI- and ML-focused R&D talent currently housed within these captive R&D centers. They must be incentivized to branch out, raise funding, build innovative start-ups in the AI space, and conduct research in partnership with industry through the AI research parks.

7. International Collaboration: Through collaboration with the United States, Europe, Australia, Japan, and other friendly nations, India must protect against the unethical use of AI, help evolve a global consensus on the guardrails for AI development, and prioritize more open-source AI development. At the same time, to enhance its own capacities, India should push for international cooperation on AI R&D through joint research with various nations.89 To bolster joint authorship of cutting-edge AI research, Indian AI researchers must be given greater incentives and funding along with stronger institutional support for conducting collaborative research with non-Indian AI researchers.

Setting up AI R&D exchange programs between Indian research universities and global ones, sponsoring international AI fellowships for emerging AI researchers, and other bilateral and multilateral partnerships to foster the exchange of ideas and expertise in AI R&D can also help bolster the Indian AI research workforce. Some emerging examples of the latter include the declaration of the United States and the United Kingdom on cooperation in AI R&D and the Quad countries’ commitment to establish working groups on AI standards development and foundational research.90

Conclusion: Balancing India’s Competitiveness in AI With “AI for All”

India’s AI approach and competitiveness strategy have to be crafted within a challenging global context, one in which the factors of AI production are concentrated in the hands of a few countries and a few firms. The incumbent tech powers possess substantial advantages in hardware, data, algorithms, software, researchers, and capital. In such a scenario, Indian start-ups and enterprises admittedly lack access to a level playing field to compete in the foundational AI space in the short term.

The first-mover advantage of the United States and China on the one hand and Big Tech firms on the other does appear daunting. India must therefore build on its strong foundations in AI domestically to boost its ability to compete on the global stage. In addition to the existing emphasis on AI compute and hardware, this paper has argued that India must focus on solving the fundamental problems it faces in building out the talent, data, and research ecosystems in AI.

Without plugging these gaps, there could remain a big chasm between India’s ambitions and the capabilities of its researchers, entrepreneurs, and businesses to lead in AI. In many ways, China’s clear emphasis on building strong talent and research ecosystems in AI has contributed to its recent success with DeepSeek’s R1 model challenging the dominance of leading American ones.91 Filling in these missing pieces in India’s AI puzzle, therefore, will be similarly critical for boosting India’s AI competitiveness.

Along with these efforts to boost India’s AI competitiveness, India must continue its efforts to create a level playing field in AI globally. India has already attempted to use its term as president of the Global Partnership on Artificial Intelligence to bridge the increasingly evident divide between the Global North and South in AI development and adoption.92 It should continue to build a strong voice for the Global South in AI and work hard along with other similarly placed nations to prevent a Second Great Divergence between the AI-haves and AI-have-nots.93

In addition, as it has done with DPI, India must build global coalitions to extend the DPI approach to the global AI landscape. This could be done on the AI cloud infrastructure front, through initiatives such as Open Cloud Compute, or on the data front, by building a global data commons.94 The extension of India’s DPI approach to AI would be a significant contribution toward global AI governance frameworks. More research is needed to suitably craft this extension.

India must also upgrade its capacity to engage in AI standards- and principles-setting processes with organizations globally, including the International Electrotechnical Commission and the International Organization for Standardization, among others.95 Collaborating with Europe, the Middle East, and other nations on forging a coalition around open-source standards for AI could be an example of a specific AI-focused coalition that India could push. Such multi-stakeholder coalitions of like-minded nations and companies that prioritize open-source development of AI could enable greater collaboration and the development of broad-based AI innovation ecosystems.

Ultimately, balancing its existing “AI for All” approach, both at the domestic level and on the global stage, with a “Competitiveness in AI” approach, as laid out in this paper, will be essential for India to achieve a leadership position in the highly competitive global AI ecosystem.

Acknowledgments

The author wishes to acknowledge the contributions, support, and feedback of various colleagues at Carnegie India and Carnegie globally, including Rudra Chaudhuri, Anirudh Burman, Matt Sheehan, Milan Vaishnav, and the Carnegie editorial team. The author is also grateful for the valuable inputs of various stakeholders from government, research institutions, start-ups, venture capital firms, think tanks, and industry in India, the United States, Europe, and the Middle East.

Notes

Carnegie does not take institutional positions on public policy issues; the views represented herein are those of the author(s) and do not necessarily reflect the views of Carnegie, its staff, or its trustees.