
A Primer on Compute

This essay is the first of a three-part series that will seek to understand the considerations for building compute capacity and the different pathways to accessing compute in India. This essay delves into the meaning of compute and unpacks the various layers of the compute stack.

Published on April 30, 2024

Introduction

Computational power, or compute, is a fundamental building block in the development and deployment of artificial intelligence (AI)-based solutions at scale. Compute enables AI models to process vast amounts of data, perform complex calculations, and make intelligent decisions by learning from this data. Compute is required to train models to learn patterns and logic, enable real-time decision-making, and ensure a smooth user experience by reducing latency.

In October 2023, the Indian government initiated a discussion regarding a proposal to set up 25,000 GPUs in the country, with the vision to make compute accessible to Indian companies engaged in the development and use of AI. During his address at the inauguration of the Global Partnership for Artificial Intelligence Summit in December 2023, Prime Minister Narendra Modi shared a vision to “establish adequate capacity of AI compute power in Bharat.” Subsequently, in March 2024, the cabinet approved a Rs. 10,372-crore outlay for the IndiaAI Mission, under which the government aims to establish compute capacity of at least 10,000 GPUs through the public-private partnership model. These developments have sparked debates within academic and policy circles on India’s approach to compute, and the role of the government and the private sector in providing access to compute.

Given the increasing focus on compute, there is a need for a cohesive understanding of the term in order to develop and inform policy and build a robust AI ecosystem in India.

This essay, the first in the series, delves into the meaning of compute and unpacks the various layers of the compute stack.

The second essay will outline the contours of the compute debate with a focus on India, drawing on stakeholder discussions and interactions with technologists, government officials, academics, and startups. The third and final essay will explore how export controls and industrial policies shape the compute marketplace.

Unpacking the Term ‘Compute’

There is no widely agreed-upon definition of compute. The term is used variously to refer to a metric of measurement, to hardware, or to a stack.

Compute as a metric of measurement refers to the number of floating-point operations per second (FLOPS or flops), that is, the number of calculations a processor can perform in one second. The processing speed of computers is often measured in petaflops, where one petaflop equals a thousand trillion flops (1 petaflop = 10^15 flops).

The world’s fastest supercomputer, Frontier at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA, has a peak speed of 1,679.82 petaflops—meaning it can carry out up to 1,679 quadrillion calculations per second. By comparison, a MacBook Air with an M1 chip has a peak speed of 2.6 teraflops, meaning it can carry out up to 2.6 trillion calculations per second.
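
To make these orders of magnitude concrete, the comparison can be worked out directly from the cited figures. The short Python sketch below is illustrative only: it uses the peak speeds quoted above, and real-world sustained performance is lower.

```python
# Back-of-the-envelope comparison using the peak speeds cited above.
PETAFLOP = 10**15  # flops
TERAFLOP = 10**12  # flops

frontier_peak = 1_679.82 * PETAFLOP  # Frontier supercomputer
macbook_peak = 2.6 * TERAFLOP        # MacBook Air (M1 chip)

ratio = frontier_peak / macbook_peak
print(f"Frontier's peak is roughly {ratio:,.0f} times the M1's")
# Frontier's peak is roughly 646,085 times the M1's
```

At peak, the supercomputer is roughly six hundred thousand times faster than the laptop.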

Compute as hardware refers to the chips that perform calculations, process data, and execute instructions. These chips, such as graphics processing units (GPUs), are designed to accelerate the rate at which calculations are performed.

Compute as a stack consists of three essential components that work together to provide the technical capabilities required to build AI applications: the hardware, the software, and the infrastructure. Within the stack, the hardware refers to chips; the software refers to the programming languages, libraries, and platforms that enable the use of the specialized chips; and the infrastructure refers to data centers and the resources required to run them, such as cooling equipment, servers, and cables.

It is important to understand that the different components of the compute stack are intricately linked to each other. Each element has an impact on the efficiency of the others and should not be viewed in isolation. 

Understanding the Hardware Layer of the Compute Stack

The hardware layer forms the backbone of the compute stack as it provides the raw capability that AI models utilize to process data and generate insights. Therefore, it is crucial to understand the different types of chips that power AI, their benefits, and their drawbacks.

Computers operate on a central processing unit (CPU), a chip that enables a user to perform various functions on a computer, like checking emails, watching videos, and listening to music. It serves as the brain of the computer, managing the software, background processes, and other programs. However, CPUs are designed primarily for serial processing: each core executes instructions one after another, which makes them less suited to the massively parallel calculations that AI workloads demand.

Along with the CPU, computers nowadays are also equipped with a specialized chip known as the graphics processing unit (GPU). GPUs contain thousands of arithmetic logic units, the components that perform calculations, which allows them to carry out many operations in parallel. While they were originally developed to render graphics in gaming and animation, GPUs have evolved into more programmable and flexible tools and have found applications beyond graphics, notably in AI.
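
The difference between serial and parallel processing can be illustrated in a few lines of Python. This is only a sketch: NumPy's vectorized call runs on the CPU, but it stands in for the general idea of handing an entire array to optimized hardware in one step rather than looping over elements one at a time, which is the same principle a GPU exploits at far larger scale.

```python
import time
import numpy as np

x = np.random.rand(10_000_000)

# Serial style: a Python loop touches one element at a time.
start = time.perf_counter()
doubled_serial = [v * 2.0 for v in x]
loop_time = time.perf_counter() - start

# Data-parallel style: one call hands the whole array to
# optimized native routines (SIMD units, multiple cores).
start = time.perf_counter()
doubled_vector = x * 2.0
vector_time = time.perf_counter() - start

print(f"loop: {loop_time:.3f}s, vectorized: {vector_time:.3f}s")
```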

In addition to GPUs, there exist other specialized AI chips, such as application-specific integrated circuits (ASICs) like Google’s Tensor Processing Units (TPUs), and field-programmable gate arrays (FPGAs) like Intel’s Agilex. ASICs are custom-designed for specific tasks or applications. In contrast, FPGAs are built in such a way that they can be modified even after manufacturing: they consist of configurable logic blocks that can be reprogrammed, typically using hardware description languages such as Verilog or VHDL, to suit an application’s specific needs.

The specialized nature of each of these chips enables them to deliver greater efficiency, as they are more suited to AI workloads. For example, Google’s TPUs are purpose-built to handle matrix multiplications, which are the fundamental calculations underlying AI models.
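
To see why matrix multiplication is so central, consider that a single dense layer of a neural network is essentially one matrix multiply followed by a bias addition and a nonlinearity. The dimensions below are hypothetical, chosen purely for illustration:

```python
import numpy as np

batch, d_in, d_out = 32, 784, 128   # illustrative sizes

x = np.random.rand(batch, d_in)     # a batch of inputs
W = np.random.rand(d_in, d_out)     # learned weights
b = np.random.rand(d_out)           # learned biases

# The heart of the layer is the matrix multiplication x @ W;
# chips like TPUs are built to accelerate exactly this operation.
h = np.maximum(x @ W + b, 0.0)      # matmul + bias + ReLU activation
print(h.shape)                      # (32, 128)
```

Stacking many such layers, and repeating the computation over millions of training examples, is what makes matrix-multiply throughput a binding constraint for AI hardware.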

Each chip excels at specific stages of the development of an AI model, and the selection of a chip depends on the specific application it is required for. For example, ASICs and FPGAs can be more efficient than GPUs when it comes to inference, though GPUs still hold primacy when it comes to training models on data.

Several studies have benchmarked CPUs against specialized AI chips and found that the latter can be ten to a thousand times faster and more efficient on deep-learning workloads.

Understanding the Software Layer of the Compute Stack

The software layer of the compute stack acts as a bridge or an interface that allows a user to develop applications by utilizing the raw capability provided by the hardware. Through programming languages such as Python, the software layer enables the user to communicate instructions that are then executed by the hardware, and it provides an interface through which results can be monitored.

Software also plays a vital role in ensuring that the hardware is utilized to its full potential. For example, the transformer architecture that underlies generative AI models has parallel processing built into its design, which allows for the efficient utilization of hardware resources like GPUs or TPUs. Similarly, Compute Unified Device Architecture (CUDA) is a parallel computing platform built by NVIDIA that lets developers use its GPUs for general-purpose parallel processing. Software that improves the efficiency of the hardware is critical for reducing both the time it takes to create an application and the cost of doing so.
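
As a minimal sketch of how this bridging works in practice, the snippet below uses CuPy, a NumPy-compatible Python library built on top of CUDA. This assumes an NVIDIA GPU with CUDA and CuPy installed; CuPy is one of several such libraries, not the only route to the hardware.

```python
import cupy as cp  # assumes an NVIDIA GPU, CUDA, and CuPy are installed

# The arrays live in GPU memory; CuPy translates the familiar
# array syntax into CUDA operations executed on the GPU.
a = cp.random.rand(4096, 4096)
b = cp.random.rand(4096, 4096)

c = a @ b  # matrix multiply dispatched to the GPU's parallel units

cp.cuda.Stream.null.synchronize()  # wait for the GPU to finish
print(c.shape)  # (4096, 4096)
```

The user writes ordinary high-level code; the software layer handles scheduling the work across the GPU's thousands of arithmetic units.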

Understanding the Infrastructure Layer of the Compute Stack

The infrastructure layer comprises data centers: physical locations that house computing hardware, data storage devices, and related network equipment. Data center infrastructure falls into three categories: computing, storage, and network.

The computing infrastructure houses the chips, memory, and servers. The storage infrastructure stores data. The network infrastructure includes the cables, switches, and routers that connect the various components of the data center and link the data center to end-user locations.

Data centers are critical to ensuring that applications are available to users at all times and are therefore equipped with uninterruptible power supplies. Since they are always “on,” data centers generate a lot of heat that can damage hardware, cause fires, and degrade performance. To combat potential overheating and subsequent malfunction, ventilation and cooling equipment are put in place. Protecting the hardware from heat damage in this way also reduces operational costs over time.

Conclusion

The interconnected nature of the various components of the compute stack underscores the importance of a holistic approach to understanding compute. While the hardware plays an important part, each element of the stack affects the efficiency of the others and must therefore be considered when building compute capacity.