Balancing India’s push for data localisation with the demands of AI innovation is critical to safeguarding privacy while ensuring that the country remains globally competitive in the digital economy.
The explosive growth of the Indian digital economy has led to a surge in data creation, altering how citizens interact with technology. As India strengthens data management and protection through measures like the Digital Personal Data Protection Act (DPDP Act, 2023), there is a likelihood that data localisation will be pursued to enhance sovereignty and security. However, the notion of storing data within national borders creates tension with the development of Artificial Intelligence (AI) systems, which rely on vast, diverse datasets. This interplay between restricting data flows and enabling AI innovation raises a critical question: how can India balance privacy, data availability, and global competitiveness in the era of AI?
Globally, countries have taken different approaches to balancing privacy and AI. The European Union’s (EU) data privacy regulation, the General Data Protection Regulation (GDPR), combined with the AI Act, presents the most restrictive approach to AI and data governance. The GDPR applies alongside the AI Act wherever systems handle personal data. Nearly 90 percent of AI systems in the EU have been classified as ‘high risk systems,’ meaning additional security measures and privacy steps have been taken to protect consumer rights. Firms deploying such systems must also carry out Data Protection Impact Assessments (DPIAs) to reduce risks to consumer data. The GDPR treats re-identifiable anonymised data as personal data, limiting its use.
In contrast, the US has established an innovation-first regulatory approach. There is no comprehensive federal law specifically addressing both data privacy and AI. Instead, a patchwork of state laws and federal regulations, along with ongoing efforts to establish national standards, governs the field, with state- and sector-specific laws setting data standards for individual policy areas.
Consequently, US companies have easier access to training data, contributing to US dominance in foundation models and Large Language Models (LLMs). However, this approach raises concerns about privacy rights and algorithmic bias that India seeks to avoid. The EU’s experience, by comparison, shows stronger privacy protection but slower AI innovation than the less regulated US market.
The DPDP Act is India’s first comprehensive data protection law. It gives data principals (individuals whose data is being processed online) the right to access, correct, and erase their data, as well as to seek grievance redressal. Data fiduciaries (entities that process data), in turn, bear the responsibility to ensure data accuracy, implement adequate security measures, and notify individuals in case of a security breach.
Although the Act has been fully approved, it is being implemented in phases. The draft rules were released in January 2025; some provisions are pending final notification, and compliance requirements are expected to carry a two-year window. The rules are still under public consultation, and implementation timelines and compliance obligations are yet to be finalised.
One of the DPDP Act’s key provisions is the potential requirement to store data within India. While the Act originally allowed cross-border transfers except to blacklisted countries, the draft rules may impose stricter requirements on Significant Data Fiduciaries, mandating that sensitive data be processed only in India. Sector-specific laws and rules such as India’s Companies Act, 2013 (requiring records to remain at registered offices), the Reserve Bank of India’s Directive 2017-18/153 (on storing payment data domestically), and the IRDAI (Maintenance of Insurance Records) Regulation, 2015 (mandating that insurance data be kept within India) push for similar data localisation. The draft rules, however, act independently of these, calling for stricter localisation mandates when necessary.
Data localisation laws are part of a wider global trend in which countries attempt to secure their own data. The trend is motivated by concerns about national security and ease of law enforcement: localisation ensures easy domestic access to crucial information while limiting foreign access. Data localisation also requires sophisticated infrastructure, the construction of which may bring economic benefits. Additionally, nationally stored data would be easier to access for research, innovation, and the development of new technologies.
That said, any push toward data localisation increases compliance costs, requiring businesses to adapt or build new infrastructure. Because organisations will continue to operate only where building that infrastructure is profitable, the quantity of services supplied could fall in the long run. Concentrating all integral data within one country also leaves systems more vulnerable to threats such as cyberattacks and natural disasters. Large organisations that operate businesses in India have argued against the country’s push for data localisation. The United States Trade Representative has argued that the push raises significant trade barriers and mandates the construction of redundant data infrastructure. Additionally, firms representing big tech companies such as Amazon, American Express, and Microsoft have lobbied against this effort.
Beyond economic implications, India’s data localisation push directly affects the technical foundation of AI systems themselves. LLMs and other AI systems thrive on large inputs of data. They build statistical associations across billions of data points to learn complex patterns across text, images, and other inputs. Foundation models must be trained at scale on diverse datasets to deliver unbiased, high-performance results that generalise across contexts and address the needs of multiple populations simultaneously.
India’s ambition to become a pioneer in responsible AI is evident in its policy frameworks. NITI Aayog’s Responsible AI paper outlines steps to encourage research in and development of AI technologies that promote diversity and ethics. India’s National AI Strategy aims to leverage AI across healthcare, agriculture, education, and urban planning. With an INR 10,000 crore outlay and institutional leadership shifting to the Ministry of Electronics and Information Technology (MeitY), the IndiaAI Mission marks a major policy transition. The mission is being implemented by IndiaAI, an Independent Business Division under MeitY that aims to make India a global hub for AI development under the dual goals of ‘Make AI in India’ and ‘Make AI Work for India.’ However, this vision hinges on access to large datasets, which are fuel for AI systems but also a source of significant privacy and ethical risks.
The conflict between data localisation and AI creates several critical challenges for responsible AI development. Limiting data to national boundaries decreases the diversity of training datasets, leading to algorithmic biases in which AI models inherit and amplify existing societal misconceptions. Constrained datasets also limit a model’s ability to generalise and to capture edge cases or rare events, which matter for tasks needing broader contextual understanding, such as those in global supply chains. Localisation renders cross-border cloud computing infrastructure, the basis of many AI systems, redundant and forces companies to build expensive in-country servers, raising costs and stifling smaller AI startups. As a result, strict localisation requirements might discourage AI companies from conducting research and development in India. Country-specific standards also often hinder meaningful cross-country comparisons and benchmarking. Additionally, the centralisation of large amounts of data in one location could create high-value targets for cyberattacks, increasing the risk of data misuse. India therefore risks creating regional data silos that reduce the diversity needed for generalisable, ethical, and globally competitive AI systems.
AI startups in India now face the challenge of accessing language, health, and consumer data while meeting legal obligations set by the government. The DPDP Act lacks explicit rules on anonymised data, which creates a grey area in governance. There are also no international standards governing de-anonymisation. Although anonymisation techniques such as removing or masking direct identifiers are intended to protect individuals’ privacy, linkage attacks that cross-reference publicly available records can re-identify individuals. This issue is particularly acute for Indian AI startups working in healthcare and language technologies, where datasets often correlate closely with personal information. Indian firms face the same question as global leaders in AI standards: can anonymised data that can be re-identified still be considered personal data?
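To make the re-identification risk concrete, the linkage attack mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any real dataset or attack tool: every record, field name, and the choice of quasi-identifiers (PIN code, birth year, gender) is hypothetical and invented for this example.

```python
# Toy linkage attack. All records and field names are hypothetical.

# "Anonymised" health records: names removed, but quasi-identifiers
# (PIN code, birth year, gender) retained.
anonymised_records = [
    {"pin": "110001", "birth_year": 1985, "gender": "F", "diagnosis": "diabetes"},
    {"pin": "110001", "birth_year": 1990, "gender": "M", "diagnosis": "asthma"},
    {"pin": "560034", "birth_year": 1985, "gender": "F", "diagnosis": "hypertension"},
]

# A separate, publicly available list that includes names
# alongside the same quasi-identifiers.
public_records = [
    {"name": "Person A", "pin": "110001", "birth_year": 1985, "gender": "F"},
    {"name": "Person B", "pin": "110001", "birth_year": 1990, "gender": "M"},
    {"name": "Person C", "pin": "400050", "birth_year": 1972, "gender": "M"},
]

QUASI_IDENTIFIERS = ("pin", "birth_year", "gender")


def quasi_key(record):
    """Return the tuple of quasi-identifier values for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def linkage_attack(anonymised, public):
    """Re-identify anonymised records that match exactly one public record."""
    # Index the public records by their quasi-identifier combination.
    index = {}
    for person in public:
        index.setdefault(quasi_key(person), []).append(person["name"])

    reidentified = []
    for record in anonymised:
        names = index.get(quasi_key(record), [])
        if len(names) == 1:  # a unique match links a name to a diagnosis
            reidentified.append((names[0], record["diagnosis"]))
    return reidentified


matches = linkage_attack(anonymised_records, public_records)
for name, diagnosis in matches:
    print(f"{name} -> {diagnosis}")
```

In this toy case, two of the three "anonymised" records are linked back to a named person even though no name was ever stored with the health data. This is why regulators such as the EU treat re-identifiable data as personal data.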
Standing at a unique inflexion point, India faces the challenge of leveraging its large datasets, unique talent, and evolving regulatory maturity while balancing competing priorities. Regulatory sandboxes would allow for controlled environments in which AI companies could experiment with data usage under relaxed DPDP requirements. These sandboxes could foster innovation through insights into practical implementation challenges. Through standardised anonymisation protocols that provide technical and ethical guidelines, the government could provide legal certainty for AI developers. A data trust model through which independent authorities manage data voluntarily pooled by data principals would balance public interest, privacy rights, and innovation needs. Combining India’s diverse datasets with its growing technical talent pool and maturing regulatory environment could position the country as a leader in responsible and inclusive AI technologies.
Tara Chawla is a Research Intern at the Observer Research Foundation.
The views expressed above belong to the author(s).