Artificial Intelligence (AI) and large datasets are closely intertwined: AI algorithms rely on extensive data to learn, analyse patterns, and make accurate predictions. India has been progressively exploring the use of data for enhanced governance, aided by the advent of digital public infrastructure supporting services like Aadhaar and the Unified Payments Interface (UPI). These services have generated diverse datasets that capture the demographic richness of India, and it is tempting to consider enabling AI to extract insights from them.
Since its inception in 2011, over 1.3 billion people have been registered through Aadhaar. The size and organisation of the Aadhaar database make it an optimal dataset for training AI, but this may violate Indians’ right to privacy.
Allowing AI to be trained on datasets like Aadhaar raises evident concerns. The centralised nature of the Aadhaar database heightens privacy risks, making it a prime target for malicious actors seeking to compromise sensitive data. Moreover, AI algorithms tend to replicate the biases already present in their training data, risking the inadvertent targeting of marginalised groups and impeding social progress. To navigate these challenges, India could consider establishing basic rights for AI that balance its power against the need to protect citizen privacy. These issues warrant a deeper and more comprehensive examination.
Proposed applications
Turning to the potential advantages, the Aadhaar database would be a uniquely valuable resource for national and state governments looking for trends across the Indian population. While the relationship between dataset size and AI accuracy is not always linear, larger datasets generally contribute to improved accuracy, and the Aadhaar database encompasses over a billion entries, presenting a substantial resource for potential AI training. Aadhaar is also the result of deliberate government enrolment, with clear categories and formats; its uniform and well-structured nature simplifies data analysis, unlike most datasets, which are cobbled together from various sources.
Certain uses for AI trained on the Aadhaar dataset have already been proposed. National and state governments could use Aadhaar to track the distribution of welfare benefits and direct resources to the places that need them most. In Punjab, the police would like to integrate their existing AI facial recognition software with Aadhaar to cast a wider net for apprehending criminals. Whatever the use case, the benefits of these potential programmes must be weighed against the inherent intrusion into people’s personal lives.
Right to privacy
When creating the Aadhaar system, the Indian government had to consider the impact of universal identity registration on the constitutionally protected right to privacy. In the 2017 case Justice K.S. Puttaswamy vs. Union of India, the Supreme Court of India held that privacy is a fundamental right because it is a precondition for exercising individual freedom. In a unanimous decision delivered through several concurring opinions, the justices observed that, as technological developments make the lives of private citizens increasingly susceptible to surveillance, it is more important than ever to balance the right to privacy with governmental interests.
Allowing AI models access to the Aadhaar database without first constructing robust privacy protections fails to strike this balance. Training machine learning models on basic demographic details like name, age, sex, and residence would already constitute an ethically questionable infringement; these concerns are amplified by the fact that Aadhaar numbers are also linked to especially sensitive personal details like bank accounts and SIM cards. Furthermore, none of the people currently registered with Aadhaar consented at the time of enrolment to their data being used for AI purposes, and given the extent to which Aadhaar is used in everyday life, it would be difficult for citizens concerned about their data privacy to realistically remove themselves from the dataset. The Indian government’s ability to guarantee the privacy of citizens enrolled in Aadhaar is further impaired by the weak cybersecurity of the Aadhaar database.
Susceptibility to bias
Although computers are often assumed to be unbiased, AI systems are well known to replicate the biases present in their training datasets. Machine learning relies on past data to make future predictions, so models trained on biased data tend to reproduce those biases. For example, predictive policing programmes in the United States (US) often unfairly target communities of colour: African American people in the US are more likely to be reported to the police, so AI models have been known to predict that African American communities are higher-risk areas regardless of the actual crime rate. Similar bias could be expected to arise from historical divisions in India.
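To make the mechanism concrete, the following is a minimal, purely illustrative sketch in Python. It uses entirely synthetic data and hypothetical names (group, feature, flagged) rather than anything drawn from Aadhaar, policing records, or any real dataset, and it assumes NumPy and scikit-learn are available. It shows how an ordinary classifier trained on historically skewed labels ends up flagging one group far more often than the other, even though the underlying risk is identical for both.

```python
# Toy illustration (synthetic data only): a classifier trained on historically
# biased labels reproduces that bias in its own predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Two hypothetical groups with identical underlying "risk"; one legitimate feature.
group = rng.integers(0, 2, size=n)      # 0 = group A, 1 = group B
feature = rng.normal(size=n)
true_risk = feature > 1.0               # same risk distribution for both groups

# Historical labels are skewed: group B was flagged far more often than its
# true risk warrants (mimicking over-reporting in past records).
flagged = true_risk | ((group == 1) & (rng.random(n) < 0.3))

X = np.column_stack([group, feature])
model = LogisticRegression().fit(X, flagged)
preds = model.predict(X)

# The trained model now flags group B at a much higher rate,
# despite both groups having identical true risk.
print("flag rate, group A:", preds[group == 0].mean())
print("flag rate, group B:", preds[group == 1].mean())
```

Notably, simply removing the group column does not resolve the problem when other features act as proxies for group membership, which is why concerns about caste, religion, or income cannot be addressed merely by withholding those fields.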
Integrating Aadhaar and AI poses complex challenges, including privacy, security, and the inability to identify and address biases in machine learning programmes trained on Aadhaar data. Any integration of AI with such sensitive data must consider the potential for discrimination based on caste, religion, or income, and checks and balances are essential to prevent AI from perpetuating inequality. The 'black box' nature of machine learning, in which the logic behind a decision is unknown even to the system's creators, exacerbates this issue. In 2019, Apple Inc. could not explain why its new credit card algorithm offered higher lines of credit to men than to women, and even Google’s own engineers do not know which specific aspects of a website its algorithm uses to display search results.
Security concerns
Protecting the integrity of datasets such as the Aadhaar database is essential for safeguarding citizens' privacy from hackers and breaches. Securing the vast amount of data linked to Aadhaar has already proven difficult because of the extremely centralised nature of the database: Aadhaar details have been leaked to the public multiple times in recent years, and in 2018, The Tribune was able to pay a group on WhatsApp INR 500 to access personal information linked to Aadhaar numbers. These problems arose before AI integration was even being considered, when the Indian government was the biggest actor privy to the inner workings of Aadhaar.
Introducing AI into this process would further exacerbate concerns about keeping Aadhaar information safe. To develop integrated AI technologies, the Indian government often relies on contracts with third-party private companies. Granting such companies access to the Aadhaar database for AI training multiplies the points of access, heightening the risk that a third-party breach compromises the security of the entire system. The US faced a similar issue in 2020, when cybersecurity measures on federal systems were undermined by a breach at a small contractor with access to the network. Even if existing problems with the Aadhaar system are fixed, the Indian government would still need a procedure for auditing potential contractors to ensure that no new weaknesses are introduced into the system.
Preserving privacy in the face of AI
To balance the benefits of AI against privacy concerns, India should draft basic rights for AI, providing guidelines for the responsible use of data. So far, however, Indian regulators have yet to squarely address ethical AI, particularly where Aadhaar is concerned. While the government has endorsed the concerns raised in the National Strategy for AI report published by NITI Aayog in 2018, and has indicated that AI actors will likely be bound by the data privacy rules laid out in the upcoming Digital Personal Data Protection Bill, the Ministry of Electronics and Information Technology has stated that it has no plans to create AI-specific regulation because it believes that overregulation will stifle innovation. While the instinct to encourage AI growth is understandable, letting the industry go completely unregulated risks dire ethical violations. Instead, India should consider following the examples set by international partners to regulate AI without strangling the nascent technology.
In 2019, the G20 adopted principles for the responsible stewardship of trustworthy AI, which direct AI actors to promote inclusivity, transparency, security, and accountability. The European Union (EU) and the US have also proposed their own guidelines for the ethical use of AI. Expanding on these broad principles, India could produce national guidelines tailored to its own interests in AI, particularly around Aadhaar integration. AI has enormous potential to improve the well-being of Indian citizens; the government must examine possible AI regulations so that it can responsibly harness that power and use emerging technologies to build a better future.
Jenna Stephenson is an intern with the Geoeconomics Programme at the Observer Research Foundation.
The views expressed above belong to the author(s).