India stands at a pivotal point in its attempt to enter the generative AI race. By backing domestic compute capacity, coupled with indigenous models and data, India can forge a leadership position in AI rooted in inclusivity and digital sovereignty.
Generative Artificial Intelligence (GenAI) is a subset of Artificial Intelligence (AI) that enables machines to generate realistic, human-like content. The model encodes the input into a machine-readable form and maps it to the most closely related patterns learned from the data it was trained on. The final output is a contextually appropriate response to the input prompt. As GenAI technologies reshape industries and human interaction globally, India is gradually carving out its place in this evolving landscape.
With more than 700 million internet users and an estimated 600,000 AI professionals, India’s AI sector is highly scalable. The growth of a robust technological infrastructure has played a key role in the emergence of about 2,915 AI startups. Large Language Models (LLMs) are surging in popularity, with ChatGPT and Gemini gaining ground in India. Indians currently account for 13.5 percent of global users of ChatGPT, making India its largest user base. DeepSeek, the Chinese open-source model, has been gaining market share due to its cost advantages and compute efficiency. India is currently DeepSeek’s fourth-largest user base worldwide, with approximately 43.36 million website visits, according to a report published in February 2025.
However, despite seemingly high adoption rates, a recent study found that only 31 percent of Indians have used GenAI platforms. This could be because foreign models, which rely on non-local datasets, cannot cater effectively to India’s multilingual population, socio-cultural diversity, and contextual realities. The result is often culturally inaccurate output that hampers last-mile adoption. For instance, MetaAI generated a ‘man with a turban’ four times out of five when asked to generate an image of an Indian, despite India’s demographic and cultural diversity. Moreover, low levels of formal and digital literacy create additional barriers to adoption, further deepening the trust deficit caused by this cultural mismatch.
The main issue with foreign LLMs is the inherent bias in their responses. For example, DeepSeek exhibits significant pro-China bias when queried on geopolitical and historical issues ranging from Kashmir to Arunachal Pradesh, as well as the Tiananmen Square incident. Such biases raise serious concerns for India, given the rapid proliferation of this open-source model.
Being the second largest generator of digital data globally, India has the potential to provide high-quality datasets for model training to make AI tools more accessible to underserved populations. The government’s dataset platform, AIKosh, is the starting point of a high-quality data capture initiative with institutions like IIT Bombay contributing over 16 datasets to the platform. It also complements the Bhashini initiative, an Indic translation tool that aims to overcome linguistic, digital, and literacy barriers in India.
Government support has generated strong momentum in India’s GenAI space. Once fully deployed, Indic LLMs could offer what foreign LLMs currently cannot – accurate, unbiased communication in Indian languages. However, much depends on successful execution, as the Indic GenAI sector navigates evolving compute supply chains, data governance issues, ethical alignment, and most importantly, talent retention in India.
Instead of remaining a passive recipient of foreign technological innovation, India must pursue sustainable capacity building on a larger scale. Foreign models are not trained to reflect Indian contexts, resulting in the digital marginalisation of 1.4 billion voices. India must rethink its relationship with data and treat it as an asset rather than a commodity. Data generated by Indians must be used to train indigenous models. This approach ensures that the economic, diplomatic and intellectual value of AI development remains within India. Moreover, it would set up new value chains - from research and development (R&D) to model training and deployment - rather than relying on foreign LLMs for various services. It would also strengthen India’s position in technology diplomacy as a leading technology voice from the Global South.
The IndiaAI mission, launched in 2024 with a budget of over INR10,000 crore, has been driving AI innovation. Within this initiative, the Ministry of Electronics and Information Technology (MeitY) has selected start-ups like Sarvam, Soket, Gnani, and Gan AI to build India’s indigenous LLM ecosystem. Sarvam will develop a 120 billion parameter, multi-scale foundational model, with a suite of models spanning content generation, deep research, and compact on-device processing that enables computation on mobile devices. Sarvam-M, an earlier release with support for 10 Indic languages, was built for efficiency. It uses approximately 1.4-2.1 tokens per word, compared to the industry norm of 3+ for Indian languages, enhancing compute efficiency in line with India’s needs. A blend of Supervised Fine Tuning and Reinforcement Learning with Verifiable Rewards (RLVR) allows the model to analyse and solve complex math and coding problems. This is achieved in a decoder-only system built on a Mistral-type model architecture, which contributes to its speed and efficiency.
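The tokens-per-word figures above translate directly into the number of tokens a model must process for the same text. The following is a minimal sketch of that arithmetic, not a measurement of any real tokenizer: the function names, the 1,000-word document, and the choice of 3.0 as the baseline (the lower bound of the "3+" norm) are all assumptions made for illustration.

```python
# Illustrative sketch only: the document length and the 3.0 baseline are
# assumptions chosen to mirror the figures quoted in the text.

def tokens_per_word(num_tokens: int, num_words: int) -> float:
    """Average number of tokens a tokenizer spends on each word."""
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    return num_tokens / num_words


def token_savings(fertility: float, baseline: float) -> float:
    """Fraction of tokens saved relative to a baseline tokens-per-word rate."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 1.0 - fertility / baseline


if __name__ == "__main__":
    words = 1_000            # hypothetical document length in words
    baseline = 3.0           # assumed lower bound of the "3+" norm
    for rate in (1.4, 2.1):  # range quoted for Sarvam-M
        tokens = round(rate * words)
        saved = token_savings(tokens_per_word(tokens, words), baseline)
        print(f"{rate} tokens/word -> {tokens} tokens, {saved:.0%} fewer than baseline")
```

Under these assumptions, the same text needs roughly 30 to 53 percent fewer tokens than at the baseline rate. Since processing cost tends to grow with token count, this is one plausible way to read the efficiency claim, though actual savings would depend on the tokenizer and the text.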
Alongside this, Soket will develop a 120 billion parameter open-source model optimised for India’s defence, healthcare, and education sectors. Gnani AI will build a 14 billion parameter voice model with fast speech processing, while Gan AI, a company that has previously provided tech solutions to companies such as Google and Amazon, is building a 70 billion parameter text-to-speech model billed as ‘superhuman’ in capability. In addition to the MeitY-selected start-ups, the Department of Science and Technology has also been supporting a multi-modal AI model, BharatGen, to boost public service delivery and citizen engagement.
To train indigenous models, the government launched the IndiaAI Dataset Platform, where diverse, anonymous and non-personal data is stored. AIKosh’s data models, like Hercule-HI and Hercule-BN, ensure translation of English into Hindi and Bangla for better alignment with human judgment in low-resource settings. These models incorporate dialectal nuance to improve translation accuracy. AIKosh aims to enable data discoverability and encourage innovation. Skill development programmes like ‘YuvAI’ and ‘Srijan’, in collaboration with premier institutes and technology companies like Meta, will offer young professionals much-needed exposure to realise their potential.
The prospect of developing a fully open-source and inclusive model could improve public trust in AI as a transformative technology, raising adoption rates of models like Sarvam, which have seen low uptake since launch.
India accounts for 16 percent of the global AI workforce and does not lack talent. The ‘brain drain’ must be treated with serious concern, as 80 percent of premier Indian AI researchers move abroad to pursue their work, leaving behind significant gaps in knowledge and experience. LLMs not optimised for linguistic and cultural diversity can misrepresent cultural iconography and vocabulary. India’s decentralisation across various sectors can prove a source of strength. The growing reliance on centralised AI tools with foreign influences and datasets risks overlooking local nuance, limiting relevant and representative content generation.
Developing and training LLMs requires significant compute capacity. India’s computing infrastructure was said to account for less than 2 percent of global capacity in 2024. This is particularly stark given the US and China’s combined share of 58-59 percent. With compute capacity currently exceeding 34,000 GPUs, India is working on building up its domestic capacity. In parallel, given the high carbon footprint of training LLMs, there needs to be an added focus on developing critical infrastructure, such as utilities, that can sustain and help scale India’s efforts in AI. For GenAI to lift off in India, therefore, a combined focus on talent, data, and compute is needed, in addition to a sustainable energy supply.
The government must focus on key challenges to AI adoption - access, affordability, and relevance to local needs. Building on its efforts with LLMs will pave the way for adoption at the last mile. Going forward, many local enterprises such as kiranas, clinics, and farms with diverse languages and needs could leverage Small Language Models – a leaner and more resource-efficient form of AI. This will allow AI tools like voice assistants and smart document readers to function seamlessly across sectors like healthcare, agriculture, and social welfare, encouraging faster adoption of AI at the grassroots level.
Artificial Intelligence is poised to become an integral part of daily life. Therefore, the models must be able to communicate effectively, without language and information barriers. India has a golden opportunity to move forward in the GenAI space with indigenous innovation that sets a new global paradigm – one that is inclusive, low-cost, and efficient. Indigenous models must not be treated as a secondary consideration, but rather as a strategic one.
Srijan Jha is a Research Intern with the Centre for Security, Strategy and Technology at the Observer Research Foundation.
The views expressed above belong to the author(s).