GenomeIndia highlights the need to ground precision medicine in India’s genetic diversity, linking population-specific data with targeted prevention and more equitable healthcare outcomes
The discovery that DNA is the genetic blueprint for life emerged over 70 years ago. This has since fuelled exploration into the genetic basis of human disease and the development of personalised treatment strategies. Although its population is among the most genetically diverse in the world, India remains underrepresented in genetic studies, which limits the application of genomics to human medicine. New insights from the GenomeIndia Project, a Department of Biotechnology-funded initiative, highlight that both socio-cultural factors, such as ethnolinguistic diversity, and genetic factors shape health outcomes. Expanding these efforts through broader data generation will enhance access to genomic technologies.
The GenomeIndia Project was launched by the Department of Biotechnology (DBT) in 2020 as a pan-India endeavour to catalogue the genetic diversity of the Indian population. A consortium of 20 national institutes is working on the project, including the Council of Scientific and Industrial Research–Institute of Genomics and Integrative Biology (CSIR–IGIB), Delhi; Centre for Brain Research (CBR), Bangalore; CSIR–Centre for Cellular and Molecular Biology (CCMB), Hyderabad; and Biotechnology Research and Innovation Council–National Institute of Biomedical Genomics (BRIC–NIBMG), Kalyani. As of 2025, around 20,000 samples from unrelated healthy Indians have been collected, and 9,768 have undergone genotyping. This includes 83 population groups across India, comprising 32 tribal groups and 53 non-tribal groups, out of 4,600 distinct population groups. The study spans four main linguistic families: Indo-European, Dravidian, Austro-Asiatic, and Tibeto-Burman.
A core objective of the project is to build a catalogue comprising genetic data, phenotypic characteristics such as biochemical parameters, and socio-demographic information. The collected information will be available to researchers in India and globally through the Indian Biological Data Centre (IBDC) portal. Its use and sharing will be governed by guidelines that facilitate secure genomic data sharing, including Biotech-PRIDE (Promotion of Research and Innovation through Data Exchange) and the Framework for Exchange of Data (FeED).
The most recent findings from GenomeIndia demonstrate that the risk of metabolic disease varies across ethnolinguistic groups. For instance, the higher levels of high-density lipoprotein (HDL), a lipoprotein that transports lipids such as cholesterol, typically observed among women relative to men across age groups are absent in Indian tribal populations. This difference may be attributed to genetic, lifestyle, or environmental factors that state-level surveys do not adequately capture; incorporating these factors into such surveys would enable targeted healthcare and lifestyle interventions. Such findings indicate that preventive strategies cannot be applied uniformly and should instead be tailored to specific population groups.
A study of Korean adults found that lifestyle factors significantly reduced the risk of metabolic disease, despite genetic predisposition. From an Indian perspective, this suggests that as GenomeIndia identifies population-specific disease risk patterns, genomic insights can be combined with targeted lifestyle interventions and screening strategies. As the largest population-level genetic study of Indians to identify genetic variation within the population, GenomeIndia marks a significant step forward in sequencing efforts.
Figure 1: Key Steps Involved in the GenomeIndia Project

Source: Oxford Academic; Human Molecular Genetics (2025)
India’s genetic history is long and complex. A study carried out by the University of California, Berkeley; the All India Institute of Medical Sciences (AIIMS), Delhi; the University of Southern California (USC); and the University of Michigan drew on the Longitudinal Aging Study in India-Diagnostic Assessment of Dementia (LASI-DAD), a population-based study of individuals aged 60 years or older. It demonstrated that Indian ancestry can be traced to a migration out of Africa about 50,000 years ago, followed later by movements of communities from Central Asia.
India’s population history is important to healthcare because founder effects and cultural practices like endogamy, or marrying within specific social groups, have increased the frequency of certain genetic variants. A founder effect is a form of genetic drift that occurs when a segment of a population breaks away, usually geographically, to form a new population. These forces can raise the incidence of certain inherited disorders, making population-specific genomic datasets essential for targeted screening and earlier diagnosis. For instance, a mutant form of the butyrylcholinesterase (BCHE) gene, which causes prolonged muscle paralysis due to an inability to metabolise certain anaesthetic drugs, is highly prevalent in certain communities, such as the Vysya community in Andhra Pradesh and Telangana, but is not found at high frequencies in other populations in India. This demonstrates that population-specific genomic knowledge matters in clinical care.
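To make the founder-effect mechanism concrete, here is a minimal simulation sketch. It is an illustration only, not an analysis from GenomeIndia: the starting variant frequency, founder group size, population size, and number of generations are all hypothetical values, and endogamy is approximated as a closed population with random mating inside it.

```python
import random


def sample_frequency(freq, n_people, rng):
    """Draw 2 * n_people allele copies at the given variant frequency
    and return the variant's frequency in that sample."""
    copies = 2 * n_people
    hits = sum(1 for _ in range(copies) if rng.random() < freq)
    return hits / copies


def simulate_founder_groups(source_freq=0.01, n_founders=50, pop_size=200,
                            generations=20, n_groups=200, seed=1):
    """Return the final variant frequency in each simulated group.

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    final_freqs = []
    for _group in range(n_groups):
        # Founder event: a small group splits off from a large population.
        freq = sample_frequency(source_freq, n_founders, rng)
        # Endogamy, approximated as a closed population of fixed size:
        # each generation is resampled from the previous one (drift).
        for _generation in range(generations):
            freq = sample_frequency(freq, pop_size, rng)
        final_freqs.append(freq)
    return final_freqs


if __name__ == "__main__":
    freqs = simulate_founder_groups()
    lost = sum(1 for f in freqs if f == 0.0)
    enriched = sum(1 for f in freqs if f >= 0.05)
    print(f"groups where the variant was lost: {lost} of {len(freqs)}")
    print(f"groups where it reached 5% or more: {enriched} of {len(freqs)}")
```

Under these assumptions, identical starting conditions produce very different outcomes across groups: some lose the variant entirely while others carry it at a frequency well above that of the source population, which is why screening priorities cannot be inferred from national averages.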
Population-level genomic studies aimed at improving disease understanding and advancing personalised medicine lack diversity. Genome-wide association studies (GWAS)—a research methodology used to identify genetic variations associated with specific diseases—have been dominated by participants of European descent, with limited representation from South Asia and Africa. According to the NHGRI-EBI GWAS Catalogue, a publicly available human GWAS database, samples from individuals of South Asian ancestry account for 0.9 percent, African ancestry 1.1 percent, and East Asian ancestry 5.9 percent, compared to 86.3 percent European ancestry. Genetic diversity varies significantly across populations, yet global efforts to catalogue this variation have failed to capture its full extent.
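As a rough illustration of what a single GWAS test involves, the sketch below compares how often a variant allele appears in people with a disease (cases) and without it (controls). It is a simplified allelic chi-square test on one variant, with hypothetical counts; real studies repeat a test like this across millions of variants and adjust for ancestry and other confounders, which is one reason results from one population may not transfer to another.

```python
import math


def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi2, p_value, odds_ratio). Assumes all four counts are non-zero.
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele were unrelated to disease status.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only.
    chi2, p_value, odds_ratio = allelic_association(
        case_alt=60, case_ref=40, control_alt=40, control_ref=60
    )
    print(f"chi2={chi2:.2f}  p={p_value:.4f}  odds ratio={odds_ratio:.2f}")
```

Because a genome-wide scan runs so many tests, a much stricter significance threshold than the usual 0.05 is applied in practice (conventionally 5e-8).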
The World Health Organization (WHO) Science Council has emphasised the utility of genetic information for public health technologies; such information enables an understanding of the genetic basis of certain diseases and can indicate an individual’s resistance, susceptibility, and response to disease. This allows a shift away from a ‘one-size-fits-all’ approach to human medicine towards a more personalised treatment strategy. From a healthcare perspective, underrepresentation of Indian populations in global databases can affect the accuracy of diagnostics, predictions of drug metabolism, and the suitability of therapies. Pharmacogenomics databases like the Pharmacogenetics Knowledge Base (PharmGKB), which provide information on drug-gene interactions, draw more than 60 percent of their samples from individuals of European descent. Clinically, understanding drug metabolism across diverse populations can provide valuable information to help improve health outcomes. For instance, a study examining the impact of psychiatric medications on drug metabolism in Indian patients highlighted the need for comprehensive genetic screening so that clinicians can prescribe medications better suited to each patient’s genetic makeup. This shows how genomics can improve clinical decision-making, reduce adverse drug reactions, and improve treatment efficacy.
Genomics is invaluable for developing cell and gene therapies, including those for sickle cell disease (SCD). In 2023, India accounted for 14.5 percent of the global SCD burden. Consanguinity, or marriage between close relatives, is a major factor behind the persistently high prevalence of SCD in Indian populations. Most studies have highlighted a high prevalence of SCD amongst tribal communities; however, recent studies, including one carried out in Chamarajanagar, Karnataka, have demonstrated that non-tribal groups also show a high prevalence. Birsa-101, a CRISPR-based gene therapy, is an example of a new therapy for SCD, in which gene-editing technology corrects mutations in the defective HBB (haemoglobin subunit beta) gene in SCD patients. In Saudi Arabia, where the reported prevalence is 27 percent, studies have aimed to determine suitable drug candidates for the treatment of SCD. Collectively, these examples show how genomic data are used to guide both curative therapies and population-specific treatment strategies for a genetic disease.
Increasing the genomic diversity of datasets would improve understanding of the genetic basis of disease, enhance risk mapping, and support the development of personalised therapeutic targets, thereby promoting equity in genomic healthcare. DNA sequencing platforms, such as next-generation sequencing (NGS), have advanced technologically, bringing greater accuracy, lower costs, and opportunities for population-scale studies. Bioinformatics, the use of computational methods to study biological data and generate clinical insights, has expanded significantly. Artificial intelligence (AI) tools can distinguish between normal and disease-causing genetic variants, predict disease likelihood, and reduce the time required for manual analysis. These advances can accelerate diagnosis, improve disease risk prediction, and make precision medicine more scalable.
Collectively, advances in DNA sequencing, bioinformatics, and AI, combined with national efforts like the GenomeIndia project, can transform genomics from a research domain into a scalable healthcare capability. Countries that invest in genomic capacity will be better placed to deliver affordable and personalised healthcare to their population. For India, this path begins with developing an understanding of the genetic diversity of its own people.
Lakshmy Ramakrishnan is an Associate Fellow with the Centre for New Economic Diplomacy at the Observer Research Foundation.
The views expressed above belong to the author(s).