Expert Speak Health Express
Published on Apr 23, 2026

New GenomeIndia findings point to a more uneven landscape of disease risk, drug response, and underdiagnosed metabolic burden than India’s current medical averages can capture

What GenomeIndia Reveals About Health Across India’s Diverse Populations

India has long figured prominently in discussions of global health by sheer demographic weight, yet in genomics and detailed phenotypic research, it has remained oddly underrepresented. The GenomeIndia (GI) project was conceived to correct that imbalance. Launched in January 2020, it was structured as a national consortium and supported by the Department of Biotechnology (DBT) under the Ministry of Science and Technology. The initiative brings together a wide network of institutions, with the Centre for Brain Research at the Indian Institute of Science (IISc) campus in Bengaluru serving as the coordinating centre and major sequencing and analysis work carried out across institutes, including BRIC-NIBMG*, CSIR-CCMB*, and CSIR-IGIB*, alongside multiple sample-collection partners across the country.

Alongside the flagship genomics manuscript, the project has released a companion phenotypic manuscript, a public dashboard for exploratory use, open code, summary statistics, and extensive supplementary material containing much of the study’s analytical detail. At the same time, the two principal papers are still available as preprints on medRxiv and have not yet undergone peer review, meaning their findings are important but should be read with appropriate caution.

The rationale for such an effort is fairly clear. India is home to roughly 18 percent of the world’s population. Yet both global genomic databases and much of public health research continue to rely on broad averages that smooth over the country’s remarkable internal diversity. Indian society is shaped by thousands of ethnolinguistic groups, long histories of endogamy, and sharp regional variation that cannot be adequately understood through aggregated national or state-level categories alone. In that sense, this project is not really an ancestry paper, however eagerly parts of the internet may try to drag it into old civilisational quarrels over migration and caste. At its core is a practical question: can better population-scale data help make medicine less blunt in a country as internally diverse as India, whether by improving the interpretation of disease risk, refining drug choice and dosing, or identifying groups that current screening systems routinely miss?

Project Findings and Medical Relevance

On the genomic side, the atlas shows that several Indian populations carry strong founder effects, where a community descends from a relatively small pool of ancestors and, over time, certain genetic variants become much more common within that group than elsewhere. It also finds high homozygosity, meaning long stretches of the genome where an individual has inherited identical DNA from both parents, often because marriage has remained largely within the same community over many generations. In such settings, pathogenic variants and loss-of-function variants (changes that disrupt or switch off the normal functioning of a gene) that are rare or absent in global reference datasets can rise to appreciable frequencies. This also affects variant interpretation, the clinical process of determining whether a genetic change is likely harmless, disease-causing, or still uncertain. What GenomeIndia shows is that these judgments are less certain when drawn from European-heavy databases, particularly for populations that those datasets have scarcely sampled.
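To see why founder effects and endogamy matter clinically, it helps to work through the standard population-genetics arithmetic for a recessive variant. The sketch below uses the textbook expression q² + F·q·(1 − q) for the expected share of people carrying two copies, where q is the allele frequency and F the inbreeding coefficient. Every number in it is a hypothetical chosen for illustration, not a GenomeIndia estimate, and the function name is ours.

```python
def affected_frequency(q: float, f: float = 0.0) -> float:
    """Expected share of individuals homozygous for a recessive allele.

    q: allele frequency in the population (0..1).
    f: inbreeding coefficient F (0 means random mating).
    Uses the textbook result q^2 + F * q * (1 - q).
    """
    if not (0.0 <= q <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("q and f must lie in [0, 1]")
    return q * q + f * q * (1.0 - q)


# Hypothetical values for illustration only (not taken from GenomeIndia).
global_q = 0.001    # variant is rare in a global reference dataset
founder_q = 0.02    # same variant enriched in a founder population
founder_f = 0.03    # assumed inbreeding coefficient after long-term endogamy

baseline = affected_frequency(global_q)
enriched = affected_frequency(founder_q, founder_f)

print(f"global reference: {baseline:.6f}")
print(f"founder group:    {enriched:.6f}")
print(f"ratio:            {enriched / baseline:.0f}x")
```

Under these assumed inputs, a variant that looks negligible in a global database accounts for far more affected individuals in the founder group, which is the sense in which borrowed frequency data can mislead diagnosis and counselling.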

The paper also reports substantial pharmacogenomic variation, that is, variation in the genes that affect drug response. For example, variants in NUDT15 can affect how safely some patients tolerate thiopurines, a class of drugs used in cancer and immune disorders. CYP2D6 helps metabolise many antidepressants and opioids, while VKORC1 shapes sensitivity to anticoagulants such as warfarin. In plain terms, the findings suggest that drug choice and dosing may need to be more carefully tailored in some Indian populations than current one-size-fits-all prescribing assumes. The study also shows that polygenic scores, which are risk estimates built by adding up the small effects of many genetic variants, perform poorly when transferred from European datasets to Indian populations; borrowed prediction tools, in other words, do not work well.
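For readers unfamiliar with polygenic scores, the "adding up" is a weighted sum: each variant contributes its effect size multiplied by the number of risk-allele copies a person carries. The following is a minimal sketch under stated assumptions; the variant names, weights and genotypes are invented for illustration, and it is not the scoring pipeline used by the GenomeIndia authors.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant).

    Variants that have a weight but are missing from the person's
    genotype data are skipped.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


def coverage(dosages: Mapping[str, float],
             weights: Mapping[str, float]) -> float:
    """Fraction of weighted variants actually observed in the genotype."""
    if not weights:
        raise ValueError("weights must not be empty")
    return sum(1 for v in weights if v in dosages) / len(weights)


# Hypothetical effect sizes, imagined as estimated in a European cohort.
weights = {"var_a": 0.12, "var_b": -0.05, "var_c": 0.30, "var_d": 0.08}

# Hypothetical genotype in which one scored variant was not observed.
person = {"var_a": 2, "var_b": 1, "var_d": 0}

print(f"score:    {polygenic_score(person, weights):.2f}")
print(f"coverage: {coverage(person, weights):.0%}")
```

The sketch shows only the mechanics. Poor transfer across populations arises because the weights themselves, and the variants chosen, reflect the genetic background of the cohort in which they were estimated.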

The companion phenotypic study also finds that ethnolinguistic identity helps predict health outcomes even after accounting for the state in which a person lives. The burden it identifies is both high and uneven. Low HDL* and elevated triglycerides are widespread, and so is what the paper calls an awareness gap: only 17.6 percent of those whose recorded blood pressures were in the hypertensive range, and only 2.2 percent of those with dyslipidaemia, or abnormal blood-fat levels, appear to know about their condition.

It also uncovers unusual life-course patterns. One of the most striking is that, in tribal populations, the higher HDL levels usually seen in women compared with men disappear. This suggests that the biological and social factors shaping cardiovascular risk in those communities may differ from patterns often treated as standard in urban or European cohorts. Alongside this, the project built a South Asian imputation panel, which is important for future research. Imputation is the statistical process of filling in missing genetic information from a partial dataset, and a panel built on Indian genomes provides future studies with a much stronger reference point for accurate imputation in South Asian populations.
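The idea behind imputation can be shown with a deliberately simplified toy: find the reference haplotype that best matches a person's observed sites, then copy its alleles into the gaps. Real imputation software uses probabilistic haplotype models rather than a single best match, so the code below is an illustration of the principle only, with made-up haplotypes and a function name of our choosing.

```python
from typing import Sequence


def impute(target: str, panel: Sequence[str], missing: str = "?") -> str:
    """Fill missing sites in `target` from the closest haplotype in `panel`.

    Haplotypes are strings of '0'/'1' alleles; `missing` marks untyped sites.
    Toy nearest-match rule, not a production imputation algorithm.
    """
    if not panel:
        raise ValueError("reference panel is empty")
    if any(len(ref) != len(target) for ref in panel):
        raise ValueError("all haplotypes must have the same length")

    def mismatches(ref: str) -> int:
        # Count disagreements at sites where the target was actually typed.
        return sum(1 for t, r in zip(target, ref) if t != missing and t != r)

    best = min(panel, key=mismatches)
    return "".join(r if t == missing else t for t, r in zip(target, best))


# Hypothetical reference panel and partially typed sample.
panel = ["01011", "11000", "00110"]
sample = "0?0?1"

print(impute(sample, panel))
```

The quality of the result depends entirely on whether the panel contains haplotypes resembling the person being imputed, which is why a reference built from Indian genomes matters for South Asian studies.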

The criticism that has surfaced in public discussion on social media platforms is only partly misplaced. It is fair to note that the cohort consists of healthy adults rather than disease-ascertained patients, which limits direct inference regarding paediatric disorders, penetrance, and some forms of clinically severe early-onset disease. It is also fair to point out that de-identified community coding protects privacy but reduces immediate translational usefulness for outsiders who might hope to turn these findings into named community-level screening strategies. Less convincing are the complaints that the study fails because it is not an ancestry-maximalist or ancient-DNA project. That was never its purpose, and the absence of finer subclade analysis (especially on the Y chromosome) or of publicly available caste-level breakdowns does not erase its medical value. The more serious question is whether this design provides a sufficiently robust scaffold for future epidemiological and pharmacogenomic work. On that narrower and more important measure, it does.

Where GenomeIndia Should Lead

The headline figure in the phenotypic paper, that 95 percent of participants had at least one abnormal biomarker or anthropometric value, needs a more careful reading than a simple equation with illness. The authors themselves suggest it may represent an upper bound, produced both by a genuine metabolic burden and by an imperfect fit between imported reference intervals and Indian populations. This is an important caveat, but it does not alter the larger concern. Many of the thresholds and risk estimates used in everyday medicine, whether for lipids, adiposity, diabetes, or cardiovascular risk, were developed in cohorts that did not adequately represent Indian populations. GenomeIndia does not, by itself, produce a finished Indian risk engine for diabetes or cardiovascular disease, nor can a cross-sectional dataset settle where every new cut-off should lie. But it does provide something that has long been missing, namely, large-scale Indian distributions across populations rather than only across states, and genomic evidence that borrowed prediction tools, including European-derived polygenic scores, travel poorly. This gives the study a more practical significance than the headline number alone suggests. It also creates a basis for re-examining and recalibrating reference intervals, and eventually for building more India-relevant risk scores that can be validated against real outcomes rather than inherited from populations with very different baseline profiles.
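A reference interval is conventionally the central 95 percent of values observed in a healthy reference sample, so what counts as "abnormal" depends on whose distribution set the limits. The sketch below, using simulated values and an invented imported interval, shows how the share of people flagged changes when the interval is taken from elsewhere rather than derived locally. It is an illustration under these assumptions and uses no GenomeIndia data.

```python
import random
import statistics


def reference_interval(values):
    """Central 95% interval: 2.5th and 97.5th percentiles of the sample."""
    cuts = statistics.quantiles(values, n=40)  # 39 cut points, 2.5% apart
    return cuts[0], cuts[-1]


def fraction_flagged(values, interval):
    """Share of values falling outside the (low, high) interval."""
    low, high = interval
    return sum(1 for v in values if v < low or v > high) / len(values)


# Simulated biomarker values for a hypothetical local reference sample.
rng = random.Random(0)
sample = [rng.gauss(42.0, 8.0) for _ in range(1000)]

# Hypothetical interval imported from a differently distributed cohort.
imported_interval = (40.0, 60.0)
local_interval = reference_interval(sample)

print(f"flagged by imported interval: {fraction_flagged(sample, imported_interval):.1%}")
print(f"flagged by local interval:    {fraction_flagged(sample, local_interval):.1%}")
```

A locally derived interval flags about 5 percent by construction, so it cannot by itself say whether the extra people flagged by the imported interval are misclassified or genuinely at risk. Settling that requires outcome data, which is why validation against real outcomes matters.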

Low HDL and elevated triglycerides were widespread, waist circumference remained high even when South Asian cut-offs were used, 22 populations showed a double burden of underweight and overweight within the same group, and diabetes prevalence in the GI phenotype cohort was far higher than NFHS-5* estimates, even if some of that gap narrowed after age-matching. The practical lesson is to develop better tools that can distinguish genuine burden from misclassification, which requires Indian reference intervals and population-informed surveillance rather than reflexive dependence on Western norms. It also requires more serious screening for underdiagnosed hypertension and dyslipidaemia, especially where awareness is weakest. The ICMR*’s call for “Bharat-specific” and, where warranted, population-specific reference intervals makes clear that reliance on Western standards is now a practical clinical question.
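The point about age-matching can be made concrete with direct age standardisation, one common way to compare prevalence between groups with different age structures. In the sketch below, age-specific rates from an older cohort are re-weighted to a younger survey population. The age bands, counts and weights are all hypothetical and are not drawn from the GI cohort or NFHS-5.

```python
def crude_prevalence(cases, counts):
    """Overall prevalence, ignoring age structure."""
    return sum(cases.values()) / sum(counts.values())


def standardised_prevalence(cases, counts, standard_weights):
    """Direct standardisation: weight age-specific rates by a standard
    population's age distribution."""
    total = sum(standard_weights.values())
    return sum(
        standard_weights[band] * cases[band] / counts[band]
        for band in standard_weights
    ) / total


# Hypothetical cohort that skews older than the comparison survey.
cohort_counts = {"18-39": 200, "40-59": 500, "60+": 300}
cohort_cases = {"18-39": 10, "40-59": 75, "60+": 90}

# Hypothetical age distribution of the comparison survey population.
survey_weights = {"18-39": 0.6, "40-59": 0.3, "60+": 0.1}

crude = crude_prevalence(cohort_cases, cohort_counts)
adjusted = standardised_prevalence(cohort_cases, cohort_counts, survey_weights)

print(f"crude prevalence:        {crude:.1%}")
print(f"age-standardised value:  {adjusted:.1%}")
```

With these invented inputs, part of the apparent excess disappears once the age structures are aligned, and whatever remains is the portion that age alone cannot explain.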

The genomic paper points toward a second set of implications. Endogamy and founder effects matter well beyond population genetics, because they can shift the local frequency of disease-causing variants. As a result, rare-disease diagnosis, carrier-risk assessment, and genetic counselling in India cannot continue to depend indefinitely on databases assembled elsewhere. None of this means GenomeIndia justifies a crude national turn to premarital genetic screening. India’s own rare-disease policy includes premarital, post-marital and preconception prevention in principle; however, any move in this direction would need to be voluntary, community-sensitive, counselling-led and tightly governed rather than coercive or identity-driven.

The more immediate translational work lies in validation cohorts and longitudinal follow-up, alongside pharmacogenomic guidance in drug classes where the evidence is already strong enough to inform practice. Variants in genes such as NUDT15, CYP2C19, CYP3A5 and DPYD already map onto established international guidance for thiopurines, clopidogrel, tacrolimus and fluoropyrimidines respectively, and the recent FDA* labelling changes around dihydropyrimidine dehydrogenase (DPD) deficiency only sharpen the case for Indian implementation work in a few drug classes before grander claims are made. Taking GenomeIndia seriously would mean moving past abstract enthusiasm for genomics and towards practical follow-through: careful work in founder and isolated populations, stronger rare-disease pathways under the National Policy for Rare Diseases, drug-gene pilots where dosing can be made safer, and more targeted public-health action in places where metabolic disease is common but diagnosis still trails behind.

*BRIC-NIBMG = Biotechnology Research and Innovation Council – National Institute of Biomedical Genomics

*CSIR-CCMB = Council of Scientific and Industrial Research – Centre for Cellular and Molecular Biology

*CSIR-IGIB = Council of Scientific and Industrial Research – Institute of Genomics and Integrative Biology

*HDL = High-density lipoprotein, often called “good” cholesterol

*NFHS-5 = National Family Health Survey, round 5

*ICMR = Indian Council of Medical Research

*FDA = United States Food and Drug Administration


K.S. Uplabdh Gopal is an Associate Fellow with the Health Initiative at the Observer Research Foundation.

The views expressed above belong to the author(s).