India's population is one of the most genetically diverse in the world, yet it remains underrepresented in global datasets. In a study publishing in the Cell Press journal Cell, researchers analyzed genomic data from more than 2,700 people from across India, capturing genetic variation from most geographic regions, linguistic groups, and communities. They found that most modern-day Indian people's ancestry can be traced back to Neolithic Iranian farmers, Eurasian Steppe pastoralists, and South Asian hunter-gatherers.
"This study fills a critical gap and reshapes our understanding of how ancient migrations, archaic admixture, and social structures have shaped Indian genetic variation," says senior author Priya Moorjani of the University of California, Berkeley. "Studying these subpopulations allows us to explore how ancient ancestry, geography, language, and social practices interacted to shape genetic variation. We hope our study will provide a deeper understanding of the origin of functional variation and inform precision health strategies in India."
The researchers used data from the Longitudinal Aging Study in India, Diagnostic Assessment of Dementia (LASI-DAD) and generated whole-genome sequences from 2,762 individuals in India, including people who spoke a range of different languages. They used these data to reconstruct the evolutionary history of India over the past 50,000 years at fine scale, showing how history impacts adaptation and disease in present-day Indians. They showed that most Indians derive ancestry from populations related to three ancestral groups: Neolithic Iranian farmers, Eurasian Steppe pastoralists, and South Asian hunter-gatherers.
"In India, genetic and linguistic variation often go hand in hand, shaped by ancient migrations and social practices," says lead author Elise Kerdoncuff of UC Berkeley. "Ensuring linguistic variation among the people whose genomes we include helps prevent biased interpretations of genetic patterns and uncover functional variation related to all major communities to inform both evolutionary research and future biomedical surveys."
One of the key goals of the study was to understand how India's complex population history has shaped genetic variation related to disease. Many subpopulations in India have an increased risk of recessive genetic disorders, largely because of historical isolation and marriage within communities.
Another focus was the impact of archaic hominin ancestry, specifically Neanderthal and Denisovan ancestry, on disease susceptibility. For example, some of the genes inherited from these archaic groups affect immune function.
"One of the most striking and unexpected findings was that India harbors the highest variation in Neanderthal ancestry among non-Africans," says co-lead author Laurits Skov, also of UC Berkeley. "This allowed us to reconstruct around 50% of the Neanderthal genome and 20% of the Denisovan genome from Indian individuals, more than any other previous archaic ancestry study."
One limitation of this work was the scarcity of ancient DNA from South and Central Asia. As more ancient genomes become available, the researchers will be able to refine their analysis and identify the sources of the Neolithic Iranian farmer-related and Steppe pastoralist-related ancestry in contemporary Indians. They also plan to continue studying the LASI-DAD cohort to take a closer look at the sources of genetic adaptations and disease variants across India.