With around 5,000 different ethno-linguistic and religious groups, India is one of the most culturally and genetically diverse countries in the world. Yet, it remains underrepresented in genomic surveys, even when compared to other non-European groups, such as East Asians and Africans.
A new analysis of Indian genomes — the largest and most complete to date — helps untangle these groups' complex evolutionary history, uncovering a 50,000-year history of genetic mixing and population bottlenecks that shaped genetic variation, health and disease in South Asia.
The analysis, led by researchers at the University of California, Berkeley, the All India Institute of Medical Sciences (AIIMS) in New Delhi, India, the University of Southern California (USC) and the University of Michigan, was published today (June 26) in the journal Cell.
"These findings fill a critical gap and reshape our understanding of how ancient migrations, archaic admixture and social structures, like endogamy, have shaped the Indian genetic variation and risk of diseases, and will help inform precision health strategies in India," said Priya Moorjani, a senior author of the paper and a UC Berkeley assistant professor of molecular and cell biology.
Because of the complex history of gene flow and endogamy, or within-community marriages, Moorjani said, some groups within India are as genetically different from each other as Europeans are from East Asians. Studying diverse individuals across India thus helps to understand how ancient ancestry, geography, language and social practices interacted.
The researchers analyzed the complete genomes of 2,762 individuals representing most of the nation's major linguistic, ethnic and geographic communities. The genomes were sequenced as part of the Longitudinal Aging Study in India-Diagnostic Assessment of Dementia (LASI-DAD), which seeks to understand genetic variation in India and uncover the causes of aging and age-associated diseases. Moorjani and her UC Berkeley team were supported by grants from the National Institutes of Health.
The UC Berkeley team found that most of the genetic variation in India can be explained by a single migration of humans out of Africa about 50,000 years ago. These populations interbred with now-extinct relatives, the Neanderthals and Denisovans, and then spread throughout Europe and Asia, including India. As a result, Indians and Europeans carry roughly equal amounts of Neanderthal DNA, between 1% and 2% of the entire genome.
"Potentially, there were earlier waves out of Africa to India, but it's likely that those groups either did not survive or left little genetic impact on today's populations," said Elise Kerdoncuff, a former UC Berkeley postdoctoral fellow and one of two lead authors of the paper.
Surprisingly, Indians have a greater variety of Neanderthal DNA segments than other populations around the world. According to Moorjani, the European populations sampled by earlier studies share among them about 30% of the Neanderthal genome. The much smaller sample of Indian genomes, however, contained Neanderthal ancestry segments representing half of the Neanderthal genome.
"One of the most striking and unexpected findings was that India harbors the highest variation in Neanderthal ancestry among non-Africans," said co-lead author Laurits Skov, a former UC Berkeley postdoctoral fellow. "This allowed us to reconstruct around 50% of the Neanderthal genome and 20% of the Denisovan genome from Indian individuals, more than any other previous archaic ancestry study."
"This is because of the complex history of South Asians," Moorjani explained. "They've had multiple mixture events over the past 10,000 years, followed by strong bottlenecks in many groups. Together, that leads to a very complex mosaic of different ancestries, such that when you compare the Neanderthal segments in two individuals, they're often not shared."
Neolithic farmers from Iran
The earliest inhabitants of India may have been hunter-gatherers, the ancient ancestral South Indians, whose closest living genetic relatives may be the people of the isolated Andaman Islands in the Bay of Bengal. People from the south of India have higher levels of this ancestry than those in the north.
Archaeological evidence, including from the Mehrgarh site in present-day Pakistan, suggests that agriculture in South Asia began 8,000 to 9,000 years ago. It was likely introduced by Neolithic farmers from West Asia, as it involved wheat and barley, crops originally domesticated in that region. To pinpoint where the farmers may have come from, Kerdoncuff analyzed 14 ancient populations from Central Asia and the Middle East, spanning the Neolithic to the Iron Age. She found that the closest match is a group of fourth-millennium BCE farmers and herders from Sarazm, an archaeological site in Tajikistan.
Archaeologists had previously documented trade connections between Sarazm and South Asia, including connections with the agricultural sites of Mehrgarh and the early Indus Valley Civilization. Kerdoncuff noted that "it is striking that one of the two Sarazm individuals in our study was found with shell bangles that are identical to ones found in Neolithic sites in India and Pakistan, and made from sea shells originating from the Indian Ocean or the Arabian Sea."
The genome analysis also confirmed the presence of steppe pastoralist ancestry in India, ranging from 0% to 45% among present-day individuals. Together, these three groups (farmers, pastoralists and hunter-gatherers) gave rise to the genetic variation now seen throughout India.
The role played by endogamy
After this complex mix of cultures, however, India experienced a shift toward strong endogamy, the practice of marrying within one's community. Endogamy increases the prevalence of deleterious variants and, when parents are related or come from a small population, raises the chance that a child will inherit two copies of the same harmful variant and so be homozygous for it. In a previous paper, Moorjani and her colleagues determined that these types of bottlenecks, or founder events, occurred between 3,500 and 2,000 years ago. A founder event occurs when a small number of ancestral individuals gives rise to a large fraction of the population, often because war, famine or disease drastically reduced the population, but also because of geographic isolation (on islands, for example) or cultural practices.
"With these founder events, members of a group become much more related because they're exchanging genes just within the community," Moorjani said. "So if a deleterious variant is present in the community, it can drift to high frequency in the population because there's less variation."
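The effect of relatedness on recessive disease risk can be illustrated with a textbook population-genetics formula. The sketch below is not taken from the study: it assumes a single recessive variant at frequency q and an inbreeding coefficient F (the chance that a person's two copies of a gene descend from the same ancestral copy), and uses the standard result that the probability of carrying two copies is q² + F·q·(1 − q). The function name and the example numbers are hypothetical, chosen only for illustration.

```python
def homozygous_recessive_probability(q: float, f: float = 0.0) -> float:
    """Probability that an individual carries two copies of a recessive variant.

    q: frequency of the variant in the community (assumed value, 0..1)
    f: inbreeding coefficient F (0 for random mating; 1/16 is the
       textbook value for the child of first cousins)
    """
    if not 0.0 <= q <= 1.0 or not 0.0 <= f <= 1.0:
        raise ValueError("q and f must both lie between 0 and 1")
    # Standard result: q^2 from random pairing, plus an extra F*q*(1-q)
    # from inheriting the same ancestral copy through both parents.
    return q * q + f * q * (1.0 - q)


# Illustrative numbers only, not data from the study.
random_mating = homozygous_recessive_probability(0.01)        # rare variant, F = 0
related_parents = homozygous_recessive_probability(0.01, 1 / 16)
after_drift = homozygous_recessive_probability(0.10)          # variant drifted to 10%

print(random_mating, related_parents, after_drift)
```

In this toy setting, both mechanisms Moorjani describes raise risk: relatedness between parents increases the chance of two copies at a fixed variant frequency, and drift to a higher frequency increases it even under random mating.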
One example of such a recessive trait is a mutation in the butyrylcholinesterase (BCHE) gene that causes muscle paralysis and other severe reactions to drugs used during anesthesia, such as the muscle relaxant mivacurium. It is particularly prevalent in communities such as the Vysya in Andhra Pradesh and Telangana, Moorjani said, but is present at very low frequency across the rest of India and absent outside India. Identifying such variants is crucial for genetic screening and improving medical interventions, she said.
The team also identified numerous rare and population-specific pathogenic genetic variants, including variants linked to blood disorders, congenital hearing loss, cystic fibrosis and phenylketonuria.
Moorjani and her colleagues are continuing to analyze the Indian genomes in the LASI-DAD study, which is part of the larger LASI study. LASI has enrolled more than 70,000 individuals and aims to sequence a subset of them to study epigenetic differences, metabolomics and proteomics, with the goal of understanding aging and age-associated diseases in India.
"Our expertise is in leveraging the evolutionary history to do more reliable disease mapping, because this complex history highlights how critical it is to incorporate ancestry and homozygosity in future medical and functional genomics research in India," Moorjani said.
The other senior authors of the paper are Aparajit Ballav Dey of the All India Institute of Medical Sciences, Sharon Kardia of the University of Michigan and Jinkook Lee of USC. Kerdoncuff is currently at the Pasteur Institute in Paris. Skov is an assistant professor at the Globe Institute at the University of Copenhagen, Denmark. Moorjani and her UC Berkeley team were supported by grants from the National Institutes of Health (U01AG065958, R35GM142978), the Burroughs Wellcome Fund and Denmark's Novo Nordisk Foundation.