The two main approaches for discovering disease genes reveal distinct aspects of biology, a new study shows. While both methods are widely used, the research found that they identify different genes, with major implications for drug development.
Publishing online Nov. 5 in Nature, the study revolves around the human genome, which contains thousands of genes that provide instructions for making proteins, as well as regulatory DNA that controls when genes turn on. The new investigation takes a genome-wide view of how small DNA differences — called variants — can influence traits such as height, hair color, and disease risk.
Led by researchers at NYU Langone Health, Stanford University, UC San Francisco, and the University of Tokyo, the new work analyzed two main methods used to determine how genetic differences influence disease biology. These are genome-wide association studies (GWAS), which test common variants across the genome—in genes and regulatory regions—to find those linked to disease, and burden tests, which focus on rare variants that alter proteins.
The researchers analyzed GWAS and burden test results for 209 traits from the UK Biobank, which contains genetic data from hundreds of thousands of people. They found that burden tests identify genes that mostly affect the disease being studied, with little effect on other traits. GWAS, by contrast, can identify both these disease-specific genes and genes that influence many diseases and biological processes.
"Our study explains why the methods produce different results and why both are biologically important," said co-senior study author Hakhamanesh Mostafavi, PhD, assistant professor in the Department of Population Health, and in the Center for Human Genetics & Genomics, in the NYU Grossman School of Medicine. "The findings provide a new clarity on what genetic findings reveal about disease risk and how they should be used in applications like drug development."
New Approach Needed
Scientists have long used GWAS to search large genetic datasets for disease-associated genes. The results have been confusing, however, because GWAS typically implicate hundreds of genes per disease, making it hard to know which truly matter. More recently, thanks to massive biobanks, burden tests have become powerful enough to reveal a different picture: far fewer, more interpretable genes linked to the same diseases. This raised questions about which method better reflects disease biology, and why.
The researchers discovered that a key reason for the difference in results between the two tests is that genes vary in how many traits (e.g., height or heart disease) or biological processes they affect. Some genes primarily influence one trait, while others affect multiple traits simultaneously.
The variants that severely disrupt these "multi-trait" genes have broad consequences and are removed by evolution because they often make it harder to survive or reproduce. That means they are found in fewer people and are therefore harder for burden tests to detect. In contrast, GWAS can still find these genes because regulatory DNA variants often affect gene activity in more limited ways, enabling such variants to escape evolutionary removal.
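As a rough illustration of this reasoning (not the study's own analysis), the sketch below uses the textbook mutation-selection balance approximation, in which the equilibrium frequency of a harmful variant is about the mutation rate divided by the strength of selection against carriers. The mutation rate, selection strengths, and cohort size are hypothetical values chosen only to show the direction of the effect.

```python
def equilibrium_frequency(mutation_rate, selection_coeff):
    """Approximate mutation-selection balance: q is roughly mu / s.

    Assumes selection acts on carriers of a single copy and that the
    variant is rare; this is a textbook simplification.
    """
    return mutation_rate / selection_coeff


def expected_carriers(allele_freq, n_people):
    """Expected number of carriers of a rare variant in a cohort: about 2 * N * q."""
    return 2.0 * n_people * allele_freq


MUTATION_RATE = 1e-6   # hypothetical rate of gene-disrupting mutations per generation
COHORT_SIZE = 400_000  # hypothetical biobank size

# Hypothetical selection strengths: weak for a single-trait gene,
# strong for a gene whose disruption affects many traits.
for label, s in [("single-trait gene", 0.001), ("multi-trait gene", 0.1)]:
    q = equilibrium_frequency(MUTATION_RATE, s)
    print(f"{label}: expected carriers = {expected_carriers(q, COHORT_SIZE):.0f}")
```

Under these assumed numbers, the strongly selected gene has far fewer carriers in the cohort, which leaves a burden test with much less data to work with.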
The study authors propose that two gene features are critical for prioritizing genes for any disease or trait. The first is "importance": how much a gene affects disease if disrupted. The second is trait "specificity": whether a gene mostly affects one disease or many traits. Understanding both features would help researchers identify the best therapeutic targets and anticipate potential side effects.
A related finding involved the p-value, a standard measure of whether any study result, including from a GWAS or burden test, can be trusted as "real," or is instead likely to have occurred by chance. Strikingly, the study shows that the p-values of GWAS and burden tests are a poor indicator of a gene's importance. This matters because identifying important genes can reveal which biological processes are central to disease biology.
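One general statistical reason a p-value can diverge from effect size is that it also depends on how common a variant is and how many people were studied. The sketch below is a simplified, variant-level illustration using a standard approximation for the expected test statistic of an additive variant; it is not the method used in the study, and the effect sizes, frequencies, and cohort size are hypothetical.

```python
import math


def association_z(beta, allele_freq, n_samples, trait_sd=1.0):
    """Approximate expected z-score for an additive variant.

    Uses z ~ beta * sqrt(2 * f * (1 - f) * N) / sd, a standard approximation
    that holds when the variant explains little of the trait variance.
    """
    genotype_var = 2.0 * allele_freq * (1.0 - allele_freq)
    return beta * math.sqrt(genotype_var * n_samples) / trait_sd


def two_sided_p(z):
    """Two-sided p-value under a standard normal null distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


COHORT_SIZE = 400_000  # hypothetical

# Hypothetical variant with a large effect that is very rare.
p_rare = two_sided_p(association_z(beta=0.5, allele_freq=0.0001, n_samples=COHORT_SIZE))
# Hypothetical variant with a small effect that is common.
p_common = two_sided_p(association_z(beta=0.03, allele_freq=0.3, n_samples=COHORT_SIZE))

print(f"large effect, rare:   p = {p_rare:.2e}")
print(f"small effect, common: p = {p_common:.2e}")
```

With these assumed inputs, the common variant with the much smaller effect yields the smaller p-value, so ranking by p-value alone would put it first.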
"Our results do not mean that GWAS and burden tests lack useful information to infer gene importance," said Mostafavi. "They just have not been interpreted in this way before. We believe new methods are needed to infer this key biological gene feature."
Moving forward, the team has begun developing methods to prioritize genes by importance. GWAS or burden tests alone do not have enough power to accurately estimate how much each gene affects disease. But by combining these results with the rapidly growing experimental data describing what each gene does inside cells, the authors say, machine learning methods can find shared patterns across genes and improve estimates.
"This would be revolutionary because it would let us leverage all of the cell-level experimental data to learn about human-level traits, identify the most important disease genes, and streamline drug development," said co-senior author Jeffrey Spence, PhD, assistant professor at the University of California, San Francisco.