New York, NY [August 28, 2025]—When genetic testing reveals a rare DNA mutation, doctors and patients are frequently left in the dark about what it actually means. Now, researchers at the Icahn School of Medicine at Mount Sinai have developed a powerful new way to determine whether a patient with a mutation is likely to actually develop disease, a concept known in genetics as penetrance.
The team set out to solve this problem using artificial intelligence (AI) and routine lab tests like cholesterol, blood counts, and kidney function. Details of the findings were reported in the August 28 online issue of Science. Their new method combines machine learning with electronic health records to offer a more accurate, data-driven view of genetic risk.
Traditional genetic studies often rely on a simple yes/no diagnosis to classify patients. But many diseases, like high blood pressure, diabetes, or cancer, don't fit neatly into binary categories. The Mount Sinai researchers trained AI models to quantify disease on a spectrum, offering more nuanced insight into how disease risk plays out in real life.
"We wanted to move beyond black-and-white answers that often leave patients and providers uncertain about what a genetic test result actually means," says Ron Do, PhD, senior study author and the Charles Bronfman Professor in Personalized Medicine at the Icahn School of Medicine at Mount Sinai. "By using artificial intelligence and real-world lab data, such as cholesterol levels or blood counts that are already part of most medical records, we can now better estimate how likely disease will develop in an individual with a specific genetic variant. It's a much more nuanced, scalable, and accessible way to support precision medicine, especially when dealing with rare or ambiguous findings."
Using more than 1 million electronic health records, the researchers built AI models for 10 common diseases. They then applied these models to people known to have rare genetic variants, generating a score between 0 and 1 that reflects the likelihood of developing the disease.
A higher score, closer to 1, suggests a variant may be more likely to contribute to disease, while a lower score indicates minimal or no risk. The team calculated "ML penetrance" scores for more than 1,600 genetic variants.
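For readers who want a concrete picture, the scoring idea can be sketched in a few lines of Python. This is an illustration only, not the study's method: it assumes a trained model has already assigned each carrier of a variant a disease score between 0 and 1, and that a variant-level score is simply the average across carriers. The function name `ml_penetrance`, the variant labels, and the numbers are all hypothetical.

```python
from statistics import mean


def ml_penetrance(carrier_scores):
    """Average per-carrier disease scores (each in [0, 1]) for one variant.

    Assumption: averaging is used here purely for illustration; the press
    release does not say how the study combines scores across carriers.
    """
    if not carrier_scores:
        raise ValueError("need at least one carrier score")
    if any(not 0.0 <= s <= 1.0 for s in carrier_scores):
        raise ValueError("scores must lie between 0 and 1")
    return mean(carrier_scores)


# Hypothetical model outputs for carriers of two made-up variants.
scores_by_variant = {
    "variant_A": [0.9, 0.8, 0.7],  # carriers mostly score high
    "variant_B": [0.1, 0.2, 0.0],  # carriers mostly score low
}

# One variant-level score per variant, each between 0 and 1.
penetrance = {
    variant: ml_penetrance(scores)
    for variant, scores in scores_by_variant.items()
}
```

In this toy example, the first variant ends up with a score near the top of the range and the second near the bottom, mirroring the high-versus-low interpretation described above.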
Some of the results were surprising, say the investigators. Variants previously labeled as "uncertain" showed clear disease signals, while others thought to cause disease had little effect in real-world data.
"While our AI model is not meant to replace clinical judgment, it can potentially serve as an important guide, especially when test results are unclear. Doctors could in the future use the ML penetrance score to decide whether patients should receive earlier screenings or take preventive steps, or to avoid unnecessary worry or intervention if the variant is low-risk," says lead study author Iain S. Forrest, MD, PhD, in the lab of Dr. Do at the Icahn School of Medicine at Mount Sinai. "If a patient has a rare variant associated with Lynch syndrome, for instance, and it scores high, that could trigger earlier cancer screening, but if the risk appears low, jumping to conclusions or overtreatment might be avoided."
The team is now working to expand the model to include more diseases, a wider range of genetic changes, and more diverse populations. They also plan to track how well these predictions hold up over time, whether people with high-risk variants actually go on to develop disease, and whether early action can make a difference.
"Ultimately, our study points to a potential future where AI and routine clinical data work hand in hand to provide more personalized, actionable insights for patients and families navigating genetic test results," says Dr. Do. "Our hope is that this becomes a scalable way to support better decisions, clearer communication, and more confidence in what genetic information really means."
The paper is titled "Machine learning-based penetrance of genetic variants."
The study's authors, as listed in the journal, are Iain S. Forrest, Ha My T. Vy, Ghislain Rocheleau, Daniel M. Jordan, Ben O. Petrazzini, Girish N. Nadkarni, Judy H. Cho, Mythily Ganapathi, Kuan-Lin Huang, Wendy K. Chung, and Ron Do.
This work was supported in part by the following grants: the National Institute of General Medical Sciences of the National Institutes of Health (NIH) (T32-GM007280); the National Institute of General Medical Sciences of the NIH (R35-GM124836); the National Institute of Diabetes and Digestive and Kidney Diseases (U24-DK062429); the National Human Genome Research Institute of the NIH (R01-HG010365); and the National Institute of General Medical Sciences of the NIH (R35-GM138113).