Tuberculosis is the world's deadliest infectious disease, in part because the bacterium can hide out in the lungs for years before causing active disease. Now, a new computational method developed by researchers at Cornell sheds light on how going dormant, sometimes for multiple generations, has affected the evolution of the tuberculosis bacterium, Mycobacterium tuberculosis (Mtb), and of other organisms that can temporarily drop out of the gene pool.
Cornell researchers created the first model that constructs a genealogical tree for organisms that enter long-term dormancy and estimates key factors that have shaped how those organisms evolved over time. This deeper understanding of pathogen evolution may help inform surveillance efforts and preparedness for future strains.
"If we want to accurately reconstruct parts of evolutionary history that we can't directly observe, it's essential to account for dormancy," said Jaehee Kim, assistant professor of computational biology in the Cornell Ann S. Bowers College of Computing and Information Science and the College of Agriculture and Life Sciences (CALS). "Ignoring dormancy could lead to incorrect conclusions about both its past evolution and its future evolutionary potential."
The study, "Bayesian Phylodynamic Inference of Population Dynamics with Dormancy," was published May 2 in the Proceedings of the National Academy of Sciences.
For population genetics studies, dormancy can really throw a wrench in the works. When conditions take a turn for the worse, many organisms, from plants to pathogens, hedge their bets by going dormant, making it harder to study their evolution. While the active members of a population keep evolving and acquiring new mutations, dormant organisms stay mostly the same, only to emerge later on, having missed out on those changes.
To tackle this problem, Kim and her team developed an open-source software program called SeedbankTree that analyzes genome sequences from a population with dormant members. The program provides estimates of what percentage of the population is dormant at a given time, how long members of the population stay dormant before reviving and how quickly the active and dormant members accumulate mutations in their genomes.
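One way to picture the first two quantities is a simple two-state model in which each lineage switches between "active" and "dormant" at constant rates. The sketch below is only an illustration under that assumption. It is not SeedbankTree's actual model or interface, and the function name `dormancy_summary` and the rates in the example are hypothetical. With constant rates, the average dormancy spell is the reciprocal of the revival rate, and the long-run dormant share of the population depends on the balance between the two rates.

```python
def dormancy_summary(rate_to_dormant: float, rate_to_active: float) -> dict:
    """Toy two-state (active <-> dormant) continuous-time Markov chain.

    Hypothetical illustration, not SeedbankTree's API.
    rate_to_dormant: per-year rate at which an active lineage goes dormant
    rate_to_active:  per-year rate at which a dormant lineage revives
    """
    if rate_to_dormant <= 0 or rate_to_active <= 0:
        raise ValueError("rates must be positive")
    # With a constant exit rate, time spent dormant is exponentially
    # distributed, so the mean dormancy spell is 1 / revival rate.
    mean_dormant_years = 1.0 / rate_to_active
    # Stationary balance: flow into dormancy equals flow out of it, which
    # gives the long-run share of lineages that are dormant at any moment.
    dormant_fraction = rate_to_dormant / (rate_to_dormant + rate_to_active)
    return {"mean_dormant_years": mean_dormant_years,
            "dormant_fraction": dormant_fraction}


# Hypothetical rates: lineages go dormant once every 4 years on average
# and stay dormant for 2 years on average.
summary = dormancy_summary(rate_to_dormant=0.25, rate_to_active=0.5)
print(summary)
```

In this toy setup, doubling how long lineages stay dormant (halving the revival rate) raises the dormant share even if the rate of entering dormancy is unchanged. That is why the two quantities have to be estimated together.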
The team first tested its method using synthetic genetic data, then applied it to DNA sequences from a real tuberculosis outbreak that occurred in New Zealand from 1992 to 2011. Epidemiologists had done extensive contact tracing, so they could identify cases with dormant infections. SeedbankTree estimated that dormant Mtb mutate only about one-eighth as fast as bacteria during an active infection, and that the average time before reactivation was 1.27 years. Both estimates align with values from previous Mtb studies, supporting the model's effectiveness.
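Taken together, the two reported estimates allow a rough back-of-the-envelope conversion. At one-eighth of the active rate, a 1.27-year dormancy spell adds about as many mutations as 0.16 years (roughly 58 days) of active growth. The short sketch below does that arithmetic. It treats both averages as fixed constants, which ignores the uncertainty in the estimates and the variation between individual infections. The function name is hypothetical.

```python
# Reported estimates from the New Zealand outbreak analysis, treated here
# as fixed constants purely for illustration.
DORMANT_RATE_RATIO = 1 / 8     # dormant mutation rate relative to active rate (approximate)
MEAN_DORMANCY_YEARS = 1.27     # average time before reactivation


def active_equivalent_years(dormant_years: float,
                            ratio: float = DORMANT_RATE_RATIO) -> float:
    """Years of active growth that would yield as many mutations as
    `dormant_years` spent dormant (hypothetical helper)."""
    return dormant_years * ratio


years = active_equivalent_years(MEAN_DORMANCY_YEARS)
print(f"{years:.2f} years of active growth (about {years * 365.25:.0f} days)")
```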
This approach can also help identify rapidly evolving strains of a pathogen, which could signal future strains of concern.
"Our analysis is especially important for improving the identification of novel strains of pathogens that have an increased rate of transmission. Failing to account for dormancy can seriously mislead those efforts," said co-author Andrew Clark, the Jacob Gould Schurman Professor of Population Genetics in the College of Arts and Sciences and chair of the Department of Computational Biology.
Importantly, the team built their model on Bayesian methods, which let them quantify how confident they can be in the model's estimates.
"To effectively inform both scientific inquiry and policy decisions, a probability-based risk assessment is essential," said Martin Wells, the Charles A. Alexander Professor of Statistical Sciences in the ILR School and Cornell Bowers CIS. "This methodology is valuable for its ability to account for uncertainty."
Beyond tracing outbreaks, this approach can also suggest whether organisms have gone through dormancy based on their genome sequences alone, and it yields a more accurate estimate of how quickly they accumulate mutations. If scientists don't take dormancy into account, they are likely to conclude that an organism evolves very slowly, when in reality it spent long periods of time hiding out.
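A toy calculation shows why ignoring dormancy pushes the estimated rate down. If a lineage spends part of its history dormant and mutates more slowly during that time, then dividing its total mutations by the total elapsed time gives a blended rate below the true active rate. The sketch assumes a single lineage, a fixed active rate and a fixed dormant-to-active rate ratio. The function `naive_rate` and all parameter values are hypothetical and chosen for illustration. This is not how SeedbankTree performs its inference.

```python
def naive_rate(active_rate: float, total_years: float, dormant_years: float,
               dormant_ratio: float) -> float:
    """Mutation rate a dormancy-unaware analysis would infer for one lineage.

    Hypothetical illustration.
    active_rate:   mutations per year while active
    dormant_ratio: dormant mutation rate as a fraction of the active rate
    """
    if not 0 <= dormant_years <= total_years:
        raise ValueError("dormant_years must lie between 0 and total_years")
    active_years = total_years - dormant_years
    # Expected mutations accumulated in each state.
    mutations = (active_rate * active_years
                 + active_rate * dormant_ratio * dormant_years)
    # Treating the whole span as active time blends the two rates together.
    return mutations / total_years


true_rate = 1.0  # hypothetical: 1 mutation per year while active
estimate = naive_rate(true_rate, total_years=10.0, dormant_years=5.0,
                      dormant_ratio=1 / 8)
print(f"naive estimate: {estimate:.4f} per year vs. true active rate {true_rate}")
```

With half of a 10-year history spent dormant at one-eighth the active rate, the lineage accumulates 5 + 0.625 = 5.625 mutations. The dormancy-unaware estimate is therefore 0.5625 per year, a little over half the true active rate.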
Next, the team plans to apply their model to understand the role of dormancy in cancer initiation and progression. They are also interested in how dormancy could complicate gene drives aimed at eradicating certain plants. Gene drives are a type of genetic engineering that makes a gene much more likely to be passed down to the next generation, and they have the potential to drive weeds, pests and pathogens to extinction. But dormant seeds in the soil could allow unmodified plants to make a comeback.
Co-authors on the study include Wai Tung "Jack" Lo '22, M.Eng. '23; Joy Zhang, a doctoral student in the field of applied mathematics; Peiyu Xu, a doctoral student in the field of genetics; Daniel Barrow, research associate; Ishani Chopra '24; and first-author Lorenzo Cappello of the Universitat Pompeu Fabra, Spain.
The researchers received support from the National Institute of General Medical Sciences, the National Institute of Allergy and Infectious Diseases and a Ramón y Cajal fellowship from Spain's Ministry of Science.
Patricia Waldron is a writer for the Cornell Ann S. Bowers College of Computing and Information Science.