Scientists at UCLA and the University of Toronto have developed an advanced computational tool, called moPepGen, that helps identify previously invisible genetic mutations in proteins, unlocking new possibilities in cancer research and beyond.
The tool, described in Nature Biotechnology, will help researchers understand how changes in our DNA affect proteins and ultimately contribute to cancer, neurodegenerative diseases, and other conditions. It provides a new way to create diagnostic tests and to find treatment targets previously invisible to researchers.
Proteogenomics combines the study of genomics and proteomics to provide a comprehensive molecular profile of diseases. However, a major challenge has been the inability to accurately detect variant peptides, limiting the ability to identify genetic mutations at the protein level. Existing proteomic tools often fail to capture the full diversity of protein variations.
To overcome this challenge, the researchers developed moPepGen, which enables more precise identification of protein variations.
"We developed moPepGen to help researchers determine which genetic variants are truly expressed at the protein level, addressing a long-standing challenge in the proteogenomic community," said Chenghao Zhu, PhD, a postdoctoral scholar in the Department of Human Genetics at UCLA and co-first author of the study. "Our tool significantly improves the detection of hidden protein variations by using a graph-based approach to efficiently process all types of genetic changes. This provides a more comprehensive view of protein diversity and gives researchers a much more accurate picture of how mutations influence disease."
This level of precision is critical because proteins play a fundamental role in nearly every biological function, and alterations in their structures can signal disease progression, particularly in cancer. Yet, analyzing proteins to detect these changes remains an immense computational challenge.
Unlike existing methods, which primarily detect simple genetic changes such as those causing single amino acid substitutions, moPepGen is designed to identify a wide range of protein variations caused by alternative splicing, circular RNAs, gene fusions, RNA editing, and other complex genetic and transcriptomic events. The tool systematically models how genes are expressed and translated into proteins, significantly expanding the ability to detect disease-associated mutations.
"Until now, there hasn't been a practical way to handle the enormous complexity of genetic and transcriptomic variation," said Zhu. "The algorithm works rapidly, even when analyzing massive amounts of data, and is designed to function across multiple technologies and species."
To demonstrate its effectiveness, the team used moPepGen to analyze proteogenomic data from five prostate tumors, eight kidney tumors, and 376 cell lines. They found that moPepGen successfully identified previously undetectable protein variations linked to genetic mutations, gene fusions, and other molecular changes. It also proved more sensitive and comprehensive than previous methods, detecting four times as many unique protein variants as older approaches.
The researchers noted that one of moPepGen's most exciting applications is in immunotherapy, as it can identify cancer-specific variant peptides that may serve as neoantigen candidates, which are key to developing personalized cancer vaccines and cell therapies.
"By making it easier to analyze complex protein variations, moPepGen has the potential to advance research in cancer, neurodegenerative diseases, and other fields where understanding protein diversity is critical," said Paul Boutros, PhD, professor of urology and human genetics at the David Geffen School of Medicine at UCLA, director of cancer data science at the UCLA Health Jonsson Comprehensive Cancer Center and co-senior author of the study. "It bridges the gap between genetic data and real-world protein expression, unlocking new possibilities in precision medicine and beyond."
The tool is freely available for researchers and can integrate with existing proteomics workflows, making it accessible for labs worldwide.
The study's other co-first author is Lydia Liu, PhD, and the other co-senior author is Thomas Kislinger, PhD, both from the University of Toronto. A full list of authors is available in the study.
Boutros also serves as the interim vice dean for research at the David Geffen School of Medicine at UCLA and associate director of cancer informatics at the UCLA Institute for Precision Health, and is a member of the Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research.