
Study: Generation of connections between protein sequence space and chemical space to enable a predictive model for biocatalysis (DOI: 10.1038/s41586-025-09519-5)
University of Michigan and Carnegie Mellon University researchers have developed a new tool that makes greener chemistry more accessible.
The tool, described in a study supported by the U.S. National Science Foundation and published Oct. 1 in the journal Nature, removes a major barrier to wider adoption of biocatalysis.
Biocatalysts, also called enzymes, are proteins that have evolved to perform chemistry that can be complex and incredibly efficient, typically in water and at room temperature, removing the need for toxic or expensive chemical reagents to run reactions. But they are also highly selective, meaning that they are specialized to work with the specific starting compounds, or substrates, they interact with in their natural environment.
To capitalize on the power of biocatalysts in the lab, though, chemists need to know what other substrates a protein can work with and, more precisely, which enzymes will work with their desired substrate.
"Biocatalysis offers a more sustainable way to build molecules, and it can also give us access to molecules that we couldn't build using traditional chemical methods," said Alison Narayan, professor of chemistry in the U-M College of Literature, Sciences, and the Arts and research professor at the Life Sciences Institute. "But most of the known substrates for these biocatalysts come from nature, which is just a very small subset of the molecules that chemists work with."
Narayan's team envisioned bridging the longstanding gap between the starting compounds chemists are working with and the enzymes that could potentially react with those compounds. The project began with an effort to match proteins with substrates on a large scale. Focusing on one family of enzymes, Alexandra Paton designed a high-throughput reaction platform that allowed the team to test more than 100 substrates against each protein across the entire protein family.
"We discovered hundreds of new connections between chemical space and protein space and built this diverse dataset," said Paton, a former postdoctoral fellow in Naryan's lab and the study's first author. "That is when we began to think more broadly about what we could build with all this data."
Narayan's team, along with Gabe Gomes, assistant professor of chemical engineering and chemistry at Carnegie Mellon University, and Daniil Boiko, then a graduate student in Gomes' lab, leveraged this dataset to build an enzyme recommender system. The Gomes lab applied its expertise in machine learning to optimize a predictive model that can navigate between the protein landscape and the chemical landscape.
The resulting open-access CATNIP online platform enables chemists to input their starting compound and receive a ranked list of biocatalysts from this protein family that would best enable a chemical transformation. Going in the other direction, one can start with an enzyme of interest and identify its potential substrates. Boiko describes the platform's predictive capability as analogous to a web search, optimizing the results so that the best answers, or the most promising candidates, appear at the top of the list, ranked by their likelihood of success.
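For readers who think in code, the two-way lookup described here can be pictured with a small Python sketch. It is only an illustration of ranking by score and is not CATNIP's model or interface: the enzyme and substrate names, the score values, and the functions `rank_enzymes` and `rank_substrates` are all hypothetical placeholders, and a real system would obtain its scores from a trained predictive model rather than a hand-written table.

```python
from typing import Dict, List, Tuple

# Maps an (enzyme, substrate) pair to a compatibility score.
Scores = Dict[Tuple[str, str], float]

# Hypothetical placeholder scores, invented for illustration only.
# They are not data from the study.
TOY_SCORES: Scores = {
    ("enzyme_A", "substrate_1"): 0.82,
    ("enzyme_B", "substrate_1"): 0.35,
    ("enzyme_C", "substrate_1"): 0.61,
    ("enzyme_A", "substrate_2"): 0.10,
    ("enzyme_B", "substrate_2"): 0.74,
}


def rank_enzymes(substrate: str, scores: Scores, top_k: int = 3) -> List[Tuple[str, float]]:
    """Return up to top_k enzymes for a substrate, highest score first."""
    hits = [(enz, s) for (enz, sub), s in scores.items() if sub == substrate]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)[:top_k]


def rank_substrates(enzyme: str, scores: Scores, top_k: int = 3) -> List[Tuple[str, float]]:
    """Return up to top_k substrates for an enzyme, highest score first."""
    hits = [(sub, s) for (enz, sub), s in scores.items() if enz == enzyme]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Substrate in, ranked enzymes out.
    print(rank_enzymes("substrate_1", TOY_SCORES))
    # Enzyme in, ranked substrates out.
    print(rank_substrates("enzyme_A", TOY_SCORES))
```

As in a web search, the ordering matters more than the exact scores: a chemist would typically start lab work with the candidates at the top of the list.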
"It is a great starting model to enable synthetic campaigns using biocatalysts," said Paton, who is now an assistant professor of chemistry at University of Rochester. "And there is already work underway to begin expanding the database beyond this one enzyme family."
The research was also supported by the Novartis Global Scholars Program, the Camille Dreyfus Teacher Scholar Award and the University of Michigan. Other study authors include Jonathan Perkins and Nicholas Cemalovic of U-M and Thiago Reschützegger of the Federal University of Santa Maria, Brazil.