Detecting cancer in the earliest stages could dramatically reduce cancer deaths because cancers are usually easier to treat when caught early. To help achieve that goal, MIT and Microsoft researchers are using artificial intelligence to design molecular sensors for early detection.
The researchers developed an AI model to design peptides (short proteins) that are targeted by enzymes called proteases, which are overactive in cancer cells. Nanoparticles coated with these peptides can act as sensors that give off a signal if cancer-linked proteases are present anywhere in the body.
Depending on which proteases are detected, doctors would be able to diagnose the particular type of cancer that is present. These signals could be detected using a simple urine test that could even be done at home.
"We're focused on ultra-sensitive detection in diseases like the early stages of cancer, when the tumor burden is small, or early on in recurrence after surgery," says Sangeeta Bhatia, the John and Dorothy Wilson Professor of Health Sciences and Technology and of Electrical Engineering and Computer Science at MIT, and a member of MIT's Koch Institute for Integrative Cancer Research and the Institute for Medical Engineering and Science (IMES).
Bhatia and Ava Amini '16, a principal researcher at Microsoft Research and a former graduate student in Bhatia's lab, are the senior authors of the study, which appears today in Nature Communications. Carmen Martin-Alonso PhD '23, a founding scientist at Amplifyer Bio, and Sarah Alamdari, a senior applied scientist at Microsoft Research, are the paper's lead authors.
Amplifying cancer signals
More than a decade ago, Bhatia's lab came up with the idea of using protease activity as a marker of early cancer. The human genome encodes about 600 proteases, which are enzymes that can cut through other proteins, including structural proteins such as collagen. They are often overactive in cancer cells, as they help the cells escape their original locations by cutting through proteins of the extracellular matrix, which normally holds cells in place.
The researchers' idea was to coat nanoparticles with peptides that can be cleaved by a specific protease. These particles could then be ingested or inhaled. As they traveled through the body, if they encountered any cancer-linked proteases, the peptides on the particles would be cleaved.
Those peptides would be excreted in the urine, where they could be detected using a paper strip similar to a pregnancy test strip. Measuring those signals would reveal the overactivity of proteases deep within the body.
"We have been advancing the idea that if you can make a sensor out of these proteases and multiplex them, then you could find signatures of where these proteases were active in diseases. And since the peptide cleavage is an enzymatic process, it can really amplify a signal," Bhatia says.
The researchers have used this approach to demonstrate diagnostic sensors for lung, ovarian, and colon cancers.
However, in those studies, the researchers used a trial-and-error process to identify peptides that would be cleaved by certain proteases. In most cases, the peptides they identified could be cleaved by more than one protease, which meant that the signals that were read could not be attributed to a specific enzyme.
Nonetheless, using "multiplexed" arrays of many different peptides yielded distinctive sensor signatures that were diagnostic in animal models of many different types of cancer, even if the precise identity of the proteases responsible for the cleavage remained unknown.
In their new study, the researchers moved beyond the traditional trial-and-error process by developing a novel AI system, named CleaveNet, to design peptide sequences that could be cleaved efficiently and specifically by target proteases of interest.
Users can prompt CleaveNet with design criteria, and CleaveNet will generate candidate peptides likely to fit those criteria. In this way, CleaveNet enables users to tune the efficiency and specificity of peptides generated by the model, opening a path to improving the sensors' diagnostic power.
"If we know that a particular protease is really key to a certain cancer, and we can optimize the sensor to be highly sensitive and specific to that protease, then that gives us a great diagnostic signal," Amini says. "We can leverage the power of computation to try to specifically optimize for these efficiency and selectivity metrics."
For a peptide that contains 10 amino acids, there are about 10 trillion possible combinations. Using AI to search that immense space allows for prediction, testing, and identification of useful sequences much faster than humans would be able to find them, while also considerably reducing experimental costs.
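The "10 trillion" figure can be checked with simple counting. The short sketch below assumes that each of the 10 positions can hold any of the 20 standard amino acids; the alphabet size is an assumption made here for illustration, not a detail taken from the study.

```python
# Size of the peptide sequence space, assuming each position can hold
# any of the 20 standard amino acids (an assumption for illustration).
NUM_AMINO_ACIDS = 20
PEPTIDE_LENGTH = 10


def sequence_space_size(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    return alphabet_size ** length


total = sequence_space_size(NUM_AMINO_ACIDS, PEPTIDE_LENGTH)
print(f"{total:,} possible sequences")  # 10,240,000,000,000
```

Under that assumption the count is 20 to the 10th power, or roughly 10 trillion, which is far more than could be screened one by one in the lab.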
Predicting enzyme activity
To create CleaveNet, the researchers developed a protein language model to predict the amino acid sequences of peptides, analogous to how large language models can predict sequences of text. For the training data, they used publicly available data on about 20,000 peptides and their interactions with different proteases from a family known as matrix metalloproteinases (MMPs).
Using these data, the researchers trained one model to generate peptide sequences that are predicted to be cleaved by proteases. These sequences could then be fed into another model that predicted how efficiently each peptide would be cleaved by any protease of interest.
To demonstrate this approach, the researchers focused on a protease called MMP13, which cancer cells use to cut through collagen and help them metastasize from their original locations. Prompting CleaveNet with MMP13 as a target allowed the models to design peptides that could be cut by MMP13 with considerable selectivity and efficiency. This cleavage profile is particularly useful for diagnostic and therapeutic applications.
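The generate-then-score workflow described above can be illustrated with a toy sketch. This is not CleaveNet's code or API: the function names, thresholds, and off-target protease list are hypothetical, the "generator" simply draws random sequences, and the "predictor" returns a deterministic pseudo-random score in place of a trained model. Protease names are used as labels only.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def generate_candidates(n, length=10, seed=0):
    """Hypothetical stand-in for a generator model: uniform random peptides."""
    rng = random.Random(seed)
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(n)]


def predict_cleavage(peptide, protease):
    """Hypothetical stand-in for a predictor model.

    Returns a deterministic toy score in [0, 1); a real predictor would be
    a trained model, and these scores carry no biological meaning.
    """
    return random.Random(f"{protease}:{peptide}").random()


def select_peptides(candidates, target, off_targets, min_score=0.7, min_margin=0.3):
    """Keep peptides scored as efficient for the target and selective against others."""
    selected = []
    for peptide in candidates:
        on_target = predict_cleavage(peptide, target)
        worst_off = max(predict_cleavage(peptide, p) for p in off_targets)
        # Efficiency: high on-target score. Selectivity: clear margin over
        # the highest-scoring off-target protease.
        if on_target >= min_score and on_target - worst_off >= min_margin:
            selected.append((peptide, on_target, worst_off))
    # Most selective candidates first.
    return sorted(selected, key=lambda item: item[1] - item[2], reverse=True)


candidates = generate_candidates(1000)
hits = select_peptides(candidates, target="MMP13", off_targets=["MMP1", "MMP9"])
for peptide, on_target, worst_off in hits[:5]:
    print(f"{peptide}  on-target={on_target:.2f}  worst off-target={worst_off:.2f}")
```

The sketch only shows the shape of the workflow: one component proposes sequences, a second scores each against several proteases, and a filter keeps candidates that score well for the target while scoring poorly for the rest.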
"When we set the model up to generate sequences that would be efficient and selective for MMP13, it actually came up with peptides that had never been observed in training, and yet these novel sequences did turn out to be both efficient and selective," Martin-Alonso says. "That was very exciting to see."
This kind of selectivity could help to reduce the number of different peptides needed to diagnose a given type of cancer, to identify novel biomarkers, and to provide insight into specific biological pathways for study and therapeutic testing, the researchers say.
Bhatia's lab is currently part of an ARPA-H-funded project to create reporters for an at-home diagnostic kit that could potentially detect and distinguish between 30 different types of cancer, in early stages of disease, based on measurements of protease activity. These sensors could detect cleavage not only by MMPs but also by other enzymes such as serine proteases and cysteine proteases.
Peptides designed using CleaveNet could also be incorporated into cancer therapeutics such as antibody treatments. Using a specific peptide to attach a therapeutic such as a cytokine or small molecule drug to a targeting antibody could enable the medicine to be released only when the peptides are exposed to proteases in the tumor environment, improving efficacy and reducing side effects.
Beyond direct applications in diagnostics and therapeutics, combining efforts from the ARPA-H work with this modeling framework could enable the creation of a comprehensive "protease activity atlas" that spans multiple protease classes and cancers. Such a resource could further accelerate research in early cancer detection, protease biology, and AI models for peptide design.
The research was funded by La Caixa Foundation, the Ludwig Center at MIT, and the Marble Center for Cancer Nanomedicine.