When scientists sequence tumor DNA, they typically find small amounts of genetic code from bacteria, viruses and fungi: microorganisms that, if actually present in tumor tissues, could influence how tumors grow, evade immunity or respond to treatment.
But do microorganisms truly reside in tumors, or do the samples become contaminated before sequencing occurs?
Independent analyses of the same genomic data have reached wildly different conclusions. Now, researchers at Rutgers Cancer Institute, the state's only National Cancer Institute-designated Comprehensive Cancer Center, have developed a computational tool that settles the controversy by distinguishing genuine microbial signals from artifacts. Their findings are published in Cancer Cell.
"There are microbes all over the environment, on our skin and in our breath," said Subhajyoti De, a member of the Genomic Instability and Cancer Genetics Program at Rutgers Cancer Institute and the senior author of the study. "There could be DNA particles floating in the air. How do you know what you're finding came from the tissue you were interested in, or was something introduced along the way?"
The tool, called PRISM (Precise Identification of Species of the Microbiome), addresses those issues in stages. It first uses rapid screening for an initial overview, then applies more stringent steps that remove lingering human sequences and align the remaining sequences at full length against microbial reference databases. Finally, it uses a machine-learning model trained to predict whether each detected microbe is truly present or a contaminant.
PRISM is designed to identify microbial sequences hiding inside standard human sequencing experiments and then estimate which microbes are likely to have been present in the original tissue and which ones most resemble contamination introduced during processing.
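The staged approach described above can be illustrated with a toy sketch. This is not PRISM's code: the function names, the k-mer screen, the exact-substring "alignment", the made-up sequences and the rule-based final step (standing in for the trained machine-learning model) are all assumptions chosen only to show how each stage narrows the set of candidate reads.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def rapid_screen(reads, microbial_refs, k=8):
    """Stage 1: keep reads sharing at least one k-mer with any microbial reference."""
    index = set()
    for ref in microbial_refs.values():
        index |= kmers(ref, k)
    return [r for r in reads if kmers(r, k) & index]

def remove_human(reads, human_ref, k=8):
    """Stage 2: drop reads that share any k-mer with the human reference."""
    human_index = kmers(human_ref, k)
    return [r for r in reads if not (kmers(r, k) & human_index)]

def full_length_counts(reads, microbial_refs):
    """Stage 3: count a read for a taxon only if the whole read occurs in its reference."""
    counts = {}
    for r in reads:
        for taxon, ref in microbial_refs.items():
            if r in ref:
                counts[taxon] = counts.get(taxon, 0) + 1
                break
    return counts

def label_taxa(counts, suspected_contaminants, min_reads=2):
    """Stage 4: simple rule standing in for a trained classifier (an assumption)."""
    labels = {}
    for taxon, n in counts.items():
        if taxon in suspected_contaminants or n < min_reads:
            labels[taxon] = "likely contaminant"
        else:
            labels[taxon] = "likely present"
    return labels

# Made-up toy sequences; taxon names are placeholders, not real organisms.
microbial_refs = {
    "taxon_A": "ACGTTGCAAGGCTTAACCGGTTAGCAT",
    "taxon_B": "TTGACCGGATATCCGGAATTCCGGTAA",
}
human_ref = "GGGGCCCCAAAATTTTGGCCAATTGGCC"
reads = [
    "ACGTTGCAAGGC",      # matches taxon_A
    "GGCTTAACCGGT",      # matches taxon_A
    "GGATATCCGGAA",      # matches taxon_B
    "ACGTTGCACCCCAAAA",  # part microbial, part human: removed in stage 2
    "TTTTTTTTTTTT",      # matches nothing: removed in stage 1
]

screened = rapid_screen(reads, microbial_refs)
non_human = remove_human(screened, human_ref)
counts = full_length_counts(non_human, microbial_refs)
labels = label_taxa(counts, suspected_contaminants={"taxon_B"})
print(labels)
```

In this toy run, the fast screen discards reads with no microbial similarity, the stricter human filter catches a read the screen let through, and only full-length matches are counted before the final call is made.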
Understanding which microbes are truly present in tumors matters because it could reveal new treatment strategies, identify patients who might benefit from microbiome-targeted therapies and explain why some treatments work better in certain patients. More importantly, PRISM enables researchers to mine massive existing genomic datasets - representing thousands of patient samples already collected and sequenced - without expensive new laboratory work.
"As a scientific community, we could do specific microbial sequencing to identify the microbes, but that would be expensive," said Bassel Ghaddar, a former graduate fellow in systems biology at the Rutgers Microbiome Program and lead author of the study. "Using standard sequencing that is done to identify the human genome or RNA sequences, as we are doing with PRISM, can obtain this at no additional cost."
The challenge of determining which microbial sequences come from samples and which come from elsewhere is formidable. Microbes can hitchhike into a sample from reagents, surfaces or even fragments of DNA floating in the environment.
What is more, algorithms can mistake human sequences for similar microbial ones, and can confuse sequences from one microbe with those from another.
To train the model, the researchers assembled 833 samples from more than 200 studies with known microbial profiles. When tested on other sample sets with known microbial compositions, PRISM achieved sensitivities and specificities above 90% and outperformed five other methods.
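For readers unfamiliar with the two metrics: sensitivity is the share of truly present microbes that a method detects, and specificity is the share of absent microbes that it correctly rules out. The short sketch below computes both from paired truth and prediction labels. The labels are invented for illustration and are not the study's data, and the function name is hypothetical.

```python
def sensitivity_specificity(truth, predicted):
    """Compute (sensitivity, specificity) from parallel lists of booleans.

    truth[i] is True if microbe i is really present;
    predicted[i] is True if the method called it present.
    """
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true positive case and one true negative case")
    return tp / (tp + fn), tn / (tn + fp)

# Invented example: 10 microbes truly present, 10 truly absent.
truth = [True] * 10 + [False] * 10
# The method misses one present microbe and wrongly calls one absent microbe present.
predicted = [True] * 9 + [False] + [True] + [False] * 9

sens, spec = sensitivity_specificity(truth, predicted)
print(sens, spec)
```

A method can score well on one metric and poorly on the other, which is why the two are reported together.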
The researchers then applied PRISM to nearly 4,400 tumor samples spanning 25 cancer types from The Cancer Genome Atlas and the Clinical Proteomic Tumor Analysis Consortium.
The broad picture PRISM uncovered was, in one sense, intuitive. Microbial signals were strongest in cancers from microbe-rich tissues such as head and neck cancers, gastrointestinal tumors and cervical cancers. In many other types of cancer, microbial signals were minimal.
"That's more in line with what you would expect conceptually," said De, adding that internal organs not connected to the external environment are generally thought to have little resident microbiome. "It was very surprising when previous studies found a high abundance of microbes in those tumor types."
PRISM also explained why some tumors falsely appeared microbe-heavy in earlier analyses. Many "detected" microbes outside the mouth, gut and cervix were known to be frequent lab contaminants. In other words, at least some of the apparent "tumor microbiome" in certain cancers may have been a signature of how samples were processed rather than where they came from.
The most detailed example in the study comes from pancreatic cancer, where PRISM classified a subset of tumors as having detectable microbes, including E. coli, which can produce a DNA-damaging toxin called colibactin. Those tumors were associated with shifts in glycoprotein modifications, molecular changes that can alter how proteins behave and how cells interact.
De said the altered glycoproteins clustered in pathways tied to a process that helps build the dense, fibrotic tissue that can prevent drugs and immune cells from penetrating pancreatic tumors. The analysis also found that patients with a greater smoking history tended to have higher microbial abundance in their tumors.
"We are finding a signal in the bacteria that is correlated with a phenotype in the cells," De said. "But we do not always know from this experiment alone whether the microbes are driving the tumor cell phenotype."
Still, by narrowing which signals are most likely to be real, PRISM could help the field focus on fewer, sharper hypotheses.
"By allowing us to look at host-microbe interactions in existing data, we can generate hypotheses about which patients might respond to certain therapies or which microbial metabolites might be druggable targets," Ghaddar said.
The tool is freely available to academic researchers through GitHub, a cloud-based developer platform, though Rutgers has filed for intellectual property protection for commercial applications. The researchers said the tool can be applied beyond cancer to any genomic sequencing study, particularly for gastrointestinal diseases where the microbiome plays known roles.