New Tool To Detect Viruses In Sequence Data

A new software algorithm developed at Caltech lets researchers easily search for viruses in RNA sequence data, so that scientists can detect viruses in samples and study how they affect biological functions.

The number of individual viruses on Earth is nearly unfathomable: There are an estimated 10 million individual viruses for each star in the universe. Viruses are everywhere, even if they are not causing disease, and there are still many unexplored questions about how they impact our daily lives. For example, it is theorized that some neurodegenerative disorders, such as Alzheimer's and Parkinson's, may have their origins in viral infections. The new algorithm, built on an existing software tool called kallisto, can now reveal the workings of this previously invisible viral world.

The research was conducted in the laboratory of Lior Pachter (BS '94), Bren Professor of Computational Biology and Computing and Mathematical Sciences. A paper describing the research appears on DATE in the journal Nature Biotechnology.

"When sequencing RNA from a human lung sample, for example, you capture all RNA: primarily human, but also that of any viruses infecting the human cells," says former graduate student Laura Luebbert (PhD '24), the study's first author. "With standard analysis approaches, this information about viral presence is typically discarded. Our tool, however, allows researchers to retain and quantify these data, even for unexpected or new viruses."

Modern transcriptomic tools measure the genes expressed in cells and have produced massive amounts of sequence data. Techniques like single-cell RNA sequencing can identify the transcriptomic material present in individual cells, enabling researchers to understand the inner workings of different kinds of cells within a sample. In principle, these data also offer the opportunity to study the viruses present in these samples; the new tool makes this possible.

kallisto is a computational program able to distinguish viral genetic material within sequence data. The vast majority of viruses that cause common infectious diseases are RNA viruses (those that use RNA, not DNA, as their genetic material), which share a critical piece of protein machinery called the RNA-dependent RNA polymerase (RdRp). By searching for the genetic sequence of this protein, kallisto can identify over 100,000 species of viruses with minimal computational cost.
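To make the idea of searching nucleotide reads for a protein signature concrete, here is a minimal Python sketch of a translated search. It is not kallisto's implementation, and the article does not describe the actual algorithm at this level of detail. The sketch assumes a naive approach: translate each read in all six reading frames, then look up short amino-acid k-mers in an index built from reference protein sequences. The function names, the k-mer length, and the two toy "reference" sequences are all hypothetical and are not real RdRp sequences.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(read):
    """Translate a read in three frames on each strand (RNA 'U' is read as 'T')."""
    read = read.upper().replace("U", "T")
    frames = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


def build_index(references, k):
    """Map every amino-acid k-mer to the reference names that contain it."""
    index = defaultdict(set)
    for name, protein in references.items():
        for i in range(len(protein) - k + 1):
            index[protein[i:i + k]].add(name)
    return index


def classify_read(read, index, k):
    """Return the reference with the most k-mer hits, or None if there are none."""
    hits = defaultdict(int)
    for frame in six_frame_translations(read):
        for i in range(len(frame) - k + 1):
            for name in index.get(frame[i:i + k], ()):
                hits[name] += 1
    return max(hits, key=hits.get) if hits else None


# Toy data (made up for illustration; NOT real viral sequences).
K = 5
references = {"toy_virus_A": "MKVLGDDSTW", "toy_virus_B": "MPQRSTNHEC"}
index = build_index(references, K)

# This read encodes the peptide KVLGDD, a fragment of toy_virus_A.
read = "AAAGUUCUUGGUGAUGAU"
print(classify_read(read, index, K))  # toy_virus_A
```

Because the comparison happens at the amino-acid level, reads that differ in their nucleotides but still encode the same protein fragment would match as well. That is one plausible reason a conserved protein such as RdRp makes a useful search target across many viral species.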

Luebbert and her team envision the tool's widespread use in datasets to monitor emerging diseases and study the vast viral world around us.

"The product is a software tool designed to be user-friendly to any biologist," Pachter says. "We built on a database called PalmDB, first developed by researchers Robert C. Edgar and Artem Babaian, and we added our own novel algorithmic ideas. Any researcher with sequence data can run kallisto and find out what viruses are in their sample and which cells they are present in."

The paper is titled "Detection of viral sequences at single-cell resolution identifies novel viruses associated with host gene expression changes." In addition to Luebbert and Pachter, co-authors are Caltech graduate students Delaney K. Sullivan and Maria Carilli, and former graduate students Kristján Eldjárn Hjörleifsson (PhD '23), Tara Chari (PhD '24), and Alexander Viloria Winnett (PhD '24, now a postdoctoral scholar at Caltech). Funding was provided by Caltech, the UCLA-Caltech Medical Scientist Training Program, the National Science Foundation, the National Institutes of Health, and the Gates Foundation. Lior Pachter is an affiliated faculty member with the Tianqiao and Chrissy Chen Institute for Neuroscience at Caltech.
