Plant Virus Genome Tools: How Do They Compare?

Pennsylvania State University

Research depends on replication: Different teams need to reach the same conclusion multiple times before a consensus forms and the finding can be built upon or applied. More may rest on the tools researchers use than previously realized - at least for understanding viruses, according to a team at Penn State.

Every time a virus replicates inside a living organism, there's a chance that the new copy will be an imperfect, defective version of the original virus. Learning more about the genome structure of these defective copies has the potential to reveal clues about the virus's biology, but the researchers found that the five tools available to identify these defective genomes in next-generation sequencing datasets may give inconsistent results.

The team published their work in the Journal of General Virology.

Anthony Taylor, a doctoral student in plant biology and lead author on the study, said the findings have implications for how scientists can best conduct their research.

"We suggest utilizing more than one program when analyzing a dataset to get multiple 'points of view' and a thorough analysis of the data," he said. "It is worth considering that the method of sequencing - how researchers extract, prepare and process genetic information - likely influences program outputs as well."

He added that while scientists have known about these defective viruses for decades, little is known about the roles they play in infection. Many theorize that learning more about them will let scientists better understand and predict a virus's evolution, pathogenesis and transmission. For instance, some researchers have theorized that in animals, the defective viruses could help stimulate the immune system.

More importantly, "it's been hypothesized and shown in mammalian cell systems that these defective viral genomes that spontaneously occur in virus replications could be used as antiviral therapies," said Marco Archetti, associate professor in the Eberly College of Science and co-author on the paper. "It may be possible to create new defective versions to use as treatments, and that's why we wanted to use these programs to find and learn more about them."

Taylor said that he conducted this study because he wanted to learn more about how defective virus genomes influence virus ecology in plants. He realized that the only way to choose among the available bioinformatic tools for discovering defective genomes was to compare them directly by applying them to selected publicly available virus sequence datasets.

"There are special bioinformatic programs that take these huge data sets that come from genome sequencing and analyze them for defective viral genomes," he said. "My question was: If you run a dataset through all these programs, are the results consistent? Do you get the same junction points in the same frequency, or does each program give you completely different results?"

For the study, Taylor compiled eight datasets generated in previous studies and ran them through the five bioinformatic programs currently available for detecting "junction points" - the sites where the genome has been rejoined after a section was deleted during replication - in viruses.

Six of these datasets included genome sequencing data from plant and virus combinations that hadn't been analyzed for defective genome presence in their original study. Two of the datasets - one that was computer generated and another generated for SARS-CoV-2 by Archetti - acted as controls since the common junction points already were known.

The researchers also were interested in exploring whether there are patterns in how viruses behave in different hosts. For example, does a virus tend to produce defective versions of itself only in certain plants? Three of the six datasets focused on virus-host combinations that previously were known to produce specific defective viral genomes. The three other datasets included virus-host combinations that previously had not been tested for defects.

After the scientists performed their analysis with all five programs, Taylor said, they found little overlap in the results. Most of the programs identified different junction points, ranked different ones as the most commonly occurring, or both.
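
To make the idea of "overlap" concrete, here is a minimal Python sketch of one way such a comparison could be scored. It is not the method used in the study. It assumes each program's output has already been reduced to a set of junctions, each written as a (deletion start, rejoin position) pair of genome coordinates. The program names and coordinates are invented toy values, not results from the paper, and the Jaccard index is simply one common choice of overlap measure.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of junctions reported by either program that both report."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(calls):
    """Jaccard overlap for every pair of programs, keyed by (name, name)."""
    return {
        (p, q): jaccard(calls[p], calls[q])
        for p, q in combinations(sorted(calls), 2)
    }


def consensus(calls, min_programs=2):
    """Junctions reported by at least `min_programs` different programs."""
    counts = {}
    for junctions in calls.values():
        for junction in set(junctions):
            counts[junction] = counts.get(junction, 0) + 1
    return {j for j, n in counts.items() if n >= min_programs}


# Hypothetical toy data: program names and coordinates are made up.
calls = {
    "program_a": {(120, 980), (450, 2210), (300, 1500)},
    "program_b": {(120, 980), (455, 2210)},
    "program_c": {(120, 980), (300, 1500), (75, 640)},
}

overlap = pairwise_overlap(calls)
shared = consensus(calls, min_programs=2)

for pair, score in overlap.items():
    print(pair, round(score, 2))
print("reported by 2+ programs:", sorted(shared))
```

Note that exact matching treats (450, 2210) and (455, 2210) as different junctions. Whether calls that differ by a few nucleotides should count as the same junction is a design decision that a real comparison would need to make explicitly.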

Cristina Rosa, professor of plant virology in the College of Agricultural Sciences and co-author on the study, said the findings are important for researchers interested in working with genome sequencing data.

"There's this incredible amount of data that we are generating with next-generation sequencing technology, which is very useful for asking biological questions," she said. "So, we asked how good are the tools made to analyze them, and how can we evaluate their results? The reality is they're good, but it might be best to run the same dataset through multiple programs to look at the overlap when conducting this type of research."

Generating full-sequence datasets specifically for this purpose, instead of relying on datasets generated in different ways by different labs, would help improve these tools and promote their applicability to answer biologically relevant questions - for instance, on virus replication - the researchers said.
