“Most of these supposed useless data are often treated as trash and remain unexplored. However, in some cases, treasures hidden in these data are discarded [2, 4].”
Cancer research has significantly improved in recent years, primarily due to next-generation sequencing (NGS) technology. Consequently, an enormous amount of genomic and transcriptomic data has been generated. In most cases, only the data needed for the research goals are used, and unwanted reads are discarded. However, these discarded data may contain relevant information. To test this hypothesis, genomic and transcriptomic data were acquired from public datasets.
In this new research perspective, researchers Fabiano Cordeiro Moreira, Dionison Pereira Sarquis, Jorge Estefano Santana de Souza, Daniel de Souza Avelar, Taíssa Maria Thomaz Araújo, André Salim Khayat, Sidney Emanuel Batista dos Santos, and Paulo Pimentel de Assumpção, from Instituto Metrópole Digital at the Universidade Federal do Rio Grande do Norte and from Núcleo de Pesquisas em Oncologia and Instituto de Ciências Biológicas at the Universidade Federal do Pará, used metagenomic tools to explore genomic cancer data. They also used additional annotations to explore differentially expressed ncRNAs from miRNA experiments and investigated variants in tumor-adjacent samples from RNA-seq experiments.
“Here, we demonstrate potential strategies to benefit from nontargeted information resulting from high-throughput cancer investigations.”
In all analyses, new data were obtained: from DNA-seq data, microbiome taxonomies were characterized with performance similar to that of dedicated metagenomic research; from miRNA-seq data, additional differentially expressed sncRNAs were found; and in data from tumor and tumor-adjacent tissue, somatic variants were found.
These findings indicate that unexplored data from NGS experiments could help elucidate carcinogenesis and discover putative biomarkers with clinical applications. Such further investigations should be considered at the experimental design stage, since they offer opportunities to make fuller use of the data, saving time and resources while granting access to multiple genomic perspectives from the same sample and experimental run.
“Altogether, our results strengthen the hypothesis that abundant additional and potentially useful information can be extracted from NGS. Moreover, the integrated investigation of every available information should provide a broader and more robust interpretation of the molecular scenario from each experiment.”
Correspondence to: Paulo Pimentel de Assumpção