Metagenomic Software Boosts Microbial Diversity Research

Earlham Institute

Metagenomics relies on the use of software programmes called assemblers, which can reconstruct tens of thousands of individual microbial genomes from DNA sequencing of samples such as soil, bodily fluids, or clinical swabs from hospitals. This can allow researchers to track and analyse how complex microbial communities change over time, and spot the spread of pathogens in healthcare settings.

"A soil sample can contain around 50,000 different species of bacteria," said Dr Christopher Quince, corresponding author and Group Leader at Earlham Institute and Quadram Institute, who led the study alongside Drs Rayan Chikhi and Gaëtan Benoit from the Institut Pasteur.

"In metagenomics, we're trying to study them all simultaneously by sequencing all the DNA in a sample, then using software algorithms to split that out into the genomes of the different organisms," said Chris. "This doesn't just reveal which microbes are present, it allows us to measure and track their relative abundance and even predict their function."

Underpinning the advances in metagenomics has been the advent of DNA sequencing technologies that can accurately analyse long stretches of DNA in a single go – known as 'long-read' sequencing.

The long-read sequencing market is currently dominated by two manufacturers: Pacific Biosciences (PacBio) and Oxford Nanopore Technologies. PacBio sequencers are generally held to be more accurate, but more expensive and computationally resource-hungry. Nanopore sequencers have traditionally been more error-prone, but cheaper. The relative strengths and weaknesses of each platform mean researchers can choose which to use based on the requirements of their individual projects.

"Nanopore sequencers can be much more portable – I know researchers who've used them to run a metagenome analysis on a laptop in a hotel room," said Chris.

But until recently, Nanopore data was much noisier, with an error rate of around 5%. This is an order of magnitude less accurate than PacBio, said Chris, "and that made it really difficult to reconstruct microbial genomes algorithmically."

For this reason, the research team focused their efforts on developing assemblers tailored to PacBio sequence data. In 2024, they released metaMDBG, an assembler that set a new benchmark for efficiency and accuracy. It was a significant leap forward for the field, proving to be 12 times faster than other software, and yielding more accurate genome reconstructions. However, it underperformed on 'noisier' Nanopore sequence data, restricting its use.

Recently, however, new chemistry techniques have led to a substantial improvement in the accuracy of Nanopore machines to around 1%.
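To give a rough sense of why the drop from around 5% to around 1% matters for assembly, the short Python sketch below estimates how often a short stretch of sequence (a k-mer) would be read with no errors at all. It is a back-of-envelope illustration only: it assumes errors are independent and evenly spread along a read, and the k-mer length of 21 is a hypothetical value chosen for illustration, not a parameter taken from metaMDBG or nanoMDBG.

```python
def error_free_kmer_fraction(error_rate: float, k: int) -> float:
    """Expected fraction of k-mers read without any error.

    Assumes each base is wrong independently with probability `error_rate`
    (a simplification; real sequencing errors are not uniformly distributed).
    """
    return (1.0 - error_rate) ** k


# Illustrative k-mer length; an assumption, not a value from the study.
K = 21

for rate in (0.05, 0.01):
    fraction = error_free_kmer_fraction(rate, K)
    print(f"per-base error rate {rate:.0%}: "
          f"{fraction:.1%} of {K}-mers expected to be error-free")
```

Under these simplifying assumptions, roughly a third of 21-base stretches would be error-free at a 5% error rate, compared with roughly four in five at 1%, which helps explain why an error-correction step became more feasible.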

"That allowed us to start to think about whether we could feasibly add an error-correction step to metaMDBG, and allow us to adapt it to work with data from Nanopore machines," said Chris.

The team's new software, nanoMDBG, is an adaptation of metaMDBG.

In a paper published in Nature Communications, the researchers used nanoMDBG to analyse a range of DNA samples, including a complex 400 Gbp soil sample. The team showed that it was more accurate than the current leading Nanopore-based assembler, and yielded results comparable to those from PacBio sequencing data.

"We're pleased to see nanoMDBG deliver such robust results compared to similar tools. One of its core strengths is its scalability through efficient memory usage, enabling it to process extremely large datasets in a single run," said Dr Gaëtan Benoit, first author on the study, and Research Fellow at Institut Pasteur.

"Our new assembler is far more scalable in terms of the computational resources it uses," added Chris. "We can now, say, assemble a gut microbiome on a laptop in a few hours. It's a big leap forward.

"Making these sorts of analyses more accessible will allow us to use metagenomics to explore all sorts of unknown biology," he said. "For example, we know that agriculture is responsible for about 12% of greenhouse gas emissions in the UK, and maybe 30% of this is nitrous oxide released by soil microbes.

"But we have no idea which microbes are responsible, because we can't grow them in the lab. If we can identify them through metagenomics, we might better understand how they're generating these greenhouse gases and start to try and tackle that problem," Chris concluded.

The study 'High-quality metagenome assembly from nanopore reads with nanoMDBG' is published in Nature Communications and was supported by the BBSRC-funded Decoding Biodiversity research programme at the Earlham Institute.
