500 Genome Assemblies Released for Darwin Tree of Life

We look back through some of the 2022 highlights from the Darwin Tree of Life project

A group of people wearing Darwin Tree of Life t-shirts
The Darwin Tree of Life project in 2022. Photo credit: Luke Lythgoe, Wellcome Sanger Institute

As of December 2022, the Darwin Tree of Life (DToL) project - an ambitious effort to sequence all species in the UK and Ireland - has released 500 reference-quality genome assemblies to public databases, ready to be used by researchers around the globe. Researchers at EMBL-EBI are supporting the DToL project by storing and annotating the sequenced genomes, and making these data openly available through Ensembl Rapid Release and the DToL Data Portal.

The successes of this last year have been down to the tireless DToL team - whether in the field or the lab, on computers or taking the DToL science on the road. Below is a selection of highlights chosen by some DToL partners.

Annotations, data portal features and geocaching

EMBL's European Bioinformatics Institute (EMBL-EBI)

Three cards with pictures of different species
'Into Nature' super species cards. Credit: Briony Jackson, EMBL-EBI

In 2022, the EMBL-EBI DToL team reached the first 100 annotated genomes for new species. This number has steadily increased and these new genome annotations are openly available through the Ensembl DToL page and Ensembl Rapid Release.

The team also made some exciting updates to the DToL Data Portal - an open access platform managed by EMBL-EBI, which pulls together data from across the DToL project, making it all available in one place. Users can track the sequencing progress of their species of interest, and this feature was updated to include detailed status updates for monitoring samples at each step of the process. The team also added an interactive sampling map that allows users to identify where species samples have been collected.

On the public engagement side of things, the team launched a new multilingual activity to engage migrant communities with the science underpinning nature in local areas. This activity - called 'Into Nature' - combines geocaching with collectable cards that participants can track down to learn more about DToL species.

First 500 assemblies

The Wellcome Sanger Institute

Coloured blocks with icons of different species
The diversity of the first 500 Darwin Tree of Life genome assemblies released to public databases. Main colour blocks represent 'kingdoms' (animals, plants, fungi, protists); colour shades show different phyla; icons show the orders within each phylum. Credit: Wellcome Sanger Institute

For everyone at the Wellcome Sanger Institute's Tree of Life Programme, 2022 will be looked back on as the year its genome production pipeline really powered into action. This initial stage of the DToL project has given proof of concept for this ambitious genomics venture.

Could the team create a series of scientific processes that take an organism in the wild and transform it into a top-quality, chromosomal-level genome assembly representing its entire species? Could those processes be repeated again and again for thousands of species spanning every branch of the tree of life? Could a network of partner organisations focused on ecology, informatics, and analysis be brought together to achieve this goal?

The answer is a resounding yes. In the last 12 months, the team has more than doubled the number of DToL genome assemblies on public databases and tripled its catalogue of published Genome Notes.

The graphic above shows the diversity of the project's first 500 assemblies across the tree of life. There has undoubtedly been a bias towards arthropods, for a number of reasons. They are easy to collect and sequence - for example, moths fly towards light, arthropod DNA is relatively easy to extract in the lab, and many have smaller genomes. But the team also wanted to showcase the project's potential for aiding comparative genomics, with DToL scientists already publishing arthropod studies based on these genome assemblies.

There is also a significant breadth of diversity emerging. In 2022, the team published Genome Notes for the first fungi, cnidarians, tunicates, molluscs, and most recently, plants. The first protists are due to be published soon. Getting to grips with this dazzling array of organisms and their genomes is a key achievement in a very successful year.

The first plant genomes are published

Royal Botanic Garden Edinburgh (RBGE)

The European crab apple (Malus sylvestris) along the West Highland Way from which RBGE's Markus Ruhsam collected DToL's sample. Credit: Markus Ruhsam, RBGE

You wait for ages to publish some reference genomes for plants, then five come at once. This team sequenced Britain and Ireland's only native wild apple tree (Malus sylvestris) plus four heritage cultivars of Malus domestica originally grown on these shores. This is just part of a deeper apple-based project that botanists at Edinburgh and Kew, bioinformaticians at the Wellcome Sanger Institute, and other collaborators have been involved with since DToL began. They've also produced short-read DNA sequences to compare over 40 other varieties of locally grown apples.

These genomes can help answer questions about the UK's apple history, the species' evolution, and how to protect the precariously positioned crab apple. Its genetic integrity is being undermined by hybridisation with widely planted domestic relatives. Nearly 30% of the wild apple trees surveyed in a recent study in northern Britain turned out to be of hybrid origin.

Learn more about the DToL apple genomes here. And if you fancy mulling over some genomics with a hot cider this winter, check out the short Scider videos researchers made looking at the science of this apple-based beverage.

Mistletoe: Britain and Ireland's largest genome

Royal Botanic Gardens, Kew

Mistletoe (Viscum album) growing on the Wellcome Genome Campus wetlands nature reserve. Credit: Wellcome Sanger Institute

Possibly the species which has taken up the most DToL time in 2022 is the European mistletoe. Viscum album boasts the largest genome in Britain and Ireland, clocking in at 90 Gbp - 30 times the size of the human genome. Why is it so large? Nobody is quite sure, but that's a question which the reference genome will help answer in future.

Samples from a female mistletoe were collected by the Kew Gardens team in September 2020 in southwest London. The following year, 2021, was spent extracting and sequencing the plant's DNA. In 2022, bioinformaticians assembled the genome along ten colossal chromosomes. A special mention goes to Lucia Campos-Dominguez at the University of Edinburgh, who spent the last three months curating the genome - essentially scrolling through, chromosome by chromosome, checking for errors and inversions.

Barring some checking of the work and a bit of head-scratching over how to upload this massive genome onto public databases, the team can declare that a reference genome assembly of Britain and Ireland's biggest genome is now complete.
