Plant Molecules Offer Rich Data Set for Study


Researchers at the University of Geneva have established a searchable library of spectra and molecules found in a collection of 1,600 plant extracts. This collection was accessed through a collaboration with Pierre Fabre Laboratories. The resulting open resource, published in the journal GigaScience, shares both the obtained data and the employed methods. This will be useful for research ranging from drug discovery to the large-scale exploration of plants’ chemical diversity.

Plant metabolites play a fundamental role in drug development, as they can display potent biological activities. Many existing drugs are either pure natural products (for example, the anti-malarial artemisinin and the anti-cancer drug taxol) or derived from natural products (such as the anti-cancer drugs vinorelbine and brentuximab vedotin).

Pierre Fabre Laboratories has always been keen to share its pharmaceutical experience with academia. For instance, the semi-synthetic anti-cancer natural product vinorelbine, which is derived from the Madagascar periwinkle (Catharanthus roseus), was marketed in 1989 as a result of a collaboration with Professor Pierre Potier from the French National Center for Scientific Research (CNRS). Pierre Fabre Laboratories has made plant research a central aspect of its approach since its creation. A collection of plant samples was assembled between 1998 and 2015 with the main goal of finding novel anti-cancer drugs. Numbering over 17,000 unique samples, this collection is one of the largest private plant libraries in the world; it includes some rare species and covers a diverse range of botanical families from all over the world. Since 2015, Pierre Fabre Laboratories has opened access to its private plant sample collection for interested partners.

This new study, published in GigaScience, reports the chemical characterization of about 10% of the plant extracts in the Pierre Fabre Laboratories collection. This represents an important step towards making the chemical diversity of the full collection accessible to researchers around the world. The researchers at the University of Geneva used high-resolution mass spectrometry in combination with advanced computational pipelines to acquire over two million spectra and associated chemical information, providing valuable insights into the biochemical content of the plant extracts. The mass spectrometry profiles and associated metadata have been shared openly through the MassIVE repository (accession number MSV000087728), and this Data Note provides demonstrations of how to query this extensive curated resource. The Data Note shares both the resulting data and the employed methods, allowing for reproducibility, further exploration of the dataset, and improvement of the proposed computational tools and methods. This is an exceptional resource for advancing the field of large-scale chemodiversity digitization. Such a fruitful partnership between academia and industry illustrates that the fate of a historical, private collection of plant samples can be changed and that the richness of its chemical diversity can be made available to a wider public.

Further Reading:

Allard P-M et al. (2023): Open and re-usable annotated mass spectrometry dataset of a chemodiverse collection of 1,600 plant extracts. GigaScience doi:10.1093/gigascience/giac124


Data Availability

Allard P-M & Wolfender, J.-L. (2021). MassIVE MSV000087728 – GNPS_PF_plant_extracts_library_dataset_01 [Data set]. MassIVE.

Allard P-M; Gaudry A; Quirós-Guerrero L; Rutz A; Dounoue-Kubo M; N Walker TW; Defossez E; Long C; Grondin A; David B; Wolfender J (2022): Supporting data for “Open and re-usable annotated mass spectrometry dataset of a chemodiverse collection of 1,600 plant extracts.” GigaScience Database.

Funding: Swiss National Science Foundation (SNF No. CRSII5_189921/1)

Sharing on social media?

Find GigaScience online on Twitter (@GigaScience) and Facebook, and keep up to date with our blog.

About GigaScience Press

GigaScience Press is BGI’s Open Access Publishing division, which publishes scientific journals and data. Its publishing projects are carried out with international publishing partners and infrastructure providers, including Oxford University Press and River Valley Technologies. It currently publishes two award-winning data-centric journals: its premier journal GigaScience (launched in 2012), which won the 2018 American Publishers PROSE award for innovation in journal publishing, and its new journal GigaByte (launched 2020), which won the 2022 ALPSP Award for Innovation in Publishing. The press also publishes data, software, and other research objects via its database. To encourage transparent reporting of scientific research and to enable future access and analyses, it is a requirement of manuscript submission to all GigaScience Press journals that all supporting data and source code be made openly available in GigaDB or in a community approved, publicly available repository.

About GigaScience

GigaScience is co-published by GigaScience Press and Oxford University Press. Winner of the 2018 PROSE award for Innovation in Journal Publishing (Multidisciplinary), the journal covers research that uses or produces ‘big data’ from the full spectrum of the biological and biomedical sciences. It also serves as a forum for discussing the difficulties of and unique needs for handling large-scale data from all areas of the life and medical sciences. The journal has a novel publication format, one that integrates manuscript publication with complete data hosting and analysis tool incorporation. To encourage transparent reporting of scientific research as well as enable future access and analyses, it is a requirement of manuscript submission to GigaScience that all supporting data and source code be made available in the GigaScience database, GigaDB, as well as in publicly available repositories. GigaScience provides users with access to associated online tools and workflows, and has integrated a data analysis platform, maximizing the potential utility and re-use of data.
