Boost in Protein Study with New Bioinformatics Tools

IBB researchers have generated a high-quality dataset to help better understand the proteins that form condensates through a key biological process, and the role these proteins play in normal function, ageing and disease. The resources, available on an open online platform, will help improve current predictive models, which have significant shortcomings.

Members of the Protein Folding and Conformational Diseases research group at the IBB-UAB, led by Salvador Ventura.

Many proteins are capable of spontaneously rearranging themselves within cells to form molecular condensates—membraneless intracellular structures formed by one or multiple proteins—through a process known as liquid-liquid phase separation (LLPS). This biological process is key, as it allows proteins to organize, interact and function in an efficient and regulated manner within the cellular environment. When this mechanism fails, neurodegenerative diseases, cancers or developmental disorders can appear.

A research team from the Institute of Biotechnology and Biomedicine (IBB) of the UAB has now created the most comprehensive and reliable dataset of proteins participating in LLPS. Their proposal offers a protocol that overcomes the limitations of the algorithms developed so far to build predictive models, in which the team identified shortcomings that prevent a joint and accurate analysis of the data.

The study, published in the journal Genome Biology, was led by Salvador Ventura, professor of the Department of Biochemistry and Molecular Biology of the UAB and director of the Parc Taulí Research and Innovation Institute (I3PT-CERCA); Michał Burdukiewicz, Maria Zambrano researcher at the IBB and head of the bioinformatics group at the Medical University of Białystok (Poland); and Carlos Pintado Grima, researcher at the IBB and first author of the study.

The research team precisely classified the two main types of proteins involved in LLPS: those that can form condensates by themselves (drivers) and those that only form part of them (clients). In addition, they developed the first standard set of proteins that do not participate in this process, which includes both proteins with defined structures and disordered proteins, "a key element for training artificial intelligence systems fairly and efficiently," says Salvador Ventura, who also coordinates the Protein Folding and Conformational Diseases research group at the IBB.

To validate their work, the scientists investigated specific physicochemical traits involved in LLPS in different subsets of protein sequences, identifying significant differences among them. Moreover, they evaluated how well sixteen existing bioinformatics tools predict LLPS, the most comprehensive comparison made so far.

The dataset generated in the study makes it possible to accurately assign the role a given protein plays in LLPS. In total, the researchers classified 2,876 different proteins. "The data we have generated was created to guarantee reliability and interoperability among them, based on standardized criteria for their selection and categorization. Until now, we did not have enough reliable data to make rigorous predictions. With this new resource, we open the door to the development of new, more precise computational tools," says Salvador Ventura.

The datasets and all associated tools of the study are openly available at llpsdatasets.ppmclab.com.

Article: Carlos Pintado-Grima, Oriol Bárcenas, Eva Arribas-Ruiz, Valentín Iglesias, Michał Burdukiewicz, Salvador Ventura. Comprehensive protein datasets and benchmarking for liquid–liquid phase separation studies. Genome Biology, 26, 198 (2025). https://doi.org/10.1186/s13059-025-03668-6

/UAB Public Release.