Roadmap Unveiled for Safer, Transparent Protein AI

Centre for Genomic Regulation

Protein language models are artificial intelligence tools that help engineer proteins with useful properties, including entirely new structures never before seen in nature.

The technology has huge potential for addressing global challenges, such as synthesising enzymes that can absorb carbon dioxide from the atmosphere or building catalysts that greatly reduce energy use or toxic waste byproducts in industrial processes.

As many of these models begin to shape real-world decisions in biotechnology, a major problem persists. Protein language models (pLMs) largely operate as black boxes, making it hard to understand their decision process and judge whether their predictions are reliable, biased, or even safe to apply in the real world.

In a new perspective paper published today in Nature Machine Intelligence, researchers at the Centre for Genomic Regulation (CRG) analyse how "explainable AI", the set of techniques and methods that allow humans to understand, trust and interpret a model's decisions, is currently applied to protein language models.

"Protein language models are moving fast but our understanding of fundamental biological processes such as folding or catalysis has not advanced alongside these breakthroughs," says Dr. Noelia Ferruz, Group Leader at the CRG and corresponding author of the paper.

"In some ways, we have even lost part of the transparency that characterized physics-based models. Without better ways to explain what these models learn and how they make decisions, we risk building powerful tools that we cannot fully trust," adds Dr. Ferruz.

The authors also issue a call to action for the research community to make protein-design systems more transparent, trustworthy and secure. "If we want protein language models to become a reliable partner in discovery and design, explainability must not be an afterthought," says Andrea Hunklinger, first author of the paper.

Four places to look when trying to explain a pLM's decision-making

The authors write that if you want to understand why an AI model has predicted that a protein has a particular structure or set of properties, you first need to ask where the explanation is coming from.

They identify four key places along the model's journey that are critical for explaining its decision-making. The first is the training data the model learned from, which can reveal, for example, whether the model carries biases that fail to account for human genetic diversity, or whether it has enough data on human proteins in the first place.

The second is the specific protein sequence given to the model, the equivalent of a model's input features. In a housing price prediction model, for example, features might include square metres, number of bedrooms, or location. For protein language models, the analogous question is which amino acids or regions of the sequence influenced the prediction the most.
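As an illustration only, the short sketch below scores each residue by masking it and measuring how confidently the model recovers the original amino acid, one simple stand-in for the attribution-style methods the paper surveys. It assumes the publicly available ESM-2 checkpoint on Hugging Face and the transformers and PyTorch libraries; the sequence is a toy example, not one from the paper.

```python
# Minimal per-residue importance sketch: mask each position and measure how much
# the model struggles to recover the true amino acid (an occlusion-style score).
# Assumes the public ESM-2 checkpoint "facebook/esm2_t6_8M_UR50D"; this is an
# illustrative recipe, not the specific methods reviewed in the paper.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "facebook/esm2_t6_8M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy sequence for illustration
input_ids = tokenizer(sequence, return_tensors="pt")["input_ids"][0]

scores = []
with torch.no_grad():
    for pos in range(1, len(input_ids) - 1):        # skip the special tokens
        masked = input_ids.clone()
        true_id = masked[pos].item()
        masked[pos] = tokenizer.mask_token_id
        logits = model(masked.unsqueeze(0)).logits[0, pos]
        log_probs = torch.log_softmax(logits, dim=-1)
        # A low log-probability of the true residue marks a position the model
        # cannot easily reconstruct from context, i.e. potentially informative.
        scores.append(-log_probs[true_id].item())

for residue, score in zip(sequence, scores):
    print(f"{residue}\t{score:.2f}")
```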

The third is the architecture and internal components of the protein language model itself, comparable to opening the hood of a vehicle and checking its engine. For protein language models, that involves examining what the artificial neurons inside the model have learned to represent and whether they process information sensibly.
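One widely used way to inspect those internals is a "probe": a small classifier trained on the model's hidden activations to test whether a known biological property can be read out of them. The sketch below uses random placeholder embeddings and labels purely to show the mechanics; in practice the embeddings would come from a protein language model and the labels from curated annotations.

```python
# Minimal probing sketch: train a linear classifier on hidden activations to
# test whether a property of interest is linearly readable from them.
# Embeddings and labels here are random stand-ins, not real model outputs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_residues, hidden_dim = 2000, 320              # e.g. per-residue embeddings
embeddings = rng.normal(size=(n_residues, hidden_dim))
labels = rng.integers(0, 2, size=n_residues)    # e.g. "in a binding site" yes/no

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_test, probe.predict_proba(X_test)[:, 1])
# An AUC well above 0.5 on held-out residues suggests the representation
# encodes the property; with random data, as here, it should hover near 0.5.
print(f"probe AUC: {auc:.2f}")
```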

Finally, researchers can probe a protein language model by nudging it and watching what happens. This is called input-output behaviour and involves studying how the model's answer changes if you slightly alter the protein sequence or the question you ask.
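A minimal version of such a perturbation test is sketched below: introduce a single point mutation and compare the model's log-probability of the mutant versus the original residue at that site, a common proxy for how disruptive the model "thinks" the change is. The checkpoint, sequence and mutation are illustrative assumptions, not examples taken from the paper.

```python
# Minimal input-output perturbation sketch: score a single point mutation by
# comparing the model's log-probabilities for the mutant and wild-type residue.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "facebook/esm2_t6_8M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
position, mutant = 10, "W"                      # hypothetical Q11W substitution (0-based index 10)

input_ids = tokenizer(sequence, return_tensors="pt")["input_ids"][0].clone()
token_pos = position + 1                        # offset for the leading special token
wild_type_id = input_ids[token_pos].item()
input_ids[token_pos] = tokenizer.mask_token_id  # mask the site, then compare candidates

with torch.no_grad():
    logits = model(input_ids.unsqueeze(0)).logits[0, token_pos]
log_probs = torch.log_softmax(logits, dim=-1)

mutant_id = tokenizer.convert_tokens_to_ids(mutant)
delta = (log_probs[mutant_id] - log_probs[wild_type_id]).item()
# A strongly negative delta means the model considers the mutation unlikely in
# this context, one signal (not proof) that it may be disruptive.
print(f"log-prob(mutant) - log-prob(wild type) = {delta:.2f}")
```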

What do scientists try to achieve when opening the "black box"?

To understand how explainable artificial intelligence is being used in protein research today, the researchers reviewed the existing scientific literature and examined dozens of studies where explainability tools have already been applied to protein language models. It is the most comprehensive survey of its kind to date.

The authors organised the scattered body of work into a clear set of roles that explainability can play in protein research, helping turn a technically dense field into something far more approachable.

In almost all cases, explainability is used as an "Evaluator", a way to check whether a model has learned patterns that biologists already know, such as recognising binding sites or structural motifs.
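In code, an Evaluator-style check can be as simple as measuring how well a model's per-residue importance scores line up with positions biologists have already annotated. The sketch below does exactly that with random placeholder scores and a handful of hypothetical binding-site positions, purely to show the shape of the calculation.

```python
# Minimal "Evaluator" sketch: quantify agreement between per-residue importance
# scores and known annotations. Scores and site positions are placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
importance = rng.random(200)                     # e.g. saliency score per residue
known_sites = np.zeros(200, dtype=int)
known_sites[[12, 13, 45, 46, 47, 101]] = 1       # hypothetical binding-site positions

# AUC near 1.0 would mean the model's important residues coincide with the
# annotated sites; near 0.5 (as expected with random scores) means no signal.
print(f"agreement AUC: {roc_auc_score(known_sites, importance):.2f}")
```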

"While Evaluators are useful to benchmark the model's quality, they do not allow to extrapolate to unknown examples, improve the models' architecture, and more importantly, reveal biological insights that emerge from the training data", says Hunklinger.

A smaller share of studies go a step further, using these insights as a "Multitasker", reapplying learned signals to help annotate new proteins or predict additional properties. The authors note that these two roles dominate the field today, showing that explainability is largely being used as a verification and support tool rather than a driver of discovery.

The researchers found that a limited number of studies use explainable AI insights as an "Engineer" or a "Coach": trimming unnecessary components and redesigning architectures, or steering the technology so that it generates protein sequences with desired traits.

Towards a "Teacher" protein language model

The fifth role for explainable AI in protein language models is the "Teacher", which stands apart as the most ambitious and the least realised. In this role, explainable AI helps reveal entirely new biological principles that humans had not previously recognised.

The authors compare this milestone to breakthroughs seen in other areas of artificial intelligence, such as when AlphaZero began uncovering novel chess strategies that surprised grandmasters, or when AI systems helped decipher damaged ancient texts by recognising linguistic patterns invisible to the human eye. In those cases, the technology shifted from being a tool of efficiency to one that provides new insight.

In protein science, reaching the teacher stage would mean AI systems helping researchers uncover new rules of protein folding, catalysis or molecular interaction that could transform how medicines, materials and sustainable technologies are designed.

"For us, the real holy grail is controllable protein design. Imagine being able to tell a model: 'Design a protein with this shape, active at this pH,' and not only receive a candidate sequence, but also a clear explanation of why that design should work, and importantly, why alternatives would fail," explains Dr. Ferruz.

"For example, the model could explain that a particular mutation would disrupt a hydrogen-bonding network essential for stability. Reaching that level of control and mechanistic transparency would move protein language models from impressive generators to truly reliable design partners," she adds.

The authors stress that reaching Teacher status for protein language models will not happen automatically. Today's models are powerful pattern recognisers, yet they often rely on statistical correlations rather than true understanding. The authors argue that several conditions must be met, with their core concern being reliability and validation.

The paper calls for the community to create robust benchmarks and evaluation frameworks to test whether an explanation genuinely reflects the model's reasoning. The authors also call for open-source tooling that makes explainability accessible and comparable across labs. Most crucially, any AI-derived insight must ultimately be validated in the laboratory, turning mathematical patterns into experimentally confirmed biological knowledge.
