AI Cracks Gene Code: Insights Into Plant Control Center

Forschungszentrum Juelich

16 June 2026

An international research team led by Forschungszentrum Jülich and the IPK Leibniz Institute has developed an artificial intelligence (AI) model that predicts where regulatory proteins dock onto plant DNA to switch genes on and off. Trained entirely on the rich genomic data available for the model plant Arabidopsis thaliana, the model transfers successfully to crops such as maize - opening new ways to understand how genetic variation shapes crop performance. The study was recently published in Nature Communications.

When people talk about the genome, many think of genes. But genes alone do not explain why plants grow differently or react to environmental stimuli. In fact, DNA also contains many sections that act like switches or regulators. Particularly important partners of these regulatory elements are transcription factors: proteins that bind to the DNA and determine, among other things, when a gene becomes active and how strongly it is expressed.

A useful picture is a house: the genes are the rooms, while the regulatory regions are the light switches, thermostats, and fuse boxes. To understand how the house works, you need to know not just the rooms but the wiring behind the walls. The team set out to map that wiring using the massive data resources available for the "lab rat" of plant science - Arabidopsis.

To do so, the researchers trained a deep learning model on hundreds of experimental DNA-binding datasets, teaching it to recognise the binding patterns of 46 transcription factor families at once. This "multi-label" design is a departure from earlier approaches, which typically built a separate model for each individual factor and scaled poorly across a genome. The team then tested whether the model could correctly locate binding sites it had not been shown and uncover new regulatory relationships.
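The multi-label idea can be illustrated with a deliberately tiny sketch. It is not the study's architecture: the article says only that a deep learning model was used, so a single linear layer stands in for the network here, and all names (`one_hot`, `predict_multilabel`, `sgd_step`), the random weights and the example labels are hypothetical. What the sketch does show is the design choice described above: one shared model emits 46 independent sigmoid probabilities, so a single DNA sequence can be labelled as bound by several transcription factor families at once, and one loss trains all families together instead of fitting a separate model per factor.

```python
import math
import random

BASES = "ACGT"
N_FAMILIES = 46  # number of TF families named in the article


def one_hot(seq):
    """Encode a DNA string as a flat list of 4 * len(seq) values (A, C, G, T per position)."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_multilabel(x, weights, biases):
    """One independent binding probability per TF family (sigmoid per output, not softmax)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def multilabel_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over all families; each label is 0 (unbound) or 1 (bound)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def sgd_step(x, labels, weights, biases, lr=0.5):
    """One gradient-descent step on the shared loss; every family is updated from the same example."""
    probs = predict_multilabel(x, weights, biases)
    k = len(labels)
    for i, (p, y) in enumerate(zip(probs, labels)):
        grad = (p - y) / k  # derivative of the mean BCE with respect to family i's logit
        weights[i] = [w - lr * grad * xi for w, xi in zip(weights[i], x)]
        biases[i] -= lr * grad


# Toy usage: one 20-bp sequence, hypothetically bound by families 3 and 17.
random.seed(0)
x = one_hot("ACGTTGCAAGTCCGATAGCT")
weights = [[random.gauss(0.0, 0.01) for _ in x] for _ in range(N_FAMILIES)]
biases = [0.0] * N_FAMILIES
labels = [0] * N_FAMILIES
labels[3] = labels[17] = 1

loss_before = multilabel_bce(predict_multilabel(x, weights, biases), labels)
sgd_step(x, labels, weights, biases)
loss_after = multilabel_bce(predict_multilabel(x, weights, biases), labels)
```

Using a sigmoid per family rather than a softmax across families matters here: a softmax would force the families to compete for a single "winner", whereas real regulatory regions are often bound by several factors.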

"Our results show that transcription factors don't simply read isolated DNA motifs. What matters is the surrounding sequence and the way these signals are arranged together," says Fritz Forbang Peleke, first author of the study. The analogy is language: individual words carry little meaning until their order and context form a sentence. In DNA, too, function emerges from how regulatory elements combine - a kind of regulatory grammar - rather than from single building blocks alone.

Using these predicted binding patterns, the model sorted Arabidopsis genes into groups based on how they are likely to be regulated. Strikingly, thousands of genes fell into just 14 broad regulatory clusters, several of which lined up with shared biological functions and coordinated gene activity. "Plants carry thousands of genes, yet many of their functions appear to arise from a surprisingly small set of recurring regulatory patterns," Peleke says.
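Sorting genes by how they are likely to be regulated can be sketched as clustering genes on their predicted binding profiles. This is an illustration under stated assumptions: each gene is represented by a hypothetical vector of predicted binding scores (one per TF family) for its regulatory region, and plain k-means stands in for the clustering procedure, which the article does not name. The article reports 14 clusters for Arabidopsis; how that number was chosen is not described here, so the toy example below simply uses two obvious groups.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(profiles, k, n_iter=100, seed=0):
    """Cluster per-gene binding profiles with k-means; returns one cluster label per gene."""
    rng = random.Random(seed)
    # Farthest-first seeding: start from a random gene, then repeatedly add the
    # gene farthest from all centroids chosen so far.
    centroids = [list(rng.choice(profiles))]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # converged: no gene changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean profile of its genes.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = centroid(members)
    return labels


def group_genes(gene_ids, labels):
    """Map each cluster label to the list of genes assigned to it."""
    groups = {}
    for gene, lab in zip(gene_ids, labels):
        groups.setdefault(lab, []).append(gene)
    return groups


# Toy usage: hypothetical predicted binding scores for three TF families per gene.
genes = ["gene1", "gene2", "gene3", "gene4", "gene5", "gene6"]
profiles = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.95, 0.00, 0.05],
    [0.10, 0.10, 0.90],
    [0.00, 0.20, 0.80],
    [0.05, 0.10, 0.95],
]
clusters = group_genes(genes, kmeans(profiles, k=2))
```

Genes whose regulatory regions attract a similar mix of transcription factor families end up in the same cluster, which is the intuition behind finding shared biological functions within a cluster.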

Image caption: The large experimental datasets available for model plants make it possible to train AI algorithms that also perform well on crops. The study demonstrates this by training a gene regulation model on Arabidopsis data and using it to predict transcription factor (TF) binding in maize in response to heat treatment. Image credit: created with AI / Dr. Jędrzej Szymański.

The team also examined more than 7,000 DNA variants previously linked through genome-wide association studies to traits such as flowering time, disease resistance, and seedling growth. About one in five of these variants was predicted to shift transcription factor binding. "We can now estimate how a single change in a regulatory stretch of DNA alters gene activity and, in turn, an important plant trait," explains Dr. Jędrzej Szymański, head of the Network Analysis and Modelling research group at the IPK and the Omics Data research group at the Forschungszentrum Jülich. "This gives researchers a way to move from a statistical association to a plausible molecular mechanism."
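A common way to ask whether a variant shifts binding with a sequence model is to score the reference and the alternative sequence and compare the predictions family by family. The sketch below assumes that approach; the article does not spell out the exact scoring rule. `toy_predict` is a hypothetical stand-in for a trained model that simply checks for two illustrative motifs, and the `threshold` of 0.2 in `shifts_binding` is an arbitrary, hypothetical cut-off.

```python
MOTIFS = ["CACGTG", "GATA"]  # illustrative motifs, one per hypothetical TF family


def toy_predict(seq):
    """Stand-in for a trained model: 1.0 if a family's motif occurs in seq, else 0.0."""
    return [1.0 if motif in seq else 0.0 for motif in MOTIFS]


def apply_snv(seq, pos, ref, alt):
    """Return seq with the single base at 0-based `pos` changed from ref to alt."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(predict, seq, pos, ref, alt):
    """Per-family change in predicted binding (alternative minus reference) for one variant."""
    ref_scores = predict(seq)
    alt_scores = predict(apply_snv(seq, pos, ref, alt))
    return [a - r for r, a in zip(ref_scores, alt_scores)]


def shifts_binding(deltas, threshold=0.2):
    """Hypothetical rule: return the families whose predicted binding changes by >= threshold."""
    return [i for i, d in enumerate(deltas) if abs(d) >= threshold]


# Toy usage: a G->A change at position 5 destroys the first motif but not the second.
seq = "TTCACGTGAAGATAT"
deltas = variant_effect(toy_predict, seq, pos=5, ref="G", alt="A")
affected = shifts_binding(deltas)
```

Because the comparison returns one change per family, the same function also captures cases like the flowering-time example, where a single base change alters the predicted binding of several transcription factors at once.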

One flowering-time example proved especially telling. The model predicted that a single base change in a regulatory region would simultaneously affect the binding of several transcription factors - the kind of change that can nudge a plant to flower earlier or later. The prediction was then confirmed experimentally using a high-throughput reporter assay.

Although trained only on Arabidopsis, the model could be applied to the distantly related crop maize, where it helped annotate which transcription factors respond under heat stress. Known heat-stress regulators, including heat shock factors, stood out as particularly important - illustrating how the approach could support crop research in species where binding data remain scarce.

Original Publication

Peleke et al. (2026): Genome-wide modelling of plant transcription factor binding captures regulatory variants associated with phenotypic traits. Nature Communications. DOI: 10.1038/s41467-026-73634-8
