As global temperatures rise, Cold Spring Harbor Laboratory (CSHL) scientists work to grow stronger, more resilient crops. Yet, this process is challenging. Plants often have several related genes that control desirable traits, such as size or drought resistance. Finding genes with overlapping functions, or "redundant genes," is a near-impossible scavenger hunt.
"Most of the time, there are major limitations in the pathway to crop improvement," said Iacopo Gentile, a postdoc in CSHL's Zachary Lippman lab. "That's because there's so much redundancy and complexity in how gene families evolve and compensate for each other."
Now, Gentile and colleagues have traced one important gene family in flowering plants to see how it's changed over 140 million years of evolution. Using this data, they trained models to identify patterns of redundancy and predict which genes to edit for modifying specific traits.
"It's about understanding what happens after gene duplication," Gentile explained. "You have one gene that duplicates. Then you have two. What happens after that? Theory tells you they will diverge from each other. The big question mark in the field is how."
To answer this question, the team homed in on CLE, a gene family involved in cell signaling and plant development. CLE peptides are prevalent across all plant species. However, much about their specific functions remains unknown. Studying them has been difficult due to their short length, rapid evolution, and redundancy.
Using new advances in AI, the team identified thousands of previously unknown CLE genes across 1,000 species. They fed this data into computer models, which flagged genes that might be redundant. Redundant genes likely share similarities in one or two places—the peptides they produce, or gene promoters, the areas of DNA that control expression.
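To make the flagging idea concrete, here is a minimal Python sketch. It is not the team's model, which the article does not describe in detail. It assumes a simple rule: two genes are candidates for redundancy if either their peptides or their promoters are highly similar. The gene names, sequences, the 0.8 threshold, and the helper `flag_redundant_pairs` are all invented for illustration, and a generic string-similarity ratio from Python's standard library stands in for whatever sequence comparison the researchers actually used.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Generic string similarity in [0, 1]; a stand-in for a real alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def flag_redundant_pairs(genes: dict, threshold: float = 0.8) -> list:
    """Return gene pairs whose peptides OR promoters meet the similarity threshold.

    The threshold is an arbitrary illustrative value, not one from the study.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(genes.items(), 2):
        peptide_sim = similarity(a["peptide"], b["peptide"])
        promoter_sim = similarity(a["promoter"], b["promoter"])
        if peptide_sim >= threshold or promoter_sim >= threshold:
            flagged.append((name_a, name_b))
    return flagged


# Toy data: every sequence below is made up, not a real CLE gene.
toy_genes = {
    "geneA": {"peptide": "RLVPSGPNPLHN", "promoter": "ATGCATGCATGCAAAT"},
    "geneB": {"peptide": "RLVPSGPNPLHN", "promoter": "TTTTGGGGCCCCAAAA"},
    "geneC": {"peptide": "HEVPSGPNPISN", "promoter": "ATGCATGCATGCAAAT"},
    "geneD": {"peptide": "MKKKKKKKKKKK", "promoter": "CGCGCGCGCGCGCGCG"},
}

flagged_pairs = flag_redundant_pairs(toy_genes)
print(flagged_pairs)
```

In this toy set, geneA and geneB share a peptide, while geneA and geneC share only a promoter. Flagging on either feature catches both pairs and leaves the unrelated geneD out.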
To confirm the models' predictions, the Lippman lab used CRISPR to knock out the flagged genes in tomatoes. As suspected, eliminating just one of the redundant genes had no effect. However, knocking them all out produced visible changes in the plants.
"It's the first time in tomatoes where you have such big targeting of so many genes at the same time," Gentile said. "We targeted 10."
Notably, the team discovered that most redundant genes had similar promoters even if peptide sequences differed. The model not only identified possible redundancies—it also predicted whether specific CLE mutations would have positive, negative, or neutral effects on plants.
Gentile said the method they developed could "easily be scaled to every gene family," not just CLE. As a result, plant breeders now have a "roadmap" to predict how hidden genes could be used to their advantage.