Rapid Evolution of Complex Multi-Mutant Proteins

Arc Institute

The search space for protein engineering grows exponentially with complexity. A protein of just 100 amino acids has 20^100 possible variants—more combinations than atoms in the observable universe. Traditional engineering methods might test hundreds of variants but limit exploration to narrow regions of the sequence space. Recent machine learning approaches enable broader searches through computational screening; however, these approaches still require tens of thousands of measurements or 5-10 iterative rounds.

With the advent of these foundational protein models, the bottleneck for protein engineering swings back to the lab: for a single protein engineering campaign, we can only efficiently build and test hundreds of variants. What is the best way to choose those hundreds to most effectively uncover an evolved protein with substantially increased function? To address this problem, we developed MULTI-evolve, a framework for efficient protein evolution that applies machine learning models trained on datasets of ~200 variants focused specifically on pairs of function-enhancing mutations.

Published today in Science, this work represents Arc Institute's first lab-in-the-loop framework for biological design, where computational prediction and experimental design are tightly integrated from the outset, reflecting our broader investment in AI-guided research.

Learning from pairwise interactions

Evolving proteins involves two fundamental steps: finding beneficial mutations, then combining them synergistically. Early in developing this approach, we realized that neural networks trained on single-mutant data alone couldn't reliably predict which multi-mutant combinations would work, because those models lack information about how mutations interact. Most large datasets of random variants aren't useful either: the vast majority of mutations don't enhance function, so testing thousands of random variants teaches models mostly about what doesn't work.

Our insight was to focus on quality over quantity. First identify ~15-20 function-enhancing mutations (using protein language models or experimental screens), then systematically test all pairwise combinations of those beneficial mutations. This generates ~100-200 measurements, and every one is informative for learning beneficial epistatic interactions.
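The ~100-200 figure follows directly from counting pairs: n mutations give n(n-1)/2 unordered pairs. The short sketch below checks this for 15 and 20 mutations. The mutation labels and the function name pairwise_library are placeholders for illustration, not part of the released tool, and the count assumes every pair can be combined (two mutations at the same position could not be).

```python
from itertools import combinations
from math import comb


def pairwise_library(mutations):
    """Return every unordered pair of distinct mutations."""
    return list(combinations(mutations, 2))


# Hypothetical placeholder labels; real inputs would be mutations like "A134P".
for n in (15, 20):
    mutations = [f"mut{i + 1}" for i in range(n)]
    doubles = pairwise_library(mutations)
    # Sanity check against the closed-form count n * (n - 1) / 2.
    assert len(doubles) == comb(n, 2)
    print(f"{n} singles -> {len(doubles)} doubles")
```

With 15 mutations this gives 105 double mutants and with 20 it gives 190, which brackets the range quoted above.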

We validated this computationally using 12 existing protein datasets from published studies. Training neural networks on only the single and double mutants, we found models could accurately predict complex multi-mutants (variants with 3-12 mutations) across all 12 diverse protein families. This result held even when we reduced training data to just 10% of what was available.

Training on double mutants works because they reveal epistasis. A double mutant might perform better than the sum of its parts (synergy), worse than expected (antagonism), or exactly as predicted (additivity). These pairwise interaction patterns teach models the rules for how mutations combine, enabling extrapolation to predict which 5-, 6-, or 7-mutation combinations will work synergistically.
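One common way to make these three categories concrete is to compare the measured double mutant with the value expected if the two single-mutant effects simply added. The sketch below assumes effects add on a log2 fold-change scale and uses an arbitrary noise tolerance. Both choices, along with the function names and example numbers, are illustrative assumptions rather than the definitions used in MULTI-evolve.

```python
import math


def epistasis(fold_a, fold_b, fold_ab):
    """Observed minus expected log2 fold change for a double mutant.

    Assumption: with no interaction, single-mutant effects add on a
    log2 scale (equivalently, their fold changes multiply).
    """
    expected = math.log2(fold_a) + math.log2(fold_b)
    return math.log2(fold_ab) - expected


def classify(fold_a, fold_b, fold_ab, tol=0.25):
    """Label a pair; `tol` is an arbitrary illustrative noise threshold."""
    eps = epistasis(fold_a, fold_b, fold_ab)
    if eps > tol:
        return "synergy"
    if eps < -tol:
        return "antagonism"
    return "additivity"


# Made-up fold changes relative to wild type, for illustration only.
print(classify(2.0, 3.0, 12.0))  # observed exceeds the expected 6-fold
print(classify(2.0, 3.0, 6.0))   # observed matches the expected 6-fold
print(classify(2.0, 3.0, 2.5))   # observed falls short of the expected 6-fold
```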

We then applied MULTI-evolve to three new proteins: APEX (up to 256-fold improvement over wild-type, 4.8-fold beyond already-optimized APEX2), dCasRx for trans-splicing (up to 9.8-fold improvement), and an anti-CD122 antibody (2.7-fold binding improvement to 1.0 nM, 6.5-fold expression increase). For dCasRx, we started with a deep mutational scan of >11,000 variants, extracted only the function-enhancing mutations, and tested their pairwise combinations—demonstrating the value of strategic data curation for efficient engineering.

Each required experimentally testing only ~100-200 variants in a single round to train models that accurately predicted complex multi-mutants, compressing what traditionally takes 5-10 iterative cycles over many months into weeks.

The MULTI-evolve loop

MULTI-evolve integrates three innovations into an end-to-end framework.

1. Combining protein language models enables effective mutation discovery

While single mutations can improve protein function, substantial improvements in function require combining several mutations. Previous work has demonstrated the ability of protein language model zero-shot methods to predict which mutations might improve function, but any individual method identifies few mutations for generating higher-order combinatorial variants.

To identify more function-enhancing mutations, we combined predictions from several different models (some analyzing protein sequence, others 3D structure) using two scoring methods. Testing this across 73 diverse protein datasets, we found our approach identified ~20 beneficial mutations on average, compared to ~11 from any single model.
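To see why pooling models yields more candidates, consider the simplest possible ensemble: take each model's top-k mutations and keep the union. This is only a sketch of the general idea. The model names, scores, and the union rule itself are assumptions for illustration, and the actual ensemble and scoring methods may differ.

```python
def nominate(scores, k):
    """Top-k mutations by score from one model (higher = better)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def ensemble_nominations(model_scores, k):
    """Union of each model's top-k nominations."""
    picks = set()
    for scores in model_scores.values():
        picks |= nominate(scores, k)
    return picks


# Made-up scores from two hypothetical models over four placeholder mutations.
model_scores = {
    "sequence_model": {"M1": 0.9, "M2": 0.8, "M3": 0.1, "M4": 0.0},
    "structure_model": {"M1": 0.7, "M2": 0.1, "M3": 0.2, "M4": 0.9},
}
picks = ensemble_nominations(model_scores, k=2)
print(sorted(picks))
```

Each toy model nominates two mutations, but because they disagree on one of them, the ensemble nominates three.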

When we applied this to APEX, we identified the A134P mutation, which improves activity 53-fold. Standard protein language model-based methods systematically missed it because they penalize proline substitutions. One of our ensemble scoring strategies involves normalizing amino acid specific biases, like this bias against proline substitutions, allowing A134P to emerge as a candidate when it otherwise would have been overlooked.
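A minimal sketch of what normalizing an amino-acid-specific bias could look like: center each mutation's score on the average score of all substitutions to the same amino acid, so that a proline substitution is judged against other proline substitutions. The centering rule, the function name, and all scores below are assumptions for illustration. Only A134P comes from the text above, and the other mutation labels are invented.

```python
from collections import defaultdict
from statistics import mean


def normalize_by_target(scores):
    """Center each score on the mean of substitutions to the same amino acid.

    `scores` maps mutation strings such as "A134P" to a model score
    (higher = more favorable). The last character is the new residue.
    """
    by_target = defaultdict(list)
    for mutation, score in scores.items():
        by_target[mutation[-1]].append(score)
    offsets = {aa: mean(vals) for aa, vals in by_target.items()}
    return {m: s - offsets[m[-1]] for m, s in scores.items()}


# Made-up scores: the toy "model" scores every proline substitution low.
raw = {
    "A134P": -4.0, "G50P": -7.0, "S69P": -7.0,
    "K41L": -1.0, "T12L": -2.0, "E87L": -3.0,
}
adjusted = normalize_by_target(raw)
top_raw = max(raw, key=raw.get)
top_adjusted = max(adjusted, key=adjusted.get)
print(top_raw, top_adjusted)
```

In this toy data the raw ranking favors a leucine substitution, while the centered ranking puts A134P first because it stands out among proline substitutions.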

2. Neural networks predict which combinations will work best

Our next step was to determine, given a set of beneficial single mutants and their pairwise double mutants, the most effective way to combine them into multi-mutant variants with up to 7 mutations.

Through computational benchmarking, we demonstrate that fully connected neural networks can reliably predict the activity of multi-mutants when trained primarily on single and double mutants. Across 12 diverse protein datasets, our models correctly identified top performers more than half the time.

In practice, we demonstrate that MULTI-evolve can identify hyperactive variants with up to 7 mutations across 3 distinct proteins. We engineer multi-mutant variants with a single round of machine learning, where models are trained on a compact training set of ~200 strategic variants, and we experimentally test as few as 9 proposed candidates.
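The prediction step can be pictured as encoding each variant as a vector marking which mutations it carries, scoring every candidate combination, and keeping the best few for the lab. The sketch below shows that bookkeeping only. The multi-hot encoding, the 15 placeholder mutations, and the function names are assumptions, and toy_predict is a stand-in that simply counts mutations, not a trained neural network.

```python
from itertools import combinations
from math import comb


def encode(variant, mutations):
    """Multi-hot vector: 1 where the variant carries that mutation."""
    return [1 if m in variant else 0 for m in mutations]


def candidates(mutations, min_size=3, max_size=7):
    """Yield every combination of min_size..max_size mutations."""
    for size in range(min_size, max_size + 1):
        yield from combinations(mutations, size)


def rank(mutations, predict, top_n):
    """Score every candidate with `predict` and keep the best top_n."""
    scored = [(predict(encode(c, mutations)), c) for c in candidates(mutations)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]


mutations = [f"mut{i + 1}" for i in range(15)]  # hypothetical labels
n_candidates = sum(comb(15, k) for k in range(3, 8))
print(n_candidates)

# Stand-in for a trained model: NOT a neural network, it only counts mutations.
toy_predict = sum
best = rank(mutations, toy_predict, top_n=9)
print(len(best), len(best[0]))
```

Even with only 15 starting mutations there are 16,263 combinations of 3 to 7 mutations, far more than can be built by hand, which is why a model is needed to choose the handful that get tested.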

3. The MULTI-assembly method enables rapid synthesis

Another bottleneck is building and testing predicted variants. Commercial DNA synthesis is expensive and slow, especially for complex multi-mutants. Existing lab methods for multi-site mutagenesis have low efficiency and subjective oligo design that can make results unreliable.

To address this, we developed MULTI-assembly, a multi-site mutagenesis method that builds complex variants efficiently. By systematically optimizing reaction conditions, oligonucleotide designs, and assembly parameters, we achieved 40-70% assembly efficiency for variants with up to 9 mutations across several kilobases. We also developed a computational oligo designer that takes your target mutations as input and outputs primers optimized for efficient assembly. All of this can be done in days rather than weeks.

Try MULTI-evolve yourself

The MULTI-evolve framework is modular and will improve as the field advances. Better protein language models will enhance mutation discovery, and the approach integrates naturally with other design tools, refining computationally designed proteins or optimizing therapeutic candidates.

We've made MULTI-evolve available as an open-source tool that handles protein language model predictions, neural network training, and MULTI-assembly oligo design. Whether you're working on enzymes, genome editors, or therapeutic proteins, the framework provides a systematic path from initial mutations to optimized multi-mutants.

We're excited to see how the community applies MULTI-evolve to their protein engineering challenges. If you have questions about applying this to your work, please reach out.
