Proteins are life's molecular workhorses, doing everything from turning sunlight into food to fighting viruses. They are built from 20 different types of amino acids, so even a small protein just 60 amino acids long can, in theory, be constructed in a quinquavigintillion, or 10⁷⁸, different ways. That's about as many atoms as there are in the entire universe.
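The arithmetic behind that figure can be checked in a few lines. The sketch below is only an illustration, assuming each of the 60 positions can independently hold any of the 20 amino acids; the variable names are chosen for this example and are not taken from the study.

```python
import math

# Assumption: every position is chosen independently from 20 amino acids.
AMINO_ACID_TYPES = 20
CHAIN_LENGTH = 60

# Exact count of possible sequences (Python integers have arbitrary precision).
combinations = AMINO_ACID_TYPES ** CHAIN_LENGTH

# Same quantity expressed as a power of ten: 60 * log10(20).
exponent = CHAIN_LENGTH * math.log10(AMINO_ACID_TYPES)

print(f"20^60 has {len(str(combinations))} digits")
print(f"20^60 is about 10^{exponent:.2f}")
```

The result is a 79-digit number, slightly above 10⁷⁸, which is why the figure is rounded to a quinquavigintillion.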
How did evolution choose the handful of amino acid combinations that result in proteins which fold, stay stable and get the job done? And can we learn these rules to help protein engineers design better medicines and greener catalysts? A study published today in the journal Science has taken an important step toward answering both questions.
Proteins have a core that keeps the structure from collapsing, while the surface does most of the work, such as binding to other molecules. For decades, biologists assumed that altering the core was like removing a load-bearing wall: one wrong move and the whole structure collapses. Because buried amino acids are packed tightly, it seemed logical that any alteration would force neighbouring amino acids to shift, setting off unpredictable domino effects that ripple through the protein.
Under this classical picture of protein stability, most changes to the building blocks of a protein would set off hidden booby traps and threaten to knock the entire structure out of shape. Given the sheer number of possible combinations, the odds of evolution stumbling onto a safe route to new proteins seem very small.
The study turns this idea on its head. Researchers at the Centre for Genomic Regulation (CRG) in Barcelona and the Wellcome Sanger Institute in Hinxton, UK, studied a human protein domain (the functional bit of a protein) called FYN-SH3, making hundreds of thousands of variants and testing which ones still folded and worked.
The experiments revealed that SH3 retained its shape and function across thousands of different core and surface combinations. Only a few true, load-bearing amino acids existed in the protein's core.
"Our data challenges the dogma of proteins being a delicate house of cards. The physical rules governing their stability are more like Lego than Jenga, where a change to one brick threatening to bring the entire structure down is a rare, and crucially, predictable phenomenon," explains Dr. Albert Escobedo, first author of the study and postdoctoral researcher at the Centre for Genomic Regulation.
The team used the large amount of data generated by their experiments to test whether learning the rules from one protein could help explain the evolution of all related proteins that exist in Nature. They fed the data into a machine-learning algorithm, which helped them create a tool that can predict whether an SH3 sequence will stay stable.
SH3 domains have been diversifying since early multicellular life, roughly one billion years ago. The researchers tested their model against 51,159 natural SH3 sequences found in public databases spanning the entire tree of life, including bacteria, plants, insects and humans. The algorithm correctly flagged almost all SH3 domains as stable, even when a test sequence shared less than a quarter of its amino acids with the human version.
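To make "less than a quarter of the sequence" concrete, here is a minimal sketch of one common way to measure how much two sequences share: percent identity over aligned positions. It is an assumption that identity is counted this way; the article does not say how the researchers measured it, and other definitions use a different denominator. The function name and the two toy sequences are hypothetical, not real SH3 domains.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned, non-gap columns where both sequences match.

    Assumes the inputs are already aligned to equal length, with '-'
    marking gaps. Columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # ignore gap columns
        compared += 1
        if a == b:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy sequences, for illustration only.
reference = "ACDEFGHIKL"
candidate = "ACNQ-RSTVW"

identity = percent_identity(reference, candidate)
print(f"{identity:.1f}% identical; below 25%: {identity < 25}")
```

In this toy case only two of the nine compared positions match, so the candidate falls under the one-quarter threshold mentioned above.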
"Evolution didn't have to sift through an entire universe of sequences. Instead, the biochemical laws of folding create a vast, forgiving landscape for natural selection," says Dr. Escobedo.
Implications for protein engineering
The field of protein engineering currently relies on companies screening thousands of protein variants with minimal changes, inching forward a few changes at a time and making the design of new enzymes, drugs and vaccines slow and expensive.
The finding that protein stability follows simpler rules than previously thought could slash the trial-and-error phase of protein design, saving significant time and effort in developing proteins with medical or industrial applications, such as greener catalysts or longer-lasting medicines.
For example, therapeutic enzymes often fail because their surfaces trigger immune flare-ups. Resurfacing these proteins is labour-intensive, requiring lots of trial and error to keep the scaffold from collapsing and sabotaging a promising design. Now, protein engineers can propose bolder designs, including dozens of simultaneous changes, on computers and walk into the lab already knowing which variants are most likely to survive both folding and functional tests.
"The ability to predict and model protein evolution opens the door to designing biology at industrial speed, challenging the conservative pacing of protein engineering," explains ICREA Research Professor Ben Lehner, corresponding author of the study with dual affiliation at the Centre for Genomic Regulation (CRG) and the Wellcome Sanger Institute.