LLNL Unveils Record Protein-Folding on Fastest Supercomputer

Scientists at Lawrence Livermore National Laboratory (LLNL) and collaborators at Advanced Micro Devices (AMD) and Columbia University have achieved a milestone in biological computing: completing the largest and fastest protein structure prediction workflow ever run, using the full power of El Capitan, the world's fastest supercomputer. El Capitan is funded by the Advanced Simulation and Computing program at the National Nuclear Security Administration (NNSA).

The effort, dubbed ElMerFold, produced high-quality 3D structure predictions for more than 41 million proteins - at a scale and speed previously thought impossible. The record-breaking run hit a sustained rate of 2,400 structures per second and peaked at 604 petaflops of performance using 43,200 AMD Instinct MI300A Accelerated Processing Units (APUs) and 10,800 nodes of El Capitan.

"Doing this kind of AI inference at scale and being able to take a trained model and generate lots of data for distillation - or to explore the design space for bioresilience - is absolutely critical as a capability," said LLNL computer scientist Brian Van Essen, the project's principal investigator. "And to be able to do it on El Capitan is game-changing, because we can reach scales that you just can't get anywhere else."

This work, performed before El Capitan was dedicated to national-security work, sets a new standard for large-scale scientific AI and enables critical next steps in the development of OpenFold3, an open-source alternative to DeepMind's AlphaFold 3. The 2024 Nobel Prize in Chemistry was awarded in part to the developers of AlphaFold, highlighting the societal importance of computational protein structure prediction. The OpenFold models are publicly available and openly licensed, making them an essential tool for democratizing access to cutting-edge protein-structure prediction.

The work also aligns with goals set forth in America's AI Action Plan, which calls for investments in biosecurity and AI-enabled science built through partnerships among the Department of Energy (DOE) national laboratories, industry, federal agencies and research institutions, and which encourages open-source AI model development.

Unlocking the secrets of protein folding

Predicting how a protein folds into its 3D shape is a fundamental challenge in biology, crucial to understanding disease, designing drugs and creating novel therapeutics. Deep learning models like AlphaFold have revolutionized this field, but their licensing restrictions limit broader scientific use. Experimental methods remain expensive and slow - often taking months or years.

"The model DeepMind produced is mostly closed, with limited access not just for academics and industry, but in national defense," Van Essen said. "OpenFold aims to meet or exceed AlphaFold's capabilities but do so in a way that allows us to apply the model for national security and enables broader community engagement."

ElMerFold tackles this challenge head-on by generating a massive protein distillation dataset: using a trained AI model to predict protein structures from sequences, then using those predictions to train more advanced models. This stage has traditionally been a bottleneck, requiring more compute than training itself. Building the workflow involved complicated steps such as sequence-alignment filtering and template generation before inference, and molecular-dynamics relaxation after it.
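
In outline, the distillation loop looks something like the sketch below. Every helper here is a hypothetical stand-in rather than ElMerFold's actual API; the point is the shape of the pipeline: prepare inputs, run inference, relax the result and emit a training example.

    # A minimal sketch of the distillation loop; all helpers below are
    # hypothetical stand-ins, not part of the ElMerFold code base.

    def search_and_filter_msa(seq):              # sequence-alignment filtering
        return []

    def generate_templates(seq):                 # template generation
        return []

    def predict_structure(seq, msa, templates):  # trained-model inference
        return [(0.0, 0.0, 0.0)] * len(seq)      # dummy 3D coordinates

    def md_relax(coords):                        # molecular-dynamics relaxation
        return coords

    def build_record(seq):
        msa = search_and_filter_msa(seq)
        templates = generate_templates(seq)
        coords = md_relax(predict_structure(seq, msa, templates))
        return {"sequence": seq, "structure": coords}

    # Predictions from the current model become the training labels for the
    # next, more advanced model - the "distillation" described above.
    dataset = [build_record(s) for s in ("MKTAYIAKQR", "GSSGSSGAT")]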

"It's a more complex model, and there's a lot of complex orchestration around the model that you don't see in typical inference workflows," said Nikoli Dryden, LLNL computer scientist and ElMerFold technical team lead. "In some ways, it looks more like some of the scientific workflows that the lab does in other contexts."

To accelerate these steps, the team introduced optimizations at every level of the software stack, including in the Lab's open-source Flux workload manager. They created a persistent inference server, improved memory handling and redesigned how sequences are distributed. These changes let the slow and fast stages of the pipeline overlap - dramatically increasing throughput and overall efficiency. As a result, ElMerFold achieved a 17.2-fold speedup over the original OpenFold2 implementation, which had been optimized for older NVIDIA GPUs.
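
The flavor of that overlap can be conveyed with a standard producer/consumer pattern. The snippet below is a simplified plain-Python illustration, not the Flux-based implementation: a slow preprocessing stage keeps a bounded queue full while a persistent inference loop drains it, so neither side sits idle waiting for the other.

    import queue
    import threading

    work = queue.Queue(maxsize=64)     # bounded buffer between the two stages

    def preprocess(sequences):         # slow stage: alignments, templates
        for seq in sequences:
            work.put({"seq": seq})     # blocks only when the buffer is full
        work.put(None)                 # sentinel: no more work

    def serve_inference():             # fast stage: a persistent server loop
        while (item := work.get()) is not None:
            run_model(item)            # stand-in for batched GPU inference

    def run_model(item):               # hypothetical stand-in
        pass

    producer = threading.Thread(target=preprocess, args=(["MKT", "GSS"],))
    consumer = threading.Thread(target=serve_inference)
    producer.start(); consumer.start()
    producer.join(); consumer.join()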

Built for scale: the ElMerFold workflow

To optimize for El Capitan's unique APU architecture - which combines CPUs and GPUs on a single chip with unified memory - LLNL, in collaboration with Columbia University and AMD, redesigned major parts of the OpenFold code base, making it faster, more memory-efficient and resilient to failures at scale. Because the CPU and GPU share one pool of memory, the rewrite could eliminate data transfers between devices, addressing a key bottleneck in memory-intensive workloads like protein folding.
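
To see what eliminating transfers buys, consider the copy pattern a discrete GPU imposes. The snippet below is an illustrative PyTorch sketch with a stand-in model, not LLNL's code; on a unified-memory APU such as the MI300A, the two copies it marks can simply disappear, because both processors address the same physical memory.

    import torch

    model = torch.nn.Linear(256, 3)    # stand-in for a folding model
    device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm GPUs show up as "cuda"
    model = model.to(device)

    feats_cpu = torch.randn(1024, 256)                   # features preprocessed on the CPU
    feats_dev = feats_cpu.to(device, non_blocking=True)  # copy 1: host -> device
    coords = model(feats_dev).to("cpu")                  # copy 2: device -> host
    # On a unified-memory APU, a backend can hand the same buffer to the CPU
    # and the GPU and skip both copies above.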

"ElMerFold showcases how the AMD-unified APU architecture accelerates the most demanding AI workloads at the largest scale ever achieved," said Nicholas Malaya, fellow in high-performance computing at AMD. "Running on the world's fastest supercomputer, this collaboration underscores what's possible when innovative hardware and leading-edge science come together to push the state of the art in AI."

The team developed LBANNv2, a backend for PyTorch that allowed for unique optimizations targeting the AMD Instinct MI300A processor. The team also adopted advanced tools such as Triton, a Python-based GPU kernel language for PyTorch, and DaCe, a domain-specific language for data-centric program optimization, making the codebase more portable and easier to tune for future hardware. With all of El Capitan's available nodes running the job, the team reached a peak performance of 604 petaflops - far surpassing earlier runs.
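
For readers unfamiliar with Triton: kernels are written once in Python and compiled for the target GPU, which is what makes them portable across NVIDIA and AMD hardware. The elementwise kernel below is a generic textbook example that runs on any supported GPU - not one of ElMerFold's kernels.

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
        offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n                       # guard the ragged final block
        x = tl.load(x_ptr + offs, mask=mask)
        y = tl.load(y_ptr + offs, mask=mask)
        tl.store(out_ptr + offs, x + y, mask=mask)

    x = torch.randn(4096, device="cuda")      # ROCm devices also appear as "cuda"
    y = torch.randn(4096, device="cuda")
    out = torch.empty_like(x)
    grid = (triton.cdiv(x.numel(), 1024),)    # one program per 1,024 elements
    add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)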

"Originally, we thought it would take 6-8 days of El Capitan to generate the baseline 41 million structures," Van Essen said. "With this speedup, we got that down to hours."

This performance leap has major implications for training OpenFold3. Unlike AlphaFold 3, which remains proprietary with restrictive licensing, OpenFold3 aims to bring world-class protein prediction to the broader scientific community. But training OpenFold3 requires a massive distillation dataset, one that ElMerFold now makes possible.

Eventually, scientists aim to use advanced AI and supercomputers to simulate a billion different combinations of interacting proteins, known as "multimers" - a feat once thought computationally out of reach.

"We need these sorts of performance improvements to make extending these models and producing larger distillation datasets feasible," Dryden said.

By predicting how billions of these protein assemblies behave, researchers could unlock new insights into bioresilience and biology, accelerate drug discovery and better prepare for future health challenges - all in a virtual lab, at unprecedented speed and scale.

In addition to supporting the next generation of open and transparent protein structure models, the innovations powering ElMerFold - from adaptive scheduling to portable AI kernels - could also be applied to other scientific workloads, such as biomedical research, earth systems modeling, materials science and national security, researchers said.

Pushing the boundaries of scientific AI

Ranked No. 1 on the TOP500 list of the world's most powerful supercomputers, El Capitan was built primarily for NNSA stockpile stewardship simulations. But the success of ElMerFold demonstrates how its architecture can also accelerate AI-driven life sciences, and it highlights the growing importance of high-performance computing (HPC) in biology, where AI models and massive datasets are becoming standard tools for discovery.

Looking ahead, the team plans to continue generating large-scale distillation datasets and training OpenFold3 on Tuolumne, El Capitan's smaller, unclassified companion system. The 288-petaflop Tuolumne shares the same APUs and architecture as El Capitan but is about one-tenth the size of its larger sibling. This work will lay the foundation for future models capable of handling more complex structures, including protein-protein interactions and multimers at massive scale.

"The model itself is both transformative for the OpenFold work and the Lab," Dryden said. "Many other applications are starting to look at this sort of large-scale inference workload, and being able to produce the OpenFold3 model is a centerpiece of a lot of planned initiatives and capabilities."

Other ElMerFold team members include Tal Ben-Nun, Pier Fiedorowicz, Tom Benson and Bronis R. de Supinski of LLNL; Vinay Swamy, Colin Kalicki and Mohammed Al Quraishi of Columbia University; and Vinayak Gokhale and J. Austin Ellis of AMD.
