LLNL, Meta Create Cutting-Edge Polymer AI Dataset

Courtesy of LLNL

Polymers are fundamental to our daily lives, serving as the core components for a wide array of goods, including clothing, packaging, transportation infrastructure, construction materials and electronics. Advances in polymer science open pathways for recycling and upcycling waste materials into more valuable chemical feedstocks. Polymers can also have an outsized environmental impact: many widely used polymers are per- and polyfluoroalkyl substances (PFAS), commonly known as "forever chemicals."

In a pioneering partnership to accelerate materials discovery with artificial intelligence (AI), researchers from Lawrence Livermore National Laboratory (LLNL) and Meta have created the world's largest open dataset of atomistic polymer chemistry - a trove of millions of quantum-accurate simulations designed to help AI model the complex behavior of plastics, films, batteries and countless everyday materials.

In a recent paper, the team details Open Polymers 2026 (OPoly26) - a dataset with an unprecedented number and diversity of polymer structures with corresponding simulations performed at quantum accuracy. OPoly26 is a massive reference library that enables AI to learn patterns from millions of pre-computed polymer structures in hours or days, addressing a longstanding gap in polymer data and laying the foundation for safer, faster and more sustainable materials design. The OPoly26 paper formalizes the dataset's release and demonstrates how the data improves the performance of machine-learned interatomic potentials (MLIPs) on polymer materials.

The work builds on the Meta and Lawrence Berkeley National Laboratory (LBNL)-led Open Molecules 2025 (OMol25) Dataset, which is making waves with its sweeping collection of open molecular data aimed at advancing AI-driven chemistry. The OPoly26 dataset contains more than 6 million density functional theory (DFT) calculations on polymeric chemical systems, making it nearly ten times larger than the next largest comparable polymer dataset.

LLNL's partnership with Meta - described by LLNL materials scientist and OPoly26 co-principal investigator (PI) Evan Antoniuk as a "natural fit" - seeks to address the longstanding shortfall in polymer data. By generating critical missing data on polymers with the shared goals of expanding and democratizing open datasets for materials scientists, the team hopes to accelerate the pace of discovery across polymer chemistry.

"This fills a huge gap," said Antoniuk. "We see this as a community resource, one that we hope becomes the go-to starting point for anyone interested in performing atomistic simulations of polymers."

LLNL contributed significant computational power and polymer domain knowledge - generating a diverse set of polymer structures and running simulations to help model how these polymers behave in real-world conditions. In turn, Meta contributed vast computational resources to perform 1.2 billion core hours of DFT simulations and train state-of-the-art MLIP models, leveraging the expertise that had already been refined during their earlier molecular effort.
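
For a rough sense of scale, the two round figures quoted in this article can be combined into an average compute cost per calculation. The sketch below is a back-of-envelope estimate only: it assumes that all 1.2 billion core hours went into the "more than 6 million" DFT calculations in the released dataset, which the article does not state, and the function name is purely illustrative. Because 6 million is a lower bound on the number of calculations, the result is an upper bound on the average.

```python
# Back-of-envelope estimate from the round figures quoted in the article.
# Assumption (not stated in the source): every core hour maps to a released calculation.
TOTAL_CORE_HOURS = 1.2e9   # "1.2 billion core hours of DFT simulations"
MIN_CALCULATIONS = 6e6     # "more than 6 million" DFT calculations (lower bound)


def average_core_hours(total_core_hours: float, n_calculations: float) -> float:
    """Return the mean number of core hours spent per calculation."""
    if n_calculations <= 0:
        raise ValueError("n_calculations must be positive")
    return total_core_hours / n_calculations


avg = average_core_hours(TOTAL_CORE_HOURS, MIN_CALCULATIONS)
print(f"Roughly {avg:.0f} core hours per calculation, at most, on average")
```

Under these assumptions the average comes to about 200 core hours per calculation. Individual calculations will vary widely with system size, so this is an order-of-magnitude figure only.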

"Meta's partnership with LLNL demonstrates how open science and AI can accelerate breakthroughs in materials research," said Rob Sherman, vice president of policy at Meta. "By making this dataset publicly available, we're giving scientists potent new tools to address critical challenges in healthcare and beyond."

LLNL was uniquely positioned to generate the OPoly26 dataset at the scale and fidelity required. Researchers tapped into LLNL's Tuolumne, the world's 12th fastest supercomputer and companion to the exascale El Capitan. Combining this hardware with their collective expertise, they compressed years of simulation work into months, enabling the dataset to reach a scale unmatched in polymer science.

"Comprehensive coverage of this chemical space is essential to the success of the OPoly26 dataset," said LLNL staff scientist Nick Liesen. "We have worked to leverage pipelines that take us from a simple text string to fully atomistic representations of polymer dynamics at scale."

Beyond performing all the DFT calculations, researchers at Meta trained and benchmarked machine-learned interatomic potentials at scale, enabling the team to evaluate how well AI models generalize across small-molecule and polymer chemistry. The paper reports substantial improvements in model accuracy when polymer data is incorporated alongside small-molecule training sets, highlighting the importance of training AI on data that reflects real-world complexity.

Understanding why certain polymers, including PFAS-based materials, resist chemical change requires models that can accurately describe both reactive and nonreactive behavior. Capturing this behavior under realistic conditions required careful attention to reactive configurations, according to LBNL chemist and OPoly26 co-PI Sam Blau, who also previously co-led OMol25.

"Reactivity - the breakage and formation of chemical bonds - is central to polymer synthesis, manufacturing, aging and recycling, and to nanoscale patterning of polymer thin films for semiconductor manufacturing," said Blau. "By going beyond stable structures and explicitly sampling hundreds of thousands of reactive configurations, we aim to accurately describe the reactive events that often govern polymer behavior under real-world conditions."

Beyond outlining how the dataset was generated and performing standard tests of MLIP performance, the OPoly26 paper also introduces an initial suite of polymer-specific evaluation tasks to benchmark how effectively these models capture simulated polymer phenomena and interactions, such as polymer solvation. Future work will include evaluating the MLIP models against experimental measurements, offering a gauge of how well they can capture real-world polymer properties.

"LLNL's significant investment in high-performance scientific computing and computational materials science capabilities have been critical to achieving the scale needed to cover many thousands of distinct chemical structures," said LLNL Materials Science Division Leader Ibo Matthews. "That scale is essential not only for generating the data, but for rigorously evaluating how well AI models perform across the full range of polymer behaviors relevant to real-world applications."

With a focus on open collaboration, the team is making all data publicly available to fuel polymer advancements across academia, industry and government. The authors also emphasized that OPoly26 is being released under an open license to maximize reuse and reproducibility. Through this open approach, the partnership ensures that the benefits of this public-private investment flow broadly across the entire research community.

The team includes LLNL scientists Brian Van Essen, James Diffenderfer, Helgi Ingolfsson and Supun Mohottalalage, and polymer simulation experts Amitesh Maiti and Matt Kroonblawd from the Lab's Materials Science Division. Co-authors also include LBNL's Nitesh Kumar and Lauren Chua. Blau and Kumar's work was funded by the Center for High Precision Patterning Science (CHiPPS), while Chua was supported by her DOE Computational Sciences Graduate Fellowship. LLNL's Laboratory Directed Research and Development program funded the LLNL researchers.

This partnership was made possible through a data transfer agreement, facilitated by LLNL's Innovation and Partnerships Office (IPO). IPO is the Laboratory's focal point for industry engagement and facilitates partnerships to deliver mission-driven solutions that support national security and grow the U.S. economy. To connect with LLNL on industrial partnerships in Advanced Computing, AI and Quantum technologies, contact IPO Business Development Executive Clarence Cannon.
