Materials Project Fuels AI Revolution in Science

Berkeley Lab

Key Takeaways

  • The Materials Project is the most-cited resource for materials data and analysis tools in materials science.
  • The Materials Project and its tools have been cited more than 32,000 times in peer-reviewed studies, enabling advances in batteries, quantum computing, microelectronics, catalysts for industrial manufacturing, and more.
  • The Materials Project is used 5,000 times per day by more than 650,000 registered users.

In 2011, a small team at the Department of Energy's Lawrence Berkeley National Laboratory (Berkeley Lab) launched what would become the world's most-cited materials database. Today, the Materials Project serves over 650,000 users and has been cited more than 32,000 times - but its real impact may just be emerging.

When renowned computational materials scientist Kristin Persson and her team first created the Materials Project, they envisioned an automated screening tool that could help researchers in industry and academia design new materials for batteries and other energy technologies at an accelerated pace. A user-friendly interface would connect researchers to the largest collection of materials properties for free. Its open-source framework - supported by supercomputers at the National Energy Research Scientific Computing Center (NERSC), a Department of Energy user facility at Berkeley Lab - would help to democratize materials knowledge and foster collaboration across the disciplines. Another plus: No programming experience required.

Word of this pioneering database soon got around the materials science community, and the Materials Project quickly became one of the most popular materials-data providers in the world. By early 2020, as many as 120,000 people - from national lab scientists and industry innovators to inquisitive high school students - had joined the Materials Project community.

And now the Materials Project has reached another big milestone: surpassing 650,000 registered users. This rapid growth reflects a surging demand for curated, machine-learning-ready datasets that can immediately power AI applications without extensive preprocessing.

A data powerhouse and a machine-learning revolution

In its 14 years of operation, the Materials Project and its software tools have been cited more than 32,000 times by studies published in peer-reviewed scientific journals, enabling advances in batteries, quantum computing, microelectronics, catalysts for industrial manufacturing, and more. Its library of computed materials data now includes more than 200,000 materials - from common metals to exotic compounds - and over 577,000 molecules. In the last two years, it delivered 465 terabytes of data to its users - equivalent to roughly 100 million high-resolution photos or 100,000 summer blockbusters in HD.
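As a rough sanity check on those comparisons, the short sketch below divides the 465 terabytes by the two counts quoted above. It assumes decimal units (1 TB = 10^12 bytes), which the article does not specify, and the function name is purely illustrative.

```python
TB = 10**12  # assumption: decimal terabytes; the article does not state a convention


def bytes_per_item(total_tb: float, item_count: int) -> float:
    """Average size in bytes implied by spreading total_tb across item_count items."""
    return total_tb * TB / item_count


# Figures quoted in the article: 465 TB, 100 million photos, 100,000 films.
photo_mb = bytes_per_item(465, 100_000_000) / 10**6
movie_gb = bytes_per_item(465, 100_000) / 10**9

print(f"Implied size per photo: {photo_mb:.2f} MB")
print(f"Implied size per film:  {movie_gb:.2f} GB")
```

Under that assumption, the comparison implies about 4.65 MB per photo and 4.65 GB per film, which is plausible for a high-resolution image and an HD movie.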

"Machine learning is game-changing for materials discovery because it saves scientists from repeating the same process over and over while testing new chemicals and making new materials in the lab," said Persson, the Materials Project Director and Co-Founder. "To be successful, machine learning programs need access to large amounts of high-quality, well-curated data. With its massive repository of curated data, the Materials Project is AI ready."

Since the beginning, the Materials Project leadership team at Berkeley Lab - consisting of Persson, who also holds titles of Faculty Senior Scientist in Berkeley Lab's Materials Sciences Division and Professor of Materials Science and Engineering at UC Berkeley; Anubhav Jain, a Berkeley Lab Staff Scientist and the Materials Project Associate Director; and Patrick Huck, a Berkeley Lab Senior Computing Engineer and the Materials Project Technical Lead - have worked closely with numerous contributors from industry, the national labs, and academia, many of whom are listed as co-authors in a perspective article Persson and team published recently in the journal Nature Materials.

Together, they improved the Materials Project with more materials, better algorithms and search capabilities, and more diverse coverage of properties. With user-friendliness as a guiding principle, they had the foresight to help researchers understand and identify functional materials by building state-of-the-art machine-learning algorithms into the system, years before the ascendance of AI.

"The Materials Project has been at the forefront of enabling this machine-learning revolution in materials science," said Jain. "Many machine-learning companies - from new startups to established companies - rely on the Materials Project to train their machine-learning models for predicting materials properties, which their engineers and scientists in turn use to develop their products."

AI-ready: The power of curated data

Researchers are currently looking for new battery materials to more effectively store energy for the grid or for transportation, or new catalysts to help improve efficiencies in the chemical industry. But experimental data are available for fewer than one percent of compounds in open scientific literature, limiting our understanding of new materials and their properties. This is where data-driven materials science can help.

"Accelerating materials discoveries is the key to unlocking new energy technologies," Jain said. "What the Materials Project has enabled over the last decade is for researchers to get a sense of the properties of hundreds of thousands of materials by using high-fidelity computational simulations. That in turn has allowed them to design materials much more quickly as well as to develop machine-learning models that predict materials behavior for whatever application they're interested in."

The Materials Project platform uses high-throughput computational modeling at the National Energy Research Scientific Computing Center (NERSC) to screen large libraries of materials for specific purposes. Properties are calculated using advanced computational methods and validated against real-world experiments. This approach allows researchers to rapidly test and evaluate many different materials, accelerating the discovery process.
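The screening idea can be sketched in a few lines: take a library of computed properties and keep only the candidates that pass application-specific thresholds. This is a minimal illustration, not the Materials Project's own pipeline or API. The records, field names, and thresholds below are all hypothetical placeholders invented for the example, and the choice of a band gap and a stability energy as screening criteria is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    formula: str                 # placeholder label, not a real compound
    band_gap_ev: float           # hypothetical computed band gap, in eV
    energy_above_hull_ev: float  # hypothetical stability metric, in eV per atom


def screen(candidates, min_gap_ev, max_hull_ev):
    """Keep candidates that pass both thresholds, most stable first."""
    passed = [
        c for c in candidates
        if c.band_gap_ev >= min_gap_ev and c.energy_above_hull_ev <= max_hull_ev
    ]
    return sorted(passed, key=lambda c: c.energy_above_hull_ev)


# Made-up values for illustration only; these are not Materials Project data.
library = [
    Candidate("A", 3.2, 0.00),
    Candidate("B", 0.4, 0.01),
    Candidate("C", 2.8, 0.15),
    Candidate("D", 4.1, 0.02),
]

shortlist = screen(library, min_gap_ev=2.0, max_hull_ev=0.05)
print([c.formula for c in shortlist])  # prints ['A', 'D']
```

In a real workflow the library would hold many thousands of computed entries, and the shortlist would go on to more detailed simulation or experimental validation.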

The platform also provides standardized datasets formatted for training machine-learning systems, including detailed information about a material's electron density. Such curated data allow researchers to validate new AI models against performance benchmarks. This extensive preparation eliminates the months typically required to assemble and clean materials datasets, allowing researchers to focus on developing new AI algorithms and making scientific discoveries.

During the pandemic, the Materials Project's AI-readiness allowed materials research to continue despite site-access restrictions to experimental research laboratories. "Experimental materials scientists who traditionally performed hands-on laboratory experiments turned to digital tools to analyze data and run simulations while working remotely. And today, a modern platform like the Materials Project is now expected to operate around the clock to support a user community that has grown by a factor of 2.5 since May 2022," said Huck.

To support this growing demand, Huck and team worked with industry partners to migrate the Materials Project to a cloud-based infrastructure: MongoDB, a leading database for modern applications; the observability platform Datadog; and the cloud computing provider Amazon Web Services. The new infrastructure supports everything from rapid property searches to massive data downloads, as well as interactive tools that enable real-time exploration of how different materials relate to each other. It ensures 99.98% uptime, the industry standard for high availability.
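To put the 99.98% figure in concrete terms, the sketch below converts an uptime percentage into the downtime it allows. The measurement window is an assumption: the article does not say whether uptime is tracked per month, per year, or over some other period, and the function name is illustrative.

```python
def downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of allowed downtime for a given uptime percentage and period."""
    return (1 - uptime_percent / 100) * period_hours


HOURS_PER_YEAR = 365 * 24     # assumption: a non-leap calendar year
HOURS_PER_30_DAYS = 30 * 24   # assumption: a 30-day month

per_year_hours = downtime_hours(99.98, HOURS_PER_YEAR)
per_month_minutes = downtime_hours(99.98, HOURS_PER_30_DAYS) * 60

print(f"Allowed downtime per year:    {per_year_hours:.2f} hours")
print(f"Allowed downtime per 30 days: {per_month_minutes:.1f} minutes")
```

Under these assumptions, 99.98% uptime leaves room for about 1.75 hours of downtime per year, or a little under nine minutes in a 30-day month.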

From database to discovery

The Materials Project has been adopted across universities, research labs, and companies worldwide, serving research into batteries, semiconductors, catalysts, and structural materials.

Longtime user Toyota Research Institute (TRI), which is headquartered in Los Altos, California, and has facilities in Cambridge, Massachusetts, and Ann Arbor, Michigan, has relied on the Materials Project's open-source tools and data to develop new materials. TRI is a research and scientific development subsidiary of Toyota Motor Corporation focused on developing technologies in artificial intelligence, vehicle automation, materials science, and robotics.

TRI researchers reported the discovery of LiMOCl4 (M=Nb, Ta), new solid electrolytes for solid-state batteries, through a molecular structure identified in the Materials Project. Researchers are interested in advancing solid-state batteries to overcome the limitations in charging and efficiency of current lithium-ion battery technologies.

"The Materials Project serves as a strong bridge between industry and academia by providing the entire research community with transparently developed open-source tools. Almost every industrial effort focused on AI for materials discovery - either at established companies or new startups - is being led by one of the many brilliant young scientists who have been trained at the Materials Project. Their fingerprints are everywhere," said Brian Storey, Toyota Research Institute Vice President.

Microsoft Corp. has also used the Materials Project to train models for materials science, most recently to develop a tool called MatterGen, a generative model for inorganic materials design. Microsoft Azure Quantum developed a new battery electrolyte using data from the Materials Project.

Other notable studies used the Materials Project to successfully design functional materials for promising new applications. In 2020, researchers from UC Santa Barbara, Argonne National Laboratory, and Berkeley Lab synthesized Mn1+xSb, a magnetic compound with promise for thermal cooling in electronics, automotive, aerospace, and energy applications. The researchers found the magnetocaloric material through a Materials Project screening of over 5,000 candidate compounds.

In addition to accessing the vast database, the materials community can also contribute new data to the Materials Project through a platform called MPContribs. This allows national lab facilities, academic institutions, companies, and others who have generated large data sets on materials to share that data with the broader research community.

Other community contributions have expanded coverage into previously unexplored areas through new material predictions and experimental validations. For example, Google DeepMind - Google's artificial intelligence lab - used the Materials Project to train initial GNoME (graph networks for materials exploration) models to predict the total energy of a crystal, a key metric of a material's stability. Through that work, which was published in the journal Nature in 2023, Google DeepMind contributed nearly 400,000 new compounds to the Materials Project, broadening the platform's vast toolkit of material properties and simulations.

The Materials Project contributes or manages more datasets registered with the Department of Energy's Office of Scientific and Technical Information (OSTI) Data ID Service than any other platform, signifying its leadership in open science and data sharing, and setting standards for data management and accessibility through search engines such as Google Dataset Search. Today, it is just one of seven DOE Office of Science Public Reusable (PuRe) Data Resources that make curated data publicly available to further scientific discovery and technical knowledge.

The platform's vast library of materials data has helped to inspire not only new energy technologies but also the next generation of materials scientists. "Grad students, postdocs, and professors at public and private colleges and universities rely on the Materials Project to be available 24/7 as a resource for their research. The fact that we're getting cited in research papers more than six times a day on average now shows how much of an educational resource the Materials Project has become in just a decade," said Huck.

Connecting to autonomous labs

As materials science embraces data-driven discovery, the Materials Project's curated datasets position it as essential infrastructure for AI-powered materials design. The platform is continuing to evolve its machine-learning capabilities, with plans for enhanced computational methods and improved handling of complex materials behavior.

"One of the exciting areas that we've been working on is connecting this simulation pipeline to autonomous experiments carried out at Berkeley Lab's A-Lab. Not only are we simulating things in the computer, but we're also bringing new materials into reality," said Jain.

The A-Lab is a fully automated lab that uses robots guided by artificial intelligence to speed up materials science discoveries. Since its launch in 2023, the A-Lab has collaborated with the Materials Project to synthesize novel materials with promise for future technologies.

This combination of comprehensive data coverage, rigorous quality standards, and community-driven expansion creates a foundation for accelerating discovery timelines for new materials with specific desired properties, Jain added.

The Materials Project is supported by the U.S. Department of Energy's Office of Science.

The National Energy Research Scientific Computing Center (NERSC) at Berkeley Lab is the mission computing facility for the DOE Office of Science.

Berkeley Lab's Alison Hatt contributed to this article.
