New research at Tohoku University's Advanced Institute for Materials Research (WPI-AIMR) explains how to transform decades of scattered literature data into computable design rules for catalysts. By combining human intelligence, regression models, and AI agents, researchers can accelerate the discovery of efficient, low-cost catalysts for clean energy technologies such as fuel cells, water splitting, and CO₂ reduction, and can surface findings that were hidden in the literature data all along.

Catalysts, which speed up chemical reactions, are crucial for many important technologies and manufacturing processes. However, finding the right catalyst for the job is tricky. The first step is usually to consult previously published scientific literature, but building a cohesive summary of all this data can be overwhelming. Even studies that investigate the same catalyst might cover different experimental conditions and measure different variables, making comparisons difficult. How do we find the best catalyst candidate if the data is all over the place? It is like trying to compare cake recipes that each use different ingredient amounts, bake times, and oven temperatures.
"There is an enormous amount of information in the wealth of scientific literature published so far on catalysts," remarks Distinguished Professor Hao Li (WPI-AIMR). "But taking all of these disparate, individual studies and summarizing them into actionable information - such as gleaning the blueprints for rational catalyst design - is incredibly difficult."

This study summarizes three current methods for reorganizing, re-analyzing, and remodeling information that is "hidden" in the literature. The first is using human brainpower to summarize data manually. The second is data analysis, such as fitting regression models to large datasets to quantify the relationship between a catalyst's structure and its performance. The third is to use artificial intelligence (AI) to further assess the findings and propose new candidate materials. Ideally, researchers will use all three together.
"Doing everything by hand is too slow, but relying solely on AI without careful cross-checking can be faulty, so we need a careful balance," says Li.
Re-analyzing data from multiple studies may reveal new information, or even anomalies that require a combination of human intelligence and AI to work out an underlying theory that explains them. In this way, even old data can reveal new tricks.
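As a rough illustration of how regression over pooled literature data might surface such anomalies, the Python sketch below fits a straight line to a handful of records and flags points that fall far from the trend. Everything in it is an assumption made for illustration: the records are made-up numbers, and the single numeric descriptor, the function names `fit_line` and `flag_outliers`, and the two-standard-deviation threshold are hypothetical rather than taken from the paper.

```python
from statistics import mean, pstdev

# Hypothetical pooled records: (study label, descriptor value, measured activity).
# These numbers are invented for illustration and do not come from any paper.
records = [
    ("study_A", 0.10, 1.21),
    ("study_A", 0.20, 1.38),
    ("study_B", 0.30, 1.62),
    ("study_B", 0.40, 1.79),
    ("study_C", 0.50, 2.01),
    ("study_C", 0.60, 3.40),  # deliberately off-trend point
    ("study_D", 0.70, 2.41),
    ("study_D", 0.80, 2.58),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def flag_outliers(rows, threshold=2.0):
    """Return rows whose residual exceeds `threshold` residual standard deviations."""
    xs = [row[1] for row in rows]
    ys = [row[2] for row in rows]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [row for row, res in zip(rows, residuals) if abs(res) > threshold * spread]


if __name__ == "__main__":
    for study, descriptor, activity in flag_outliers(records):
        print(f"check {study}: descriptor={descriptor}, activity={activity}")
```

On this toy data, only the deliberately off-trend `study_C` point is flagged. In practice such a point would be handed to a researcher or an AI agent to decide whether it reflects a measurement issue, a difference in experimental conditions, or a genuinely new effect.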
Systematic methods for improving catalyst performance, such as those proposed in this paper, benefit society because they can lead to faster development of sustainable energy solutions, reduced reliance on expensive noble metals, and progress toward a carbon-neutral society.
These findings were published in EES Catalysis on May 14, 2026.

- Publication Details:
Title: Finding the Hidden Catalytic Knowledge from Literature Data
Authors: Yuhang Wang, Yong Wang, Hao Li
Journal: EES Catalysis
DOI: 10.1039/D6EY00079G