AI Powers Discovery of Tiny Crystal Structures

Columbia University School of Engineering and Applied Science

One longstanding problem has sidelined life-saving drugs, stalled next-generation batteries, and kept archaeologists from identifying the origins of ancient artifacts.

For more than 100 years, scientists have used a method called crystallography to determine the atomic structure of materials. The method works by shining an X-ray beam through a material sample and observing the pattern it produces. From this pattern, called a diffraction pattern, it is theoretically possible to calculate the exact arrangement of atoms in the sample. The challenge, however, is that this technique works well only when researchers have large, pure crystals. When they have to settle for a powder of minuscule pieces, called nanocrystals, the method only hints at the unseen structure.

Scientists at Columbia Engineering have created a machine learning algorithm that can observe the pattern produced by nanocrystals to infer the material's atomic structure, as described in a new study published in Nature Materials. In many cases, their algorithm achieves near-perfect reconstruction of the atomic-scale structure from the highly degraded diffraction information — a feat unimaginable just a couple of years ago.

"The AI solved this problem by learning everything it could from a database of many thousands of known, but unrelated, structures," said Simon Billinge, professor of materials science and of applied physics and applied mathematics at Columbia Engineering. "Just as ChatGPT learns the patterns of language, the AI model learned the patterns of atomic arrangements that nature allows."

Crystallography Transformed Science

Crystallography is vital to science because it's the most effective method for understanding the properties of virtually any material. The method typically relies on a technique called X-ray diffraction, in which scientists shoot energetic beams at a crystal and record the pattern of light and dark spots it produces, sort of like a shadow. When crystallographers use this technique to analyze a large and pure sample, the resulting X-ray patterns contain all the information needed to determine its atomic-level structure. Best known for enabling the discovery of DNA's double-helix structure, the method has opened important avenues of research in medicine, semiconductors, energy storage, forensic science, archaeology, and dozens of other fields.

Unfortunately, researchers often only have access to samples of very small crystallites, or atomic clusters, in the form of powder or suspended in solution. In these cases, the X-ray patterns contain much less information, far too little for researchers to determine the sample's atomic structure using existing methods.

AI Extends the Method to Nanoparticles

The team trained a generative AI model on 40,000 known atomic structures to develop a system that can make sense of these inferior X-ray patterns. The machine learning technique, called diffusion generative modeling, emerged from statistical physics and recently gained prominence for enabling AI-generated art programs like Midjourney and Sora.

"From previous work, we knew that diffraction data from nanocrystals doesn't contain enough information to yield the result," Billinge said. "The algorithm used its knowledge of thousands of unrelated structures to augment the diffraction data."

To apply the technique to crystallography, the scientists began with a dataset of 40,000 crystal structures and jumbled the atomic positions until they were indistinguishable from random placement. Then, they trained a deep neural network to connect these almost randomly placed atoms with their associated X-ray diffraction patterns. The network used these observations to reconstruct the crystal. Finally, they put the AI-generated crystals through a procedure called Rietveld refinement, which essentially "jiggles" crystals into the closest optimal state, based on the diffraction pattern.
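The "jumbling" step described above can be illustrated with a minimal sketch of the forward noising process used in standard diffusion models. This is not the researchers' code: the toy eight-atom cube, the linear noise schedule, its start and end values, and the function names noise_schedule, signal_fraction, and jumble are all assumptions chosen for illustration.

```python
import math
import random

# Toy "crystal": eight atoms on the corners of a cube (hypothetical example data).
CUBE = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]


def noise_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly increasing noise variances, one per step (assumed values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def signal_fraction(betas, t):
    """Fraction of the original coordinates that survives t noising steps."""
    kept = 1.0
    for beta in betas[:t]:
        kept *= 1.0 - beta
    return math.sqrt(kept)


def jumble(positions, betas, t, rng):
    """Noise every atomic coordinate t steps forward in one shot.

    Uses x_t = s * x_0 + sqrt(1 - s^2) * eps with eps ~ N(0, 1),
    where s is the signal fraction that survives t steps.
    """
    s = signal_fraction(betas, t)
    noise = math.sqrt(max(0.0, 1.0 - s * s))
    return [
        tuple(s * coord + noise * rng.gauss(0.0, 1.0) for coord in atom)
        for atom in positions
    ]


if __name__ == "__main__":
    betas = noise_schedule(1000)
    rng = random.Random(0)
    for t in (0, 100, 500, 1000):
        first_atom = jumble(CUBE, betas, t, rng)[0]
        print(t, round(signal_fraction(betas, t), 4), first_atom)
```

At step 0 the cube is untouched, and by the final step almost none of the original geometry remains, which is the sense in which the positions become "indistinguishable from random placement." In a diffusion model, a network is then trained to undo such steps; according to the description above, the researchers' network did this with the diffraction pattern as its guide.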

Although early versions of this algorithm struggled, it eventually learned to reconstruct crystals far more effectively than the researchers had expected. The algorithm was able to determine the atomic structure from nanometer-sized crystals of various shapes, including samples that had proven too difficult for previous experiments to characterize.

"The powder crystallography challenge is a sister problem to the famous protein folding problem where the shape of a molecule is derived indirectly from a linear data signature," said Hod Lipson, James and Sally Scapa Professor of Innovation and chair of the Department of Mechanical Engineering at Columbia Engineering, who, with Billinge, co-proposed the study. "What particularly excites me is that with relatively little background knowledge in physics or geometry, AI was able to learn to solve a puzzle that has baffled human researchers for a century. This is a sign of things to come for many other fields facing long-standing challenges."

The century-old powder crystallography puzzle is particularly meaningful to Lipson, who is the grandson of Henry Lipson CBE FRS (1910–1991), a pioneer of computational crystallography methods. In the 1930s, Henry Lipson worked with Bragg and other contemporaries to develop early mathematical techniques that were broadly used to solve the structures of the first complex molecules, such as penicillin, leading to the 1964 Nobel Prize in Chemistry.

Gabe Guo BS'24, currently a PhD student at Stanford University, who led the project while he was a senior at Columbia, said, "When I was in middle school, the field was struggling to build algorithms that could tell cats from dogs. Now, studies like ours underscore the massive power of AI to augment the power of human scientists and accelerate innovation to new levels."
