Crunching Big Data Into 3D Images Accelerates Discovery

Berkeley Lab

Key Takeaways

  • Large, complex datasets collected at the Advanced Light Source can now be processed a hundred times faster than before, at the National Energy Research Scientific Computing Center (NERSC), a high-performance computing (HPC) facility that provides valuable real-time feedback to scientists as they're performing experiments.
  • By connecting the Advanced Light Source with an HPC center, the Superfacility approach supercharges scientific productivity and accelerates discovery - transforming how researchers collect, process, and interpret data in real time.
  • The work is the culmination of a two-year collaborative project, involving cross-laboratory efforts ranging from coding to system integration and testing.

Once upon a time, photographers would aim a camera and click, working essentially blind to what they were actually capturing. Beyond the days or weeks needed for film development, they had no way to measure exposure accuracy, detect subtle compositional flaws, or analyze whether critical details were properly focused. Today's digital cameras don't just provide instant visual feedback - they offer real-time analysis through histograms, focus maps, and intelligent scene recognition that can guide photographers toward optimal results as they shoot.

A similar leap in efficiency - and ultimately, quality - has now been achieved by forging a direct, real-time connection between the Advanced Light Source (ALS) and the National Energy Research Scientific Computing Center (NERSC), a high-performance computing facility at Lawrence Berkeley National Laboratory (Berkeley Lab). Through this link, three-dimensional (3D) X-ray images generated at the ALS can be streamed and processed on powerful supercomputers within seconds. The ALS generates exceptionally bright beams of light, including X-rays, that researchers from around the world use to study materials ranging from fuel-cell components and concrete to teeth, brain structure, and more. Just as digital photography revolutionized how quickly we learn from images, this new real-time data streaming pipeline enables scientists at the ALS to transform huge X-ray datasets into 3D images in seconds rather than hours, accelerating discovery across a range of disciplines and changing how researchers collect and interpret X-ray data.

At ALS Beamline 8.3.2, researchers use a process called microcomputed tomography (micro-CT) to obtain 3D images of microstructures inside samples without the need to physically slice them open. A series of X-ray images is collected as the sample is rotated, and the raw data is computationally converted into digital sections that can be stacked to reconstruct 3D visualizations. These scans can produce datasets of 50 gigabytes or more in size, equivalent to storing about 10,000 high-resolution photos.
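The idea of turning rotated X-ray views into a digital section can be illustrated with a toy example. The sketch below is not the beamline's reconstruction software: it uses plain, unfiltered backprojection on a tiny simulated slice, whereas production pipelines typically rely on filtered or iterative methods running on GPUs. All function names, the 65-pixel slice size, and the 180 angles are assumptions chosen for illustration.

```python
import numpy as np

def detector_bins(size, angle):
    """Map every pixel of a size x size slice to a detector bin for one rotation angle."""
    center = size // 2
    ys, xs = np.mgrid[0:size, 0:size]
    # Signed distance of each pixel from the rotation axis, as seen at this angle.
    t = (xs - center) * np.cos(angle) + (ys - center) * np.sin(angle)
    # Shift by `size` so that all bin indices are non-negative.
    return np.rint(t).astype(int) + size

def project(slice_2d, angles):
    """Simulate the raw data: one row of detector readings per rotation angle."""
    size = slice_2d.shape[0]  # assumes a square slice
    n_bins = 2 * size + 1
    sinogram = np.zeros((len(angles), n_bins))
    for i, angle in enumerate(angles):
        bins = detector_bins(size, angle)
        # Sum the pixel values that land in each detector bin.
        sinogram[i] = np.bincount(bins.ravel(), weights=slice_2d.ravel(), minlength=n_bins)
    return sinogram

def backproject(sinogram, angles, size):
    """Smear each detector row back across the slice and average over all angles."""
    recon = np.zeros((size, size))
    for i, angle in enumerate(angles):
        recon += sinogram[i][detector_bins(size, angle)]
    return recon / len(angles)

# Toy "sample": a single dense point inside an otherwise empty slice.
size = 65
phantom = np.zeros((size, size))
phantom[20, 40] = 1.0

angles = np.linspace(0.0, np.pi, 180, endpoint=False)
recon = backproject(project(phantom, angles), angles, size)
peak = np.unravel_index(np.argmax(recon), recon.shape)
print("brightest reconstructed pixel:", peak)
```

The brightest pixel of the reconstruction lands where the dense point was placed, which is the basic effect that lets a stack of such sections reveal a sample's interior. The unfiltered version shown here also leaves a blur around the point, one reason real reconstructions apply a filter before backprojecting.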

Connecting a user facility like the ALS with a high-performance computing (HPC) facility implements the superfacility principle, which can supercharge scientific productivity and accelerate discovery. This principle is at the heart of the new real-time data streaming pipeline at the ALS. The pipeline was developed through NERSC's Science Acceleration Program (NESAP) in collaboration with NERSC engineers. The ALS and NERSC are connected by the Energy Sciences Network (ESnet), the U.S. Department of Energy's (DOE) high-performance network for scientific research. The collaboration integrates experimental and computational resources across user facilities to enable real-time data analysis and feedback. The pipeline is now in active use for daily user experiments and serves as a model for real-time data systems across other DOE light sources. Since its initial launch, scientists have used it to image the intricate insides of fuel cells, batteries, and critical materials.

"Without any prior experience working with a supercomputer, users can use one at the push of a button," said Sam Welborn, former NESAP Postdoctoral Fellow at NERSC. "As detector data streams over the network, processes running at the supercomputer accept and reconstruct it in real time with multiple high-powered graphics processing units (GPUs). Less than 10 seconds after an acquisition is finished, users can look at the reconstruction to figure out their next experimental steps," stated Welborn.

Dula Parkinson, an ALS staff scientist and operations manager of ALS Photon Science, and ALS Research Scientist Liz Clark demonstrated this new capability for the first time in daily production with collaborators from the Saad Bhamla Lab at Georgia Tech. The Bhamla group is interested in exploring the physics behind a wide range of natural phenomena, from the movement of worms in clusters to flamingo feeding strategies. Their goal is to discover the physical principles behind how natural organisms function, and ultimately to find ways to apply those principles to engineer new materials and tools.

"Previously, we would write all of the images from a scan to a file at the ALS, and after the scan was done, we would start processing it, which could take tens of minutes if done locally," said Parkinson. "Now, with the Superfacility Streaming framework developed through NESAP and with our ALS Computing and Controls teams, we stream the data as it is collected without writing it to a file, and the process starts at the beginning of a scan instead of when the scan is over. This significantly improves experimentation efficiency and enables more effective use of limited beamtime, and we're looking forward to combining this with AI/ML tools we are developing for automated image segmentation and analysis to take this even farther."

At the ALS, Bhamla doctoral student Nami Ha was studying bird feathers to learn how they naturally come by their unique qualities, including strength, light weight, flexibility, and insulation. Using micro-CT, researchers are able to view intricacies of these structures that cannot be seen using any other method. They can export the data to software for different kinds of 3D modeling and visualization, which expands the types of analyses they can perform. The scientists are using the data to understand why feathers are so good at repelling water, in the hope of applying this design to improve water-resistant materials.

"Getting all that information instantaneously during the experiment was mind-blowing! This real-time feedback not only allows us to see the data almost instantly but also enables us to refine our experiments for more successful data collection," said Clark.

This success reflects the coordinated work of more than 30 contributors over two years. It builds upon shared code from collaborators at the Advanced Photon Source at Argonne National Laboratory, as well as local efforts within Berkeley Lab's ecosystem. Researchers and staff from ALS Beamline Controls and Photon Science Computing provided essential support for integration and testing, while ESnet, members of NERSC's NESAP program, and Berkeley Lab's Information Technology (IT) Division made updates to beamline infrastructure and data-acquisition software and hardware, which further contributed to the achievement.

"What we are seeing here is a prime example of how connecting a user facility with a HPC facility can supercharge scientific productivity and thus accelerate scientific discovery." – Bjoern Enders

"The ALS has been NERSC's partner for more than 10 years," said Bjoern Enders, data science workflows architect at NERSC. "We have been working together more closely since the Superfacility Project and its successor, DOE's Integrated Research Infrastructure program (IRI). What we are seeing here is a prime example of how connecting a user facility with a HPC facility can supercharge scientific productivity and thus accelerate scientific discovery."

The next phase will expand the framework to support ptychographic imaging, another image-intensive technique that can map chemical compositions down to five nanometers. With such tools, researchers can see changes in how batteries charge and discharge, study how biomineral structures (in corals, for example) provide exceptional strength and resilience, and probe superconductivity in quantum materials.

This research and the Advanced Light Source user facility are funded by the Department of Energy's Office of Science. The National Energy Research Scientific Computing Center (NERSC) is the mission computing facility for the U.S. Department of Energy Office of Science, the nation's single largest supporter of basic research in the physical sciences.
