From Sequence To Structure: Fast Track For RNA Modeling

Berkeley Lab

In Biology 101, we learn that RNA is a single, ribbon-like strand of nucleotides that is copied from our DNA and then read like a recipe to build a protein. But there's more to the story. Some RNA strands fold into complex shapes that allow them to drive cellular processes like gene regulation and protein synthesis, or catalyze biochemical reactions. We know that these active molecules, called non-coding RNAs, are present in all life forms, yet we're just starting to understand their many roles - and how they can be harnessed for applications in environmental science, agriculture, and medicine.

To study - and potentially modify - the functions of non-coding RNAs, we need to determine their structure. Scientists from Lawrence Berkeley National Laboratory (Berkeley Lab) and the Hebrew University of Jerusalem have developed a streamlined process that predicts the structure of an RNA molecule down to the atomic level. Members of the research community can come to Berkeley Lab's Advanced Light Source (ALS) user facility knowing nothing more than the molecule's nucleotide sequence and get a structure, or they can do it themselves using the team's open-source software.

"We were looking at the bigger picture with structure prediction, like how we can go from A to Z rather than working on A, B, and D. That's what we try to do at Berkeley Lab, make it user friendly," said Michal Hammel, a staff scientist in Berkeley Lab's Molecular Biophysics and Integrated Bioimaging (MBIB) division. Hammel co-developed the process, called SOlution Conformation PrEdictor for RNA (SCOPER), with MBIB colleague Scott Classen and Hebrew University collaborators Dina Schneidman-Duhovny and Edan Patt.

A paper describing SCOPER was recently published in Biophysical Journal.

Historically, accurately determining the three-dimensional atomic blueprint of a folded RNA has ranged from difficult to impossible, because these molecules rarely form the neat crystals needed for imaging with X-ray crystallography. And because the twists and folds of the RNA strand move around as the molecule functions, there are actually multiple correct structures.

In recent years, artificial intelligence (AI) tools like AlphaFold have become very accurate at generating protein structure predictions based on amino acid sequence, making life a lot easier for scientists worldwide and greatly accelerating the pace of drug discovery. These algorithms have been expanded to RNA structures, but the accuracy remains middling. Getting a reliable model currently involves combining the outputs of multiple computational tools and imaging data. It's a long process, and still fraught with uncertainty.

What is Small Angle X-Ray Scattering?

SAXS is a type of X-ray characterization technique that is particularly well-suited for analyzing biological molecules. Samples can be in liquid, under conditions that closely mimic their native environment. Additionally, SAXS can pulse many liquid droplet samples in a short period of time, generating large datasets that can be analyzed with special software to determine structural models. Though SAXS can't achieve atomic resolution, it can be paired with other techniques, such as X-ray crystallography, nuclear magnetic resonance spectroscopy, or, in this case, AI-driven predictions, to build a reliable atomic model.

Berkeley Lab has cutting-edge SAXS capabilities at the SIBYLS (Structurally Integrated BiologY for Life Sciences) Beamline at the ALS, a synchrotron that generates light at different frequencies. Scientists around the world use the SIBYLS SAXS beamline to analyze biomolecules for a variety of applications.

SCOPER has significantly simplified the process of getting a reliable RNA model. Say you want to study a new RNA: First, put the nucleotide sequence into one of the open-source, AI-based structure prediction tools available today. Then, take your sample to a small angle X-ray scattering (SAXS) facility for characterization. Better yet, let Hammel and his colleagues at the ALS's SAXS beamline get that data for you.

Take the SAXS data and predicted structures, and put them through SCOPER's pipeline. The first step uses an existing program to generate possible flexible arrangements of the RNA from the predicted static structures. Next, a new machine learning program, developed and trained on existing atomic structures by Patt, refines the structures by adding the placements of magnesium ions. Inside cells, positively charged magnesium ions interact with negatively charged RNAs to keep them folded stably. Their presence also helps elucidate structure when using SAXS.

Next, SCOPER generates simulated SAXS data representing the theoretical structures and compares them with the real-world SAXS data to determine which structure is correct.
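The comparison step can be illustrated with a small sketch. The article does not say how SCOPER scores the match, so this example assumes a reduced chi-square fit with a single fitted scale factor, a common way to compare a calculated SAXS profile against a measured one. The function names, model names, and intensity values below are hypothetical toy data, not SCOPER's code or output.

```python
from math import fsum


def fit_scale(i_exp, i_calc, sigma):
    """Least-squares scale c that minimizes sum(((i_exp - c * i_calc) / sigma)^2)."""
    num = fsum(e * m / s**2 for e, m, s in zip(i_exp, i_calc, sigma))
    den = fsum(m * m / s**2 for m, s in zip(i_calc, sigma))
    return num / den


def chi_square(i_exp, i_calc, sigma):
    """Mean squared, error-weighted residual after scaling the calculated profile."""
    c = fit_scale(i_exp, i_calc, sigma)
    residuals = (((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    return fsum(residuals) / len(i_exp)


def best_model(i_exp, sigma, candidates):
    """Score every candidate profile and return the best name plus all scores."""
    scores = {name: chi_square(i_exp, prof, sigma) for name, prof in candidates.items()}
    return min(scores, key=scores.get), scores


# Toy "measured" intensities and their uncertainties (hypothetical values).
i_exp = [100.0, 80.0, 50.0, 20.0, 5.0]
sigma = [2.0, 2.0, 1.5, 1.0, 0.5]

# Toy simulated profiles for two hypothetical candidate structures.
candidates = {
    "model_a": [50.0, 40.0, 25.0, 10.0, 2.5],   # same shape as i_exp, different scale
    "model_b": [50.0, 45.0, 35.0, 20.0, 10.0],  # different shape
}

best, scores = best_model(i_exp, sigma, candidates)
print(best, scores)
```

Because the scale factor is fitted first, a candidate is rewarded for matching the shape of the measured curve rather than its absolute intensity, which is why the toy "model_a" scores best here.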

Finally, another software program models the multiple arrangements that the confirmed structure might take as it functions. Without having to corral multiple software tools themselves, users walk away with a set of precise, three-dimensional atomistic models.

"These days, programs like AlphaFold are almost 95% accurate for proteins but much worse for RNA. It will sometimes come up with five different models that are different. And now the question is, which one is right?" said Hammel. "SCOPER can tell you."

Researchers in the U.S. and Europe are already using the process, but the team is still working to make SCOPER even more convenient. The computing cluster at the SIBYLS beamline can run SCOPER as well as initial structure prediction software such as AlphaFold3, so users don't need to perform that step in advance. But the power of this cluster is limited, so to make the process as smooth and speedy as possible, Classen is installing the pipeline on the supercomputing systems at Berkeley Lab's National Energy Research Scientific Computing Center (NERSC) user facility. His goal is to make one "nice and neat" self-contained application at NERSC that users could operate with ease.

Once that work is complete, researchers could perform the whole process remotely: they could mail in samples for measurement using the SIBYLS beamline's automated capabilities, then access SCOPER online.

Berkeley Lab will be a one-stop shop for visualizing the solution state of RNAs.

This work was funded by the U.S. Department of Energy (DOE) Office of Science, the U.S.-Israel Binational Science Foundation, and the National Cancer Institute. The SIBYLS beamline is supported in part through the Integrated Diffraction Analytical Technologies award through the DOE Office of Science Biological and Environmental Research program. The ALS and NERSC are DOE Office of Science user facilities.
