AI Chatbots Decode Big Data with Precise Prompts

University of California - San Francisco

In an early test of how AI can be used to decipher large amounts of health data, researchers at UC San Francisco and Wayne State University found that generative AI tools could work orders of magnitude faster, and in some cases better, than computer science teams that had spent months poring over the data.

Teams of scientists, and scientists paired with AI, were given the same task: predict preterm birth based on data from more than 1,000 pregnant women.

Even a junior research duo composed of a master's student at UCSF, Reuben Sarwal, and a high school student, Victor Tarca, made viable prediction models with AI assistance. They generated working computer code in minutes — a task that would have taken experienced programmers at least a few hours and up to a few days.

The AI's strength came from its ability to write code to analyze health data based on a short but highly specialized prompt. Not all the AIs were helpful: just 4 of the 8 AI chatbots produced usable code. But the ones that succeeded did not need a village of experts to help them do it.

This enabled the junior scientists to quickly run the experiments, make sure they were correct, and write and submit their findings to a journal in just a few months.

"These AI tools could relieve one of the biggest bottlenecks in data science: building our analysis pipelines," said Marina Sirota , PhD, a professor of Pediatrics who is the interim director of the Bakar Computational Health Sciences Institute (BCHSI) at UCSF and the principal investigator of the March of Dimes Prematurity Research Center at UCSF. "The speed-up couldn't come sooner for patients who need help now."

Sirota is co-senior author of the paper, which appears in Cell Reports Medicine on Feb. 17.

Can big data make pregnancy safer?

A faster path from data to discovery could lead to more reliable diagnostic testing for preterm birth — the leading cause of newborn death and a leading cause of long-term motor and cognitive impairment in children. About 1,000 babies are born too soon in the United States every day.

Scientists still don't know much about what leads to preterm birth. To search for clues, Sirota's team has amassed microbiome data from about 1,200 pregnant women whose birth outcomes were tracked across nine studies.

"This kind of work is only possible with open data sharing, pooling the experiences of many women and the expertise of many researchers," said Tomiko T. Oskotsky MD, co-director of the March of Dimes Preterm Birth Data Repository , associate professor in UCSF BCHSI, and co-author of the paper.

But the immense quantity of data was still hard to analyze. So, the team crowdsourced help from data scientists through a competition called DREAM (Dialogue on Reverse Engineering Assessment and Methods).

Sirota co-led one of three DREAM challenges on pregnancy; hers was devoted to analyzing data from the vaginal microbiome. More than 100 groups from around the world competed to create machine learning algorithms to identify patterns in the data that could indicate preterm birth. Most of the groups achieved this goal in the three months set by the challenge. But compiling the results and publishing them took nearly two years.

AI gives pregnancy data analysis a welcome boost

Sirota's team wanted to see if AI could do it faster. They partnered with a team led by Adi L. Tarca, PhD, co-senior author and professor in the Center for Molecular Medicine and Genetics at Wayne State University in Detroit, MI, who had led the two other DREAM challenges, which were about finding better ways to date pregnancy.

Together, they instructed eight AI tools to build algorithms to make pregnancy assessments using the same data from the three DREAM challenges, but with no human input.

The AI chatbots were given natural language prompts to accomplish this. It was much like the way ChatGPT works, but the prompts were carefully phrased to guide the AI to assess the human health data the way the DREAM teams had.
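As a rough illustration of what a short but highly specialized prompt might look like, the Python sketch below assembles one from a structured task description. Every field name, file name and sentence in it is hypothetical and chosen only for illustration; none of it is taken from the prompts used in the study.

```python
from dataclasses import dataclass


@dataclass
class AnalysisTask:
    """Hypothetical description of one prediction task (not from the study)."""
    goal: str                  # what the generated model should predict
    training_file: str         # placeholder file name
    test_file: str             # placeholder file name
    metric: str                # how the predictions will be scored
    language: str = "Python"   # assumed language for the generated code


def build_prompt(task: AnalysisTask) -> str:
    """Turn a task description into a single natural-language prompt."""
    lines = [
        f"Write {task.language} code that will {task.goal}.",
        f"Fit the model on '{task.training_file}' and apply it to '{task.test_file}'.",
        f"Report performance on the test set using {task.metric}.",
        "Return only runnable code with no placeholders.",
    ]
    return "\n".join(lines)


# Illustrative task; the wording and file names are made up.
task = AnalysisTask(
    goal="predict preterm birth from vaginal microbiome features",
    training_file="train.csv",
    test_file="test.csv",
    metric="area under the ROC curve",
)
prompt = build_prompt(task)
print(prompt)
```

Spelling out the data, the target and the scoring rule in the prompt is what lets the same request be sent unchanged to several chatbots and the resulting code be compared fairly.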

The goals mirrored the DREAM challenges: analyze vaginal microbiome data for signs of preterm birth, and analyze samples of blood or placental tissue to determine the stage of pregnancy. The stage of pregnancy is almost always an estimate, and it determines the type of care women receive as pregnancies proceed. When the estimate is off, it's hard to prepare for the onset of labor.

The scientists ran the AI-crafted code on the DREAM challenge data. Only 4 of the 8 AI tools produced prediction models that performed as well as those made by the DREAM teams, but in some cases the AI outperformed the human teams. And the entire generative AI project, from inception to submission of a paper, took just six months.
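One way to picture such a comparison is to score each set of predictions on the same held-out data with the same metric, then check whether the AI-generated model matches or beats the human baseline. The Python sketch below does this for a binary preterm/term prediction using the area under the ROC curve. The choice of metric, the labels and the scores are illustrative assumptions, not data or results from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up held-out labels: 1 = preterm, 0 = term.
labels = [1, 0, 1, 0, 0, 1]
# Made-up risk scores from two hypothetical models.
human_scores = [0.9, 0.4, 0.35, 0.2, 0.5, 0.8]
ai_scores = [0.7, 0.3, 0.6, 0.1, 0.4, 0.9]

human_auc = auroc(labels, human_scores)
ai_auc = auroc(labels, ai_scores)
print(f"human baseline AUROC: {human_auc:.3f}")
print(f"AI-generated AUROC:   {ai_auc:.3f}")
print("AI matches or beats baseline:", ai_auc >= human_auc)
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; a real analysis would more likely use a library routine and a metric suited to each task, such as an error measure for the pregnancy-dating estimates.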

Scientists still need to be on guard for misleading results, a persistent problem, and step in when the AI fails. And the technology is no replacement for human expertise. But its power could enable scientists to quickly parse massive amounts of data — freeing them up to think more deeply.

"Thanks to generative AI, researchers with a limited background in data science won't always need to form wide collaborations or spend hours debugging code," Tarca said. "They can focus on answering the right biomedical questions." 

Authors: UCSF authors are Reuben Sarwal; Claire Dubin; Sanchita Bhattacharya, MS; and Atul Butte, MD, PhD. Other authors are Victor Tarca (Huron High School, Ann Arbor, MI); Nikolas Kalavros and Gustavo Stolovitzky, PhD (New York University); Gaurav Bhatti (Wayne State University); and Roberto Romero, MD, D(Med)Sc (National Institute of Child Health and Human Development (NICHD)).

Funding: This work was funded by the March of Dimes Prematurity Research Center at UCSF, and by ImmPort. The data used in this study was generated in part with support from the Pregnancy Research Branch of the NICHD.

About UCSF: The University of California, San Francisco (UCSF) is exclusively focused on the health sciences and is dedicated to promoting health worldwide through advanced biomedical research, graduate-level education in the life sciences and health professions, and excellence in patient care. UCSF Health, which serves as UCSF's primary academic medical center, includes some of the nation's top specialty hospitals and other clinical programs, and has affiliations throughout the Bay Area. UCSF School of Medicine also has a regional campus in Fresno. Learn more at ucsf.edu.
