Large language models (LLMs), such as ChatGPT, have rapidly entered healthcare, but strong clinical evidence for their real-world use remains limited. A new study published in Gastroenterology & Endoscopy provides the first overview of randomized controlled trials (RCTs) evaluating LLMs specifically in digestive diseases.
The international research team systematically reviewed published and ongoing RCTs conducted since 2022 and identified only 14 eligible trials worldwide: four published and ten ongoing. Most studies were carried out in China and the United States and focused primarily on gastrointestinal and hepatobiliary diseases. The most common applications of LLMs were clinical decision-making and patient education, with question answering the dominant task.
"We found that while enthusiasm for using LLMs in digestive diseases is growing, high-quality clinical evidence is still scarce," said Dr. Peng Wu, the study's first author, of the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. "Randomized controlled trials are essential to determine whether these tools truly improve patient outcomes and healthcare quality."
Notably, although many studies claim clinical relevance, only a subset used real patient data, and most trials were single-center and exploratory in nature. The authors also found that both general-purpose models (such as ChatGPT) and domain-specific medical language models are being tested, reflecting different strategies for integrating AI into clinical workflows.
Dr. Zhirong Yang, co-corresponding author, emphasized the importance of cautious implementation. "Large language models should not replace clinicians. Instead, they should be evaluated as supportive tools that extend clinical capabilities while maintaining human oversight," he said.
The review also highlights several gaps in current research, including the lack of international multicenter trials, inconsistent reporting standards, and limited assessment of risks such as hallucinated outputs and threats to data privacy. The authors call for future trials to adopt standardized reporting guidelines and to focus on real-world patient outcomes.
Overall, the study offers a timely snapshot of how AI language models are beginning to move from experimental tools to potential clinical assistants in digestive healthcare, while underscoring the urgent need for stronger evidence before widespread adoption.