Research Reveals ChatGPT's Failure in Brain Abscess Treatment Guidelines

European Society of Clinical Microbiology and Infectious Diseases

With artificial intelligence (AI) poised to become a fundamental part of clinical research and decision making, many still question the ability of ChatGPT, a sophisticated AI language model, to accurately support complex diagnostic and treatment processes.

Now a new study, being presented at this year's ESCMID Global Congress (formerly ECCMID) in Barcelona, Spain (27-30 April), pitted ChatGPT against the ESCMID guideline for the management of brain abscesses. It found that while ChatGPT seems able to give recommendations on key questions about diagnosis and treatment in most cases, some of the AI model's responses could put patients at risk.

The study was conducted by members of the ESCMID Study Group for Infectious Diseases of the Brain (ESGIB), and is published in The Journal of Neurology.

"Anything less than 100% is a failure when you're dealing with patient safety", says author Dr Susanne Dyckhoff-Shen from LMU University Hospital Munich in Germany and a member of ESCMID. "While we are amazed by ChatGPT's knowledge on the management of brain abscesses, there are some key limitations when it comes to using the AI model as a medical device, including potential patient harm and the lack of transparency about which data are used to provide responses."

The ability of AI to rapidly assimilate, process, and interpret vast data sets offers tantalising prospects. But are time-consuming processes to create medical guidelines still necessary, or could AI models trained on a wealth of scientific medical literature rival clinical experts in answering complex clinical questions?

Brain abscesses are potentially life-threatening central nervous system (CNS) infections that require immediate identification and treatment to prevent severe neurological complications and even death.

Historically, the management of brain abscesses has been largely guided by clinical experience and limited studies, but in 2023 ESCMID fulfilled the need for a standardised approach by developing an international guideline [1].

To find out whether ChatGPT is able to professionally evaluate medical research and give scientifically valid recommendations, a European team of researchers tested the AI model to see whether it could accurately provide answers to 10 key questions on brain abscess diagnostics and treatment in comparison to the ESCMID guideline.

First, the researchers asked ChatGPT (version 4) to answer 10 questions that had been developed and appraised by the ESCMID committee for their brain abscess guideline without any additional information.

Then, ChatGPT was additionally primed with the text of the same scientific research articles that were used to develop the guideline before asking the same questions. This was done to see if ChatGPT could provide more aligned recommendations when given the same data used for guideline development.

The AI-generated responses were then compared with the recommendations of the ESCMID guideline by three independent experts in infectious CNS diseases, who assessed them for clarity, alignment with the guideline, and potential patient risk.

Clear responses to most key questions
