AI Boosts Chest CT Diagnoses, Speeds Surgical Calls

FAR Publishing Limited

Interpreting the fine print of a chest CT report can make or break a patient's surgical plan, yet radiologists worldwide face ballooning workloads and widening expertise gaps. A new study from Zhujiang Hospital of Southern Medical University analyzed 13,489 real-world chest CT reports and found that state-of-the-art large language models (LLMs) can shoulder much of that burden, provided they are asked the right way.

"We discovered that modern language models can act as a dependable second set of eyes for radiologists," said Dr. Peng Luo, lead author and physician at Zhujiang Hospital. "With carefully worded multiple-choice prompts, GPT-4 reached a 75 percent accuracy rate across 13 common chest diseases, ranging from COPD to aortic atherosclerosis."

The team compared GPT-4, Claude-3.5-Sonnet, Qwen-Max, Gemini-Pro and GPT-3.5-Turbo using two question styles: open-ended and multiple choice. Across all models, multiple-choice prompts boosted accuracy and consistency, underscoring the power of prompt engineering. GPT-4, Claude-3.5 and Qwen-Max topped the charts, while GPT-3.5-Turbo and Gemini-Pro lagged.
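To make the two question styles concrete, here is a minimal Python sketch of how the same report might be framed as an open-ended question and as a multiple-choice question. The prompt wording, the function names and the four-item option list are assumptions for illustration only; the article does not reproduce the study's actual prompts or its full list of 13 diseases.

```python
from string import ascii_uppercase

# Hypothetical option list: only diseases named in the article, not the study's full set of 13.
CANDIDATE_DIAGNOSES = ["COPD", "Aortic atherosclerosis", "Pleural effusion", "Gallstones"]


def build_open_ended_prompt(report: str) -> str:
    """Ask for a free-text diagnosis (assumed wording, not the study's prompt)."""
    return (
        "Below is a chest CT report.\n\n"
        f"{report.strip()}\n\n"
        "What diagnoses does this report support?"
    )


def build_multiple_choice_prompt(report: str, options: list[str]) -> str:
    """Ask the model to pick from lettered options (assumed wording, not the study's prompt)."""
    # Label each option A, B, C, ... so the answer can be parsed as letters.
    lettered = [f"{letter}. {option}" for letter, option in zip(ascii_uppercase, options)]
    return (
        "Below is a chest CT report.\n\n"
        f"{report.strip()}\n\n"
        "Which of the following diagnoses does the report support?\n"
        + "\n".join(lettered)
        + "\nAnswer with the letters only."
    )
```

Constraining the answer to a fixed set of letters also makes responses easier to score automatically, which may be one reason the multiple-choice format produced more consistent results.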

To probe whether weaker models could catch up, the researchers fine-tuned GPT-3.5-Turbo on 200 high-performing cases. "Fine-tuning turned a 42 percent system into a 65 percent system overnight for tough pulmonary cases," Dr. Luo said. "That's a game-changer for hospitals that rely on cost-effective models."

Beyond raw accuracy, the study evaluated each model's area under the ROC curve (AUC) for every disease. GPT-4 excelled at gallstone and pleural effusion detection, while Qwen-Max showed unusual strength in COPD discrimination. However, no single model dominated every condition, suggesting a tailored, disease-specific deployment strategy.
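A per-disease comparison of this kind can be sketched in a few lines of Python. The code below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, then picks the top-scoring model for each disease. It assumes each model yields a numeric confidence score per report and disease; the article does not say how the study derived its scores, and the function names and data layout here are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def best_model_per_disease(labels_by_disease, scores_by_model):
    """Return {disease: model name with the highest AUC}.

    labels_by_disease: {disease: [0/1 ground-truth labels]}
    scores_by_model:   {model: {disease: [confidence scores, same order as labels]}}
    """
    best = {}
    for disease, labels in labels_by_disease.items():
        aucs = {
            model: roc_auc(labels, scores[disease])
            for model, scores in scores_by_model.items()
        }
        best[disease] = max(aucs, key=aucs.get)
    return best
```

A lookup table like the one `best_model_per_disease` returns is one simple way to express a disease-specific deployment strategy: each condition is routed to whichever model discriminates it best.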

The authors caution that LLM outputs still require expert oversight, especially when a model expresses high confidence in borderline cases. Future work will integrate explainable-AI tools to reveal how models weigh radiologic clues and to set dynamic confidence thresholds.
