Lung cancer remains one of the leading causes of cancer-related deaths worldwide, making early and accurate diagnosis essential for improving patient outcomes. Chest computed tomography (CT) scans are widely used to detect pulmonary nodules, but interpreting these images requires careful assessment of multiple features, including shape, margins, and internal structure. This process is often complex and highly dependent on a radiologist's expertise, leading to variability in diagnosis. Although artificial intelligence (AI) has improved classification accuracy in recent years, most systems still provide only binary outputs, offering limited insight into how conclusions are reached.
To address this challenge, a research team led by Ms. Maiko Nagao, a graduate student at the Graduate School of Science and Technology, Meijo University, Japan, together with Professor Atsushi Teramoto from the Faculty of Information Engineering, Graduate School of Science and Technology, Meijo University, Japan, Professor Hiroshi Fujita from the Faculty of Engineering, Gifu University, Japan, and Professor Kazuyoshi Imaizumi from the School of Medicine, Fujita Health University, Japan, developed a novel diagnostic support framework based on visual question answering (VQA) that enables interactive generation of medical findings from chest CT images. Their findings were published in the International Journal of Computer Assisted Radiology and Surgery on March 27, 2026.
"Conventional AI diagnostic support methods lacked explainability because they mainly focused on classifying lesions as benign or malignant from medical images. This made it difficult for clinicians to interpret and utilize the results. Thus, our goal was to generate findings similar to those written by physicians to improve the usability and acceptance of AI outputs," says Ms. Nagao.
The team constructed a dedicated dataset by leveraging structured annotations from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI). These annotations describe key morphological characteristics of pulmonary nodules, such as sphericity, margin, texture, lobulation, spiculation, and calcification. By converting this structured data into natural language descriptions and pairing them with relevant clinical questions, the researchers created a training framework that links images, questions, and findings. A vision-language model was then fine-tuned on these image-question-finding triplets to generate responses based on both visual input and physician-driven queries, as sketched below.
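As an illustration of this pairing step, the sketch below converts LIDC-IDRI-style ordinal ratings into a short natural-language finding and a set of question-answer pairs. The phrase tables, question wording, and helper names (NoduleAnnotation, to_finding, to_qa_pairs) are hypothetical and are not taken from the paper; only the attribute names and their 1-5 ordinal scales follow the public LIDC-IDRI annotation scheme.

```python
# Illustrative sketch: turning LIDC-IDRI-style structured nodule ratings into
# natural-language findings paired with clinical questions. The wording
# templates and question set are hypothetical, not the authors' actual rules.

from dataclasses import dataclass

# Hypothetical phrase tables mapping ordinal ratings (1-5) to descriptive terms.
SPICULATION = {1: "no spiculation", 3: "moderate spiculation", 5: "marked spiculation"}
MARGIN = {1: "poorly defined margins", 3: "partially defined margins", 5: "sharply defined margins"}
TEXTURE = {1: "ground-glass texture", 3: "part-solid texture", 5: "solid texture"}

@dataclass
class NoduleAnnotation:
    sphericity: int   # 1 (linear) .. 5 (round)
    margin: int       # 1 (poorly defined) .. 5 (sharp)
    texture: int      # 1 (ground-glass) .. 5 (solid)
    spiculation: int  # 1 (none) .. 5 (marked)

def nearest(table: dict, score: int) -> str:
    """Pick the phrase whose key is closest to the given ordinal score."""
    return table[min(table, key=lambda k: abs(k - score))]

def to_finding(ann: NoduleAnnotation) -> str:
    """Render a structured annotation as a short, physician-style finding."""
    shape = "round" if ann.sphericity >= 4 else "irregular"
    return (f"A {shape} nodule with {nearest(MARGIN, ann.margin)}, "
            f"{nearest(TEXTURE, ann.texture)}, and {nearest(SPICULATION, ann.spiculation)}.")

def to_qa_pairs(ann: NoduleAnnotation) -> list[tuple[str, str]]:
    """Pair clinical questions with answers derived from the same annotation."""
    return [
        ("What is the margin of the nodule?", nearest(MARGIN, ann.margin)),
        ("Is spiculation present?", nearest(SPICULATION, ann.spiculation)),
        ("Describe the internal texture.", nearest(TEXTURE, ann.texture)),
    ]

if __name__ == "__main__":
    ann = NoduleAnnotation(sphericity=4, margin=2, texture=5, spiculation=4)
    print(to_finding(ann))
    for question, answer in to_qa_pairs(ann):
        print(question, "->", answer)
```

Each image can then be stored alongside its generated finding and question-answer pairs, giving the vision-language model supervised examples of both free-text findings and targeted answers.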
The results demonstrated that the system could produce clinically meaningful and linguistically natural image findings. Quantitative evaluations showed strong agreement with reference descriptions, including a high CIDEr score of 3.896, indicating accurate and contextually relevant outputs. Importantly, the model maintained consistency across key morphological features, supporting reliable interpretation of pulmonary nodules. Unlike conventional AI systems, this approach allows physicians to ask targeted questions—such as those related to shape or internal structure—and receive detailed, explainable answers.
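To make the interaction pattern concrete, the minimal sketch below issues physician-style questions against a CT slice using a generic, publicly available visual question answering pipeline from Hugging Face transformers. It is not the authors' fine-tuned model; the checkpoint name, image path, and questions are placeholders chosen only to show how a targeted query is posed and answered.

```python
# Minimal sketch of the interactive querying pattern described above, using an
# off-the-shelf VQA model. This does NOT reproduce the study's fine-tuned
# system; it only illustrates the question-in, answer-out workflow.

from transformers import pipeline
from PIL import Image

# Placeholder checkpoint and image path; any public VQA model works for the demo.
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
image = Image.open("nodule_ct_slice.png")  # hypothetical CT slice exported as an image

for question in [
    "What is the shape of the nodule?",
    "Are the margins spiculated?",
    "Is calcification present?",
]:
    result = vqa(image=image, question=question, top_k=1)[0]
    print(f"Q: {question}\nA: {result['answer']} (score {result['score']:.2f})\n")
```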
"Our study demonstrates a technology that enables interactive exploration of lesion characteristics and generation of findings from CT images and can serve as a learning tool for physicians, thereby reducing variability in diagnoses," says Prof. Teramoto.
The study also highlights the importance of explainability in medical AI. By presenting findings in a question-and-answer format, the system provides transparency into the diagnostic reasoning process, making it easier for clinicians to trust and utilize AI-generated results. This interactive capability can support report writing, enhance training for medical professionals, and reduce inconsistencies across diagnoses.
Motivated by a personal experience, Ms. Nagao emphasized the real-world impact of the work: "Some patients cannot undergo invasive tests, such as biopsies, because of other medical conditions. This inspired me to develop a method that supports diagnosis without adding a burden to patients. By making AI outputs more understandable and interactive, we hope to improve both clinical decision-making and patient care."
Overall, this research introduces a new paradigm for AI-assisted diagnosis by combining structured medical knowledge with interactive language-based reasoning. In the short term, the system may help standardize diagnostic practices and assist clinicians in interpreting complex CT images. Over the longer term, it could enable a more transparent, collaborative, and data-driven healthcare system, paving the way for advanced diagnostic tools that integrate seamlessly into clinical workflows.
About Meijo University
Meijo University traces its origin back to the establishment of the Nagoya Science and Technology Course in 1926, giving it a proud history of more than 90 years. As one of the largest universities in the Chubu region, Meijo University is a comprehensive learning institution that supports a wide range of academic fields from the humanities to physical sciences. With a network of more than 200,000 graduates and alumni, it strives to contribute not only to local industries but also to international communities in various fields. Meijo University is also known as the birthplace of the carbon nanotube. To foster the human resources of the next generation, the university continues to tackle ongoing challenges by further enhancing its campus and creating new faculties.
Website: https://www.meijo-u.ac.jp/english/
About Ms. Maiko Nagao from Meijo University
Ms. Maiko Nagao is a graduate student from the Graduate School of Science and Technology, Meijo University, Japan. Her work primarily focuses on using deep learning and vision-language models to improve the diagnosis of lung cancer and other medical conditions.
Funding information
This study was supported in part by a Grant-in-Aid for Scientific Research (No. 23K07117).