Clinical interviewing is one of the most important skills physicians develop during their training. It forms the foundation for accurate diagnosis and effective patient care. However, evaluating these skills is often time-intensive, requiring repeated observations and detailed feedback from experienced clinicians. As medical education continues to expand, this growing assessment burden has become a significant challenge. Generative artificial intelligence (AI) has the potential to streamline the assessment of interviewing skills; however, how its performance compares with standard evaluation systems is not well understood.
To fill this gap, researchers from Japan explored whether AI could evaluate medical interview transcripts as reliably as human assessors. Their findings were published on February 17, 2026, in Volume 12 of the journal JMIR Medical Education. The research team, led by Dr. Hiromizu Takahashi (corresponding author) and Professor Toshio Naito, both from the Department of General Medicine, Juntendo University Faculty of Medicine, Japan, examined whether AI-based assessment (ABA) could match traditional human-based assessment (HBA).
"Our central message is that AI may help make medical training fairer, faster, and more scalable," explains Prof. Naito.
To compare the ABA and HBA systems, the researchers designed a cross-sectional validation study using a virtual patient system. Seven participants, including medical students, resident physicians, and attending physicians, conducted clinical interviews with an AI-simulated patient presenting with bilateral leg weakness. These conversations were automatically recorded and converted into transcripts. The transcripts were then evaluated using the Master Interview Rating Scale, a standardized tool that assesses various aspects of clinical communication, such as information gathering, organization, and empathy. For the ABA system, AI models, specifically GPT-o1 Pro and GPT-5 Pro, were used to assess the transcripts. For the HBA approach, five experienced clinical instructors independently evaluated the same transcripts.
According to the researchers, ABA showed strong agreement with clinician evaluations, with only minimal differences in scores. Moreover, AI demonstrated greater consistency across repeated evaluations. Importantly, the use of AI also reduced the time required to assess each transcript by more than half, highlighting its potential to ease the workload of educators. "Rather than replacing teachers, this research suggests a practical 'AI-first, faculty-verified' model in which AI handles the first pass and educators focus their time on coaching, judgment, and high-stakes decisions," says Dr. Takahashi.
These results have important implications for medical education. In many training programs, delays in feedback can limit opportunities for students to improve their communication skills. By providing rapid and consistent evaluations, AI could make repeated practice more accessible, particularly in settings with limited faculty resources. "Students could interview an AI-simulated patient and receive feedback almost immediately instead of waiting days or weeks," Prof. Naito adds, highlighting the potential for more timely learning experiences.
At the same time, the researchers emphasize that AI should be used with care. While AI performed well in this study, the findings were based on a small number of participants and a single clinical scenario. In addition, transcript-based evaluation cannot capture nonverbal cues, tone, or cultural nuances that are often important in real-world patient interactions. Prof. Naito and Dr. Takahashi note with caution, "AI should be used with human oversight, because text-only scoring can miss nuances such as tone, nonverbal communication, and cultural context."
Overall, this study highlights the growing role of AI in medical education. By combining the speed and consistency of AI with the expertise and judgment of clinicians, it may be possible to create more efficient and scalable training systems. As the demand for high-quality medical education continues to rise, such approaches could help ensure that future clinicians receive the best training while reducing the burden on educators.