Lookup Step Boosts AI in Medical Diagnosis Coding

The Mount Sinai Hospital / Mount Sinai School of Medicine

New York, NY [September 25, 2025]— A new study from researchers at the Mount Sinai Health System suggests that a simple tweak to how artificial intelligence (AI) assigns diagnostic codes could significantly improve accuracy, even outperforming physicians. The findings, reported in the September 25 online issue of NEJM AI [DOI: 10.1056/AIcs2401161], could help reduce the time doctors spend on paperwork, cut billing errors, and improve the quality of patient records.

"Our previous study showed that even the most advanced AI could produce the wrong codes, sometimes nonsensical ones, when left to guess," says co-corresponding senior author Eyal Klang, MD, Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai. "This time, we gave the model a chance to reflect and review similar past cases. That small change made a big difference."

Doctors in the United States spend hours every week assigning ICD codes, alphanumeric strings used to describe everything from sprained ankles to heart attacks. But large language models, like ChatGPT, often struggle to assign these codes correctly. To address this, the researchers tried a "lookup-before-coding" method that first prompts the AI to describe a diagnosis in plain language and then choose the most fitting code from a list of real-world examples. The approach delivered greater accuracy, fewer mistakes, and performance on par with or better than humans.

The team analyzed 500 Emergency Department patient visits at Mount Sinai Health System hospitals. For each case, they fed the physician's note to nine different AI models, including small open-source systems. First, the models generated an initial ICD diagnostic description. Using a retrieval method, each description was matched to 10 similar ICD descriptions from a database of more than 1 million hospital records, along with how often those diagnoses occurred. In a second step, the model used this retrieved information to select the most accurate ICD description and code.
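The retrieval step described above can be sketched in code. This is a minimal illustration, not the study's implementation: the tiny ICD table, the frequency counts, and the token-overlap similarity are hypothetical stand-ins for the researchers' database of more than 1 million hospital records and their LLM-based pipeline.

```python
import re

# Hypothetical sketch of the "lookup-before-coding" retrieval step: a
# plain-language diagnosis is matched against known ICD descriptions, and
# the top candidates (with frequencies) are handed back for final selection.

ICD_EXAMPLES = [
    # (code, description, illustrative frequency in the reference database)
    ("S93.401A", "Sprain of unspecified ligament of right ankle, initial encounter", 5210),
    ("I21.9", "Acute myocardial infarction, unspecified", 3804),
    ("J06.9", "Acute upper respiratory infection, unspecified", 9125),
    ("R51.9", "Headache, unspecified", 7540),
]

def tokens(text):
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a, b):
    """Jaccard overlap between token sets (a stand-in for semantic retrieval)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def retrieve_candidates(description, k=10):
    """Match a model-generated plain-language diagnosis against known ICD
    descriptions and return the k most similar, with their frequencies."""
    ranked = sorted(ICD_EXAMPLES,
                    key=lambda entry: similarity(description, entry[1]),
                    reverse=True)
    return ranked[:k]

# In the study's second step, the LLM reviews these candidates and picks the
# final code; here we simply take the top-ranked match as a placeholder.
candidates = retrieve_candidates("sprain of the right ankle ligament")
best_code, best_description, frequency = candidates[0]
print(best_code)  # S93.401A
```

Presenting the model with a short ranked list of real codes, rather than asking it to recall a code from memory, is the essence of the approach: the generation step stays in natural language, where the model is strong, while the final choice is constrained to codes that actually exist.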

Emergency physicians and two independent AI systems evaluated the coding results independently, without information about whether the codes were generated by AI or clinicians.

Across the board, models that used the retrieval step outperformed those that didn't, and even did better than physician-assigned codes in many cases. Surprisingly, even small open-source models performed well when allowed to "look up" examples.

"This is about smarter support, not automation for automation's sake," says co-corresponding senior author Girish N. Nadkarni, MD, MPH, Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, and Irene and Dr. Arthur M. Fishberg Professor of Medicine at the Icahn School of Medicine at Mount Sinai, and Chief AI Officer for the Mount Sinai Health System. "If we can cut the time our physicians spend on coding, reduce billing errors, and improve the quality of our data, all with an affordable and transparent system, that's a big win for patients and providers alike."

The authors emphasize that this retrieval-enhanced method is designed to support, not replace, human oversight. While it's not yet approved for billing and was tested specifically on primary diagnosis codes from emergency visits discharged home, it shows encouraging potential for clinical use. The researchers see immediate uses, such as suggesting codes in electronic records or flagging errors before billing.

The investigators are now integrating the method into Mount Sinai's electronic health records system for pilot testing. They hope to expand it to other clinical settings and to include secondary and procedural codes in future versions.

"The big picture here is AI's potential to transform how we care for patients. When technology relieves the administrative burden of our physicians and other providers, they have more time for direct patient care. That's good for clinicians, that's good for patients, and it's good for health systems of every size," says David L. Reich, MD, Chief Clinical Officer of the Mount Sinai Health System and President of The Mount Sinai Hospital. "Using AI in this way improves our ability to provide attentive and compassionate care by spending more time with patients. This strengthens the foundation of hospitals and health systems everywhere."

The paper is titled "Assessing Retrieval-Augmented Large Language Models for Medical Coding."

The study's authors, as listed in the journal, are Eyal Klang, Idit Tessler, Donald U. Apakama, Ethan Abbott, Benjamin S Glicksberg, Arnold Monique, Akini Moses, Ankit Sakhuja, Ali Soroush, Alexander W. Charney, David L. Reich, Jolion McGreevy, Nicholas Gavin, Brendan Carr, Robert Freeman, and Girish N Nadkarni.

This work was supported by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences. Research reported in this publication was also supported by the Office of Research Infrastructure of the National Institutes of Health under grant award numbers S10OD026880 and S10OD030463.

For more Mount Sinai artificial intelligence news, visit: https://icahn.mssm.edu/about/artificial-intelligence.
