Global Study Uncovers Bias in AI Skin Disease Diagnosis

Health Data Science

An international research team led by Assistant Professor Zhiyu Wan from ShanghaiTech University has recently published groundbreaking findings in the journal Health Data Science, highlighting biases in multimodal large language models (LLMs) such as ChatGPT-4 and LLaVA in diagnosing skin diseases from medical images. The study systematically evaluated these AI models across different sex and age groups.

Utilizing approximately 10,000 dermatoscopic images, the study focused on three common skin diseases: melanoma, melanocytic nevi, and benign keratosis-like lesions. Results revealed that while ChatGPT-4 and LLaVA outperformed most traditional deep learning models overall, ChatGPT-4 showed greater fairness across demographic groups, whereas LLaVA exhibited significant sex-related biases.

Dr. Wan emphasized, "While large language models like ChatGPT-4 and LLaVA demonstrate clear potential in dermatology, we must address the observed biases, particularly across sex and age groups, to ensure these technologies are safe and effective for all patients."

The team plans further research incorporating additional demographic variables like skin tone to comprehensively evaluate the fairness and reliability of AI models in clinical scenarios. This research provides critical guidance for developing more equitable and trustworthy medical AI systems.

/Public Release. This material from the originating organization/author(s) might be of the point-in-time nature, and edited for clarity, style and length. Mirage.News does not take institutional positions or sides, and all views, positions, and conclusions expressed herein are solely those of the author(s).View in full here.