OAK BROOK, Ill. – Algorithms submitted for an AI Challenge hosted by the Radiological Society of North America ( RSNA ) have shown excellent performance for detecting breast cancers on mammography images, increasing screening sensitivity while maintaining low recall rates, according to a study published today in Radiology , the premier journal of the RSNA.
The RSNA Screening Mammography Breast Cancer Detection AI Challenge was a crowdsourced competition that took place in 2023, with more than 1,500 teams participating. The Radiology article details an analysis of the algorithms' performance, led by Yan Chen, Ph.D., a professor in cancer screening at the University of Nottingham in the United Kingdom.
"We were overwhelmed by the volume of contestants and the number of AI algorithms that were submitted as part of the Challenge," Prof. Chen said. "It's one of the most participated-in RSNA AI Challenges. We were also impressed by the performance of the algorithms given the relatively short window allowed for algorithm development and the requirement to source training data from open-sourced locations."
The goal of the Challenge was to source AI models that improve the automation of cancer detection in screening mammograms, helping radiologists work more efficiently, improving the quality and safety of patient care, and potentially reducing costs and unnecessary medical procedures.
RSNA invited participation from teams across the globe. Emory University in Atlanta, Georgia, and BreastScreen Victoria in Australia provided a training dataset of around 11,000 breast screening images, and Challenge participants could also source publicly available training data for their algorithms.
Prof. Chen's research team evaluated 1,537 working algorithms submitted to the Challenge, testing them on a set of 10,830 single-breast exams—completely separate from the training dataset—that were confirmed by pathology results as positive or negative for cancer.
Altogether, the algorithms yielded median rates of 98.7% specificity for confirming no cancer was present on mammography images, 27.6% sensitivity for positively identifying cancer, and a recall rate—the percentage of the cases that AI judged positive—of 1.7%. When the researchers combined the top 3 and top 10 performing algorithms, it boosted sensitivity to 60.7% and 67.8%, respectively.
"When ensembling the top performing entries, we were surprised that different AI algorithms were so complementary, identifying different cancers," Prof. Chen said. "The algorithms had thresholds that were optimized for positive predictive value and high specificity, so different cancer features on different images were triggering high scores differently for different algorithms."
According to the researchers, creating an ensemble of the 10 best-performing algorithms produced performance that is close to that of an average screening radiologist in Europe or Australia.
Individual algorithms showed significant differences in performance depending on factors such as the type of cancer, the manufacturer of the imaging equipment and the clinical site where the images were acquired. Overall, the algorithms had greater sensitivity for detecting invasive cancers than for noninvasive cancers.
Since many of the participants' AI models are open source, the results of the Challenge may contribute to the further improvement of both experimental and commercial AI tools for mammography, with the goal of improving breast cancer outcomes worldwide, Prof. Chen explained.
"By releasing the algorithms and a comprehensive imaging dataset to the public, participants provide valuable resources that can drive further research and enable the benchmarking that is required for the effective and safe integration of AI into clinical practice," she said.
The research team plans to conduct follow-up studies to benchmark the performance of the top Challenge algorithms against commercially available products using a larger and more diverse dataset.
"Additionally, we will investigate the effectiveness of smaller, more challenging test sets with robust human reader benchmarks—such as those developed by the PERFORMS scheme, a UK-based program for assessing and assuring the quality of radiologist performance as an approach for AI evaluation, and compare its utility to that of large-scale datasets," Prof. Chen said.
RSNA hosts an AI Challenge annually, with this year's competition seeking submissions for models that help detect and localize intracranial aneurysms.