Mount Sinai researchers have published one of the first studies using a machine learning technique called “federated learning” to examine electronic health records to better predict how COVID-19 patients will progress. The study was published in the Journal of Medical Internet Research – Medical Informatics on January 27.
The researchers said the emerging technique holds promise to create more robust machine learning models that extend beyond a single health system without compromising patient privacy. These models, in turn, can help triage patients and improve the quality of their care.
Federated learning is a technique that trains an algorithm across multiple devices or servers holding local data samples without aggregating the clinical data in one place, which is undesirable for reasons that include patient privacy. Mount Sinai researchers implemented and assessed federated learning models using data from electronic health records at five separate hospitals within the Health System to predict mortality in COVID-19 patients. They compared the performance of a federated model against models built using data from each hospital separately, referred to as local models. After training their models on a federated network and testing them against the local models at each hospital, the researchers found that the federated models demonstrated enhanced predictive power and outperformed local models at most of the hospitals.
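To make the idea concrete, the following is a minimal sketch of one common federated scheme, federated averaging, and not the study's actual implementation. It assumes five hypothetical sites holding synthetic data, a simple logistic regression model, and illustrative function names (`local_update`, `federated_average`, `train_federated`, `make_site`) that do not come from the study. Each site trains on its own records and shares only model weights, which a coordinator averages in proportion to site size.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of SGD for logistic regression on one site's data.

    Only the updated weights leave the site; the records themselves do not.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w


def federated_average(site_weights, site_sizes):
    """Average the site models, weighting each by its number of records."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]


def train_federated(sites, dim, rounds=20):
    """Alternate local training and central averaging for a fixed number of rounds."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w


def make_site(n, seed):
    """Synthetic stand-in for one hospital: label is 1 when the feature is positive."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.uniform(-1, 1)
        data.append(([1.0, x1], 1 if x1 > 0 else 0))  # [bias term, feature]
    return data


# Five hypothetical sites of different sizes (assumed values, not study data).
sites = [make_site(n, seed) for seed, n in enumerate([30, 50, 20, 40, 60])]
weights = train_federated(sites, dim=2)
```

A local model in this sketch would be the result of calling `local_update` on a single site's data alone, which is the kind of baseline the federated model is compared against.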
“Machine learning models in health care often require diverse and large-scale data to be robust and translatable outside the patient population they were trained on,” said the study’s corresponding author, Benjamin Glicksberg, PhD, Assistant Professor of Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai, and member of the Hasso Plattner Institute for Digital Health at Mount Sinai and the Mount Sinai Clinical Intelligence Center. “Federated learning is gaining traction within the biomedical space as a way for models to learn from many sources without exposing any sensitive patient data. In our work, we demonstrate that this strategy can be particularly useful in situations like COVID-19.”
Machine learning models built within a single hospital are not always effective for other patient populations, partly because they are trained on data from one group of patients that is not representative of the entire population.
“Machine learning in health care continues to suffer a reproducibility crisis,” said the study’s first author, Akhil Vaid, MD, postdoctoral fellow in the Department of Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai, and member of the Hasso Plattner Institute for Digital Health at Mount Sinai and the Mount Sinai Clinical Intelligence Center. “We hope that this work showcases benefits and limitations of using federated learning with electronic health records for a disease that has a relative dearth of data in an individual hospital. Models built using this federated approach outperform those built separately from limited sample sizes of isolated hospitals. It will be exciting to see the results of larger initiatives of this kind.”