AI Tool Is Trying To Predict Your Risk Of Getting Many Diseases Years In Advance - Here's How It Works

Being able to instantly and accurately predict the trajectory of a person's health in the years to come has long been seen as the pinnacle of medicine. This kind of information would have a profound effect on healthcare systems as a whole - shifting care from treatment to prevention.

Authors

  • Natalia Levina

    Professor, Department of Information Systems Management & Analytics, Warwick Business School, University of Warwick; New York University

  • Hila Lifshitz-Assaf

    Professor of Management, Warwick Business School, University of Warwick

  • João Sedoc

    Assistant Professor of Technology, Operations, and Statistics, New York University

In a recently published paper, researchers are promising just that. Using cutting-edge artificial intelligence (AI) technology, they built Delphi-2M, a tool that seeks to predict a person's next health event and when it is likely to happen over the next 20 years. The model does this for a thousand different diseases, including cancer, diabetes and heart disease.

To develop Delphi-2M, the European research team used data from nearly 403,000 people from the UK Biobank as an input into the AI model.

In the final trained AI model, Delphi-2M predicted the next disease and when it would occur based on a person's sex at birth, their body mass index, whether they smoked or drank alcohol, and their timeline of prior diseases.

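It was able to make these predictions with an AUC (area under the curve) of about 0.7, averaged across disease categories. AUC summarises how well a model separates people who go on to develop a disease from those who do not, across all possible decision thresholds: 0.5 is no better than chance and 1.0 is perfect. An AUC of 0.7 means that, given one person who developed a disease and one who did not, the model would rank the first as higher risk about 70% of the time. This makes it a loose proxy for accuracy in a theoretical setting rather than a literal 70% hit rate - and the accuracy of these predictions has not yet been tested in terms of real-world outcomes.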
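To make the AUC figure concrete, here is a minimal sketch of how the number can be computed by comparing every pair of one affected and one unaffected person. The risk scores and outcomes below are invented for illustration and have nothing to do with the Delphi-2M data; the function name `pairwise_auc` is also our own.

```python
def pairwise_auc(scores, labels):
    """Share of (affected, unaffected) pairs in which the affected
    person received the higher risk score. Ties count as half a win."""
    affected = [s for s, y in zip(scores, labels) if y == 1]
    unaffected = [s for s, y in zip(scores, labels) if y == 0]
    if not affected or not unaffected:
        raise ValueError("need at least one affected and one unaffected person")

    wins = 0.0
    for a in affected:
        for u in unaffected:
            if a > u:
                wins += 1.0
            elif a == u:
                wins += 0.5
    return wins / (len(affected) * len(unaffected))


# Hypothetical example: seven people, two of whom developed the disease (label 1).
example_scores = [0.7, 0.4, 0.8, 0.5, 0.3, 0.2, 0.1]
example_labels = [1, 1, 0, 0, 0, 0, 0]
example_auc = pairwise_auc(example_scores, example_labels)
print(example_auc)
```

In this toy example the model ranks the affected person higher in 7 of the 10 possible pairs, giving an AUC of 0.7, even though it gave its single highest score to someone who stayed healthy.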

They then applied the model to Danish Biobank data to see whether it was still effective. It was able to predict health outcomes with similar theoretical accuracy rates.

AI tools

The purpose of the paper wasn't to suggest that Delphi-2M is ready to be used by doctors or in the medical field. Rather, it was to illustrate the power of the team's proposed AI architecture, and the benefit it could have in analysing medical data.

Delphi-2M uses a "transformer network" to make its predictions. This is the same type of architecture that powers ChatGPT. The researchers modified the GPT-2 transformer architecture to use time and disease features, so that it predicts both what will happen and when.
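As a rough illustration of how a model can answer "what" and "when" at once, the sketch below treats each possible next event as having its own yearly rate and lets the events compete: the higher an event's rate, the sooner and the more often it comes first. This is a generic competing-risks sketch under our own assumptions, not the researchers' code; the event names, the rates and the function `sample_next_event` are all hypothetical, and in a real model the rates would come from the transformer's output for a given patient history.

```python
import math
import random


def sample_next_event(log_rates, rng):
    """Draw (event, years_until_event) from competing exponential clocks.

    log_rates maps each candidate event to the log of its yearly rate
    (hypothetical values standing in for a model's output).
    """
    if not log_rates:
        raise ValueError("need at least one candidate event")
    rates = {event: math.exp(lr) for event, lr in log_rates.items()}
    total = sum(rates.values())

    # Waiting time until *any* event: exponential with the summed rate.
    years_until = rng.expovariate(total)

    # Which event it is: chosen in proportion to its own rate.
    threshold = rng.random() * total
    running = 0.0
    for event, rate in rates.items():
        running += rate
        if threshold < running:
            return event, years_until
    return event, years_until  # guard against floating-point rounding


# Invented rates: roughly 5, 2 and 1 expected events per 100 person-years.
toy_log_rates = {
    "type 2 diabetes": math.log(0.05),
    "heart disease": math.log(0.02),
    "cancer": math.log(0.01),
}
rng = random.Random(0)
next_event, years = sample_next_event(toy_log_rates, rng)
print(next_event, round(years, 1))
```

Repeating the draw many times for the same person would give a spread of possible futures rather than a single forecast, which is closer to how such risk estimates should be read.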

Although other health prediction models have used transformer networks in the past, these were only designed to make predictions about a person's risk of developing a single disease. Plus, they were primarily used on smaller-scale hospital record data.

But transformer networks are particularly well-suited for predicting a person's risk of multiple diseases. This is because they can adapt their focus easily and are able to work out complex interactions between many different diseases from multiple distinct data points.

Delphi-2M has also proven to be slightly more accurate than other multi-disease prediction models which use a different architecture.

For example, Milton applied a combination of standard machine learning techniques to the same UK Biobank data. This model showed somewhat lower predictive power for most diseases compared with Delphi-2M - and needed to use more data to do so.

Moreover, non-transformer models are hard for others to improve by adding more data layers. This means these models cannot be as easily adapted and improved upon as transformer models for use in different contexts and studies.

What's special about the Delphi-2M model is that it can be released to the public as an open-source model without compromising patients' privacy. The authors were able to create synthetic data that mimics the UK Biobank data while removing personally identifiable information - all without a significant drop in predictive power. Moreover, Delphi-2M requires fewer computing resources to train than typical AI transformer models.

This will allow other researchers to train the model from scratch and possibly tailor the model and information for their needs. This is important for the advancement of open science and is generally difficult to do in medical settings.

Still too early

Whether or not Delphi-2M becomes the foundation model for AI tools that are designed to predict a patient's future health risks, it demonstrates that models such as this are on the way.

Because of their layered architecture and open-source nature, future models similar to Delphi-2M can continue to evolve by incorporating even richer data - such as electronic health records, medical images, wearable technologies and location data. This would improve their predictive power and accuracy over time.

But while the ability to prevent diseases and provide early diagnosis holds great promise, there are a few key caveats when it comes to this predictive tool.

First, there are numerous data-related concerns associated with such tools. As we have written before, the quality of data and training that an AI tool receives makes or breaks its predictions.

The UK Biobank dataset used to create Delphi-2M didn't have sufficient data on diverse races and ethnic groups to allow for in-depth training and performance analysis.

While some analysis was performed by the Delphi-2M researchers to show that adding ethnicity and race didn't sway the results too much, there was still insufficient data in many categories to even conduct the assessment.

If such tools are ever used in the real world, personal healthcare data will probably be layered on top of foundation models such as Delphi-2M. While the inclusion of this personal data will improve prediction accuracy, it also comes with risks - for example, around personal data security and out-of-context use of the data.

It may also be difficult to scale the model to countries whose healthcare systems differ from those in which the training data was collected. For instance, it may be harder to apply Delphi-2M to the US context, where healthcare data is spread around multiple hospital systems and private clinics.

At present, it's too early for Delphi-2M to be used by patients or doctors. While Delphi-2M provided generalised predictions based on the data that was used to train it, it's too early to use these predictions for personalised health recommendations for an individual patient.

But hopefully, with continued investment into researching and building Delphi-2M-style models, it will someday be possible to input a patient's personal health data into the model and get a personalised prediction.

The authors do not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and have disclosed no relevant affiliations beyond their academic appointment.

/Courtesy of The Conversation. This material from the originating organization/author(s) might be of the point-in-time nature, and edited for clarity, style and length. Mirage.News does not take institutional positions or sides, and all views, positions, and conclusions expressed herein are solely those of the author(s).