AI Use in Healthcare May Ignite Unseen Discrimination

University of Copenhagen

Artificial intelligence can help healthcare systems under pressure allocate limited resources, but it can also lead to more unequal access to care. This is demonstrated by a research collaboration between the University of Copenhagen, Rigshospitalet and DTU that investigated whether AI can spot the risk of depression equally well across different population segments. The research presents methods for screening algorithms for bias prior to their deployment.

Artificial intelligence is making steady headway in the healthcare system. AI has already made MRI scans more effective, and Danish hospitals are now testing AI for rapid emergency-room diagnoses and for better cancer diagnostics and therapies. But this is just the beginning.

On August 14, the Danish Minister of the Interior and Health, Sophie Løhde, stated that she envisions a future in which AI relieves the beleaguered Danish healthcare system.

In hospitals and in psychiatry, one of the tasks AI is well suited to improve is the allocation of limited funds through risk analyses and rankings, which can help ensure, for example, that treatment reaches the patients for whom it will be most effective.

AI is already being used in other countries to assess who should receive treatment for depression, a development that may be on its way to Denmark's hard-pressed mental health system.

Now, however, University of Copenhagen researchers are calling on politicians to reflect, so that AI does not lead to more inequality or become an instrument for cold economic calculations. They point out that carelessness could turn help into a disservice.

"Artificial intelligence has great potential, but we need to be careful because blindly implementing it can distort the healthcare system in new ways that are difficult to see, as the results can appear to be correct at first glance," says Melanie Ganz from the University of Copenhagen's Department of Computer Science and Rigshospitalet.

Invisible discrimination

In a new research article, she, along with her co-authors, documents how hidden biases sneak into an algorithm designed to calculate the risk of depression.

Facts: Depression in Denmark

Depression is a common and debilitating disorder. The Danish Health Authority estimates that roughly 500,000 Danes will suffer from severe depression during their lifetimes. At the same time, there is broad agreement that Danish psychiatric services lack the resources to meet the need.

Together with colleagues from the Technical University of Denmark (DTU), the researchers developed an algorithm modeled on designs already in use in healthcare systems. Based on actual depression diagnoses, the algorithm predicts a person's risk of developing depression.

"In other countries, it is becoming more and more common to look at how to detect and prevent depression at an early stage. In the US, for example, AI is increasingly being used by private insurers to prioritise resources, a development that will likely come to Denmark in the near future. The question is, how fair will the foundation for such a prioritisation actually be," says co-author Sune Holm from the Department of Food and Resource Economics.

The researchers used depression as a case study to investigate how to evaluate the algorithms used in the healthcare system and elsewhere in society, so that problems can be identified and corrected in time and algorithms made fairer before they are deployed.

Extra Info: A startup as a scenario

In a hypothetical scenario, the researchers took on the role of a start-up company that approaches Danish municipalities and other authorities with AI solutions to help them prioritize limited funds, e.g., in the healthcare sector.

In Denmark, AI is not yet used as a diagnostic aid for depression, but such tools exist internationally, and Denmark has a tradition of developing decision-support tools for diagnostics.

"There are already startups in the US that offer AI solutions to analyze and rank depression risk. With our large joint healthcare system, a realistic scenario in Denmark is that periods of staff shortages will call for AI solutions that are able to best prioritize resources, e.g., in psychiatry," says Melanie Ganz.


The researchers' own algorithm was trained on the historical health data of six million Danes, of whom approximately 200,000 had a depression diagnosis.

Facts: How the algorithm works

Historical data on six million Danes from Statistics Denmark and medical registers were used. Of these, approximately 200,000 had a depression diagnosis.

The data contained a number of variables that statistically influence the risk of depression. These included demographic variables such as age, gender, income, whether the person lives alone, whether they are Danish-born or an immigrant, place of residence, education, civil status and several other factors.

Based on the statistical data and real depression diagnoses made by healthcare professionals, the algorithm then tries to predict whether people are at risk of depression.

The researchers hid the diagnoses from the algorithm's inputs and used data from half of the Danes to train it to spot patterns and markers it could use for its predictions. The other half was used to test whether the algorithm's predictions were on target.
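To make this setup concrete, here is a minimal sketch of such a split-and-train protocol using scikit-learn. All data below is synthetic, and the feature names, prevalence and model choice are illustrative assumptions; the study itself used registry data on six million Danes.

```python
# Minimal sketch of the split-and-train protocol described above.
# All data is synthetic; the study used Danish registry data instead.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 100_000  # toy population, not the study's six million

# Illustrative demographic features (stand-ins for registry variables)
X = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "income": rng.lognormal(mean=12.5, sigma=0.5, size=n),
    "lives_alone": rng.integers(0, 2, n),
    "years_of_education": rng.integers(7, 21, n),
})

# Synthetic labels standing in for real diagnoses made by clinicians
logit = -3.5 + 0.6 * X["lives_alone"] - 0.05 * (X["years_of_education"] - 13)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Half of the population trains the model; the other half tests whether
# its risk predictions are on target. The diagnosis is only a training
# label, never an input feature.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
risk = model.predict_proba(X_test)[:, 1]
print(f"Held-out AUC: {roc_auc_score(y_test, risk):.3f}")
```

Stratifying the split keeps the relatively rare diagnoses evenly distributed between the training and test halves, so the test half gives a fair picture of how on-target the predictions are.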

"The right algorithms, if properly trained, can become tremendous assets for any municipality with limited resources. But our research shows that if machine learning isn't well managed, it can skew access to treatment, so that some groups are overlooked or even left out," says Melanie Ganz.

The study shows that the algorithm finds it easier to spot the risk of depression in some groups of citizens than in others, depending on the variables it is trained on, e.g., education, gender, ethnicity and a number of others. Indeed, the algorithm's ability to identify the risk of depression varied by up to 15% between different groups.

"This means that even a region or municipality, which in good faith introduces an algorithm to help allocate treatment options, can distort any such healthcare effort," says Melanie Ganz.

The algorithm can register as a measurable success because it allocates resources to those in actual need. At the same time, it can harbor hidden biases that exclude or deprioritize certain groups, without this being visible to those who manage it.

At worst, AI systems can become instruments of cold calculation. The choice of a particular algorithm could be used to conceal the prioritization of resources for certain societal groups over others.

Tools to ensure fair algorithms

Sune Holm points out that AI also presents some fundamental ethical dilemmas.

Facts: Examining fairness

To analyze fairness, the researchers examined the quality of the algorithm across various segments of the population by calculating the calibration and the discriminative ability of each subgroup separately.

  • Calibrating a machine learning algorithm is similar to adjusting a thermometer to ensure that it provides accurate measurements. It's about making the algorithm's predictions more reliable by adapting them better to reality.
  • At the same time, an algorithm, such as one that determines whether an email is spam or not, must be good at distinguishing between different categories. Its ability to discriminate is its talent for recognizing the subtle patterns or characteristics that distinguish one category from another.

The results showed a difference in the quality of the algorithm's predictions of up to 15% across subgroups.
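The two quality measures described in the box translate directly into a short audit. Below is a minimal sketch, in Python, of how such a per-subgroup check could be computed: a simple binned calibration error alongside the AUC as a measure of discriminative ability. The function names, the binning scheme and the example grouping are illustrative assumptions, not the study's published method.

```python
# Sketch of a per-subgroup fairness audit: for each group, report how
# well-calibrated the predicted risks are and how well they discriminate.
import numpy as np
from sklearn.metrics import roc_auc_score

def expected_calibration_error(y_true, risk, n_bins=10):
    """Average gap between predicted risk and observed rate, over risk bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.digitize(risk, edges[1:-1])  # bin index 0..n_bins-1
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            ece += mask.mean() * abs(risk[mask].mean() - y_true[mask].mean())
    return ece

def audit_by_group(y_true, risk, group):
    """Print calibration and discrimination per subgroup to expose gaps."""
    for g in np.unique(group):
        m = group == g
        # AUC needs both diagnosed and undiagnosed people in each group
        auc = roc_auc_score(y_true[m], risk[m])
        ece = expected_calibration_error(y_true[m], risk[m])
        print(f"group={g}: calibration error={ece:.3f}, AUC={auc:.3f}")

# Usage with the toy model above (hypothetical education-based grouping):
# audit_by_group(y_test.to_numpy(), risk, X_test["years_of_education"] > 12)
```

A gap in either number between groups, like the up-to-15% difference reported above, is exactly the kind of hidden skew that a single aggregate score would not reveal.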

"If we begin using these systems, it will be important to clarify who is responsible for prioritizing resources and individual therapeutic regimens should they be the result of algorithms. Furthermore, it could be difficult for a doctor to explain to a patient why a decision was made if the algorithm itself is incomprehensible," says Sune Holm.

While the research contributes theoretically to the area of machine learning that deals with algorithmic discrimination across groups, the methods also serve as a practical tool for auditing the fairness of algorithms.

"The methods we've developed can be used as a concrete prescription to evaluate the fairness of algorithms before they are used in, for example, municipalities and regions. In this way, we hope that the research can contribute to having the right tools in place for when the algorithms really make their entry into this area," says Melanie Ganz.

"Both politicians and citizens must be aware not only of the benefits, but also the pitfalls associated with the use of AI. So, one can be critical instead of just "swallowing the pill" without further ado," says Sune Holm.

He believes there may be a need to ensure that the use of an algorithm has a documented positive effect on patients before investing in implementing it. For example, it should be clear how it can add value to the clinical practice that it is a part of.

Extra Info: Legislation on the way

At the end of the year, EU legislation that sets requirements for algorithms will come into force. The researchers point out that such legislation may put a damper on development. Wherever high-risk AI is involved, as in the health sector, there are documentation and reporting requirements that explain how conclusions are reached and can help keep a human hand in decisions based on algorithms.

However, according to the researchers, there are so many places in the healthcare system that stand to be improved with artificial intelligence - from workflows, diagnoses and therapies to the monitoring of intensive care patients and much more - that the use of AI is surely here to stay.

"Once the legislation comes into force, I think it will slow development somewhat. Then, there will be an adjustment, which will determine a direction. In the slightly longer term, AI will only be used more and more. As such, it is also important that we as researchers help point out any pitfalls so that legislation can take them into account," says Melanie Ganz

About the study

The study is part of and funded under the DTU project Bias and Fairness in Medicine.

The researchers behind the study are:

  • Eike Petersen, Postdoc at DTU Compute
  • Melanie Ganz, Associate Professor at the Department of Computer Science, University of Copenhagen and Senior Researcher at Rigshospitalet's Neurobiological Research Unit
  • Sune Holm, Associate Professor at the Department of Food and Resource Economics, University of Copenhagen.
  Holm recently published a study on the issues discussed here concerning AI as a basis for medical decisions.
  • Aasa Feragen, Professor at DTU Compute.