Hospitals, clinics, universities and other health-focused organizations routinely collect data on everything from spinal scans to sleep study results - but much of that valuable intelligence stays tucked away in-house.
It's a missed opportunity for researchers employing artificial intelligence and other data analysis tools to improve health outcomes for patients.

"Many organizations collect data," says David Rotenberg, chief analytics officer at the Centre for Addiction and Mental Health (CAMH). "But even when it's high quality, it often remains locked away and can be difficult to share. That limits what we can learn from it."
Enter the Health Data Nexus (HDN), a cornerstone offering of the University of Toronto's Temerty Centre for AI Research and Education in Medicine (T-CAIREM), part of the Temerty Faculty of Medicine. The health database repository offers a safe, secure way to share data that's been stripped of personal patient information. It's also straightforward to access - for those with academic or research credentials - and is organized to be read easily by AI algorithms.
In short, the HDN is a silo-busting, open-source home for health data that's poised to help solve AI's old "garbage in, garbage out" problem.
"When we connect data across institutions, we can discover insights no single team could find alone," says Rotenberg, who is also infrastructure co-lead at T-CAIREM. "We are working on an open science basis to advance medicine and advance how AI can be applied in medicine."
T-CAIREM launched as a research centre in December 2020, focusing on the three pillars of research, education and data infrastructure, with a data platform proposed to fulfill the latter pillar. Six months later, HDN launched with three datasets.
"The first year-and-a-half was laying the groundwork, with privacy impact assessments, threat risk assessments, getting the initial governance and documentation settled," says January Adams, who runs the HDN as data governance and quality analyst for T-CAIREM.
Indeed, the repository has extensive data governance policies around information, ethics, consent and sharing.
Adams says HDN got its first big test in 2023 with a two-day datathon that saw about 40 researchers and students ask questions of the nexus's flagship dataset, which is from the general internal medicine ward at St. Michael's Hospital, Unity Health Toronto. The set includes 22,000 encounters for 14,000 unique patients over eight years, tracking transfers, deaths, discharges and other outcomes.
The HDN has since grown to 10 datasets - and Rotenberg says the team hopes to add five more this year.
With the recent publication of a journal article and a growing calendar of events, the team hopes to build awareness of the HDN while continuing to expand its scope.
"We're moving quickly to grow the Nexus, but awareness is key. We want researchers to know: this is your go-to place for AI-ready health data," Rotenberg says.
HDN is not the only health data repository available to researchers. PhysioNet, set up by the National Institutes of Health in 1999, is run out of the Massachusetts Institute of Technology (MIT). (Adams says she has regular meetings with the team behind PhysioNet to share ideas about infrastructure and regulations.) Nightingale Open Science, run by the University of Chicago's business school, houses medical imaging.
But Rotenberg says HDN is unique in its scope. "Our datasets span the full spectrum of medicine - wearables, ultrasound, voice, text, imaging - bringing together diverse health information in one place. That diversity is what allows AI to uncover patterns across disciplines, leading to breakthroughs that wouldn't be possible within a single specialty."
Credentialed researchers can sign up to access the databases on the HDN after completing an online training course on research ethics. They can then mine HDN data on its own, use it to enrich their own datasets, and even work with remote partners. "You can cross-reference datasets, compare results, and collaborate more easily, without your partners having to navigate endless barriers to access," says Rotenberg.
The T-CAIREM team plans to continue improving the repository and is working to support institutions in adding their own datasets. It offers $50,000 grants to help researchers get their data ready.
"It's a matter of getting it into a format that is usable and valuable, that is machine readable so these models can interface with it well," says Adams.
Along with offering material for health studies, the repository is showing promise as a teaching tool; it's being used in a U of T graduate data science course by Azadeh Kushki, a senior scientist at Holland Bloorview and an associate professor at the Institute of Biomedical Engineering.
With governments south of the border limiting data collection and access, and AI algorithms showing growing promise for understanding human health, Rotenberg says the need for better data solutions has never been greater - and the HDN can help. "It's a uniquely Canadian model - secure, collaborative, and built on trust - that's changing how we interact with data and accelerating discoveries that benefit people everywhere."