Models Learn to Manage Messy Medical Data

Singapore University of Technology and Design

Hospitals do not always have the opportunity to collect data in tidy, uniform batches. A clinic may have a handful of carefully labelled images from one scanner while holding thousands of unlabelled scans from other centres, each with different settings, patient mixes and imaging artefacts. That jumble makes an already hard task, medical image segmentation, harder still. Models trained under neat assumptions can stumble when deployed elsewhere, particularly on small, faint or low-contrast targets.

Assistant Professor Zhao Na from SUTD and collaborators set out to embrace this messiness rather than disregard it. Instead of the usual setup, where labelled and unlabelled data are assumed to be drawn from similar distributions, they work in a more realistic scenario called cross-domain semi-supervised domain generalisation (CD-SSDG). Here, the few labelled images come from a single domain while the abundant unlabelled pool spans multiple different domains, which is exactly the situation many hospitals face.

Semi-supervised methods typically lean on pseudo-labels. A model trained on the small labelled set guesses labels for the unlabelled images, then learns from those guesses. When the unlabelled images look quite different from the labelled ones, those guesses skew wrong, and the errors compound.
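For readers unfamiliar with the mechanism, the sketch below shows a conventional confidence-thresholded pseudo-labelling step in PyTorch. The function names and the 0.9 threshold are illustrative assumptions, not taken from the paper; the point is simply that under domain shift, even "confident" pixels can be wrong.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def make_pseudo_labels(model, unlabelled_images, threshold=0.9):
    """Standard pseudo-labelling: keep only confident pixel predictions.

    Under domain shift, confidence is poorly calibrated, so many of the
    pixels retained here can still carry wrong labels.
    """
    model.eval()
    logits = model(unlabelled_images)        # (B, C, H, W) segmentation logits
    probs = F.softmax(logits, dim=1)
    confidence, pseudo = probs.max(dim=1)    # per-pixel class guesses
    mask = confidence >= threshold           # supervise confident pixels only
    return pseudo, mask

def pseudo_label_loss(student_logits, pseudo, mask):
    # Cross-entropy on the retained pixels; errors in `pseudo` propagate here.
    loss = F.cross_entropy(student_logits, pseudo, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1)
```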

The researchers' answer is a dual-supervised asymmetric co-training framework, or DAC, where two sub-models learn side by side. They still exchange pseudo-labels, but with a crucial addition: feature-level supervision. Rather than trusting only pixel-wise guesses, each model also nudges the other to align in a richer feature space, encouraging agreement on underlying structure even when style and contrast differ. The sub-models are also given different self-supervised auxiliary tasks—one learns to localise a mixed patch in a CutMix image; the other learns to recognise a patch's rotation. This asymmetry keeps their internal representations diverse, reducing the risk that both models collapse to the same mistakes and sharpening their ability to separate foreground from background.
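In code, one co-training step might look like the following simplified sketch. It assumes each sub-model returns both segmentation logits and a feature map; the actual DAC loss weighting, feature projection and auxiliary-task heads follow the paper rather than this illustration.

```python
import torch.nn.functional as F

def dac_step(model_a, model_b, labelled, labels, unlabelled):
    """One simplified co-training step in the spirit of DAC (illustrative).

    Assumes each sub-model returns (segmentation logits, feature map);
    the real loss weighting and feature projection follow the paper.
    """
    # Supervised loss on the single labelled source domain.
    seg_a, _ = model_a(labelled)
    seg_b, _ = model_b(labelled)
    sup = F.cross_entropy(seg_a, labels) + F.cross_entropy(seg_b, labels)

    # Cross pseudo-label supervision on the multi-domain unlabelled pool:
    # each model learns from the other's hard predictions.
    useg_a, ufeat_a = model_a(unlabelled)
    useg_b, ufeat_b = model_b(unlabelled)
    pseudo_a = useg_a.argmax(dim=1).detach()
    pseudo_b = useg_b.argmax(dim=1).detach()
    cross = F.cross_entropy(useg_a, pseudo_b) + F.cross_entropy(useg_b, pseudo_a)

    # Feature-level supervision: nudge the two feature spaces to agree on
    # structure even when pixel-wise pseudo-labels are unreliable.
    feat = (F.mse_loss(ufeat_a, ufeat_b.detach())
            + F.mse_loss(ufeat_b, ufeat_a.detach()))

    # The asymmetric self-supervised losses (CutMix patch localisation for
    # one model, rotation prediction for the other) are omitted for brevity.
    return sup + cross + feat
```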

"As clinicians and engineers, we rarely get to choose neat datasets," said Asst Prof Zhao. "DAC is our way of adding a safety net. When pseudo-labels are brittle, feature-level guidance still anchors the model to stable, domain-invariant cues. The asymmetric tasks then push the two learners to see the data from different angles."

Tested on three benchmark segmentation settings—retinal fundus (optic disc and cup), colorectal polyp images and spinal cord grey matter MRI—DAC consistently generalised better to unseen domains than strong baselines, including methods purpose-built for domain generalisation. Gains were most striking on small or low-contrast structures such as the optic cup, where the team observed double-digit improvements in Dice score over state-of-the-art approaches at low labelled ratios. Crucially, the auxiliary tasks and feature supervision are used only during training, so DAC's inference cost matches that of conventional models.

"What surprised us was the stability," added Asst Prof Zhao. "Even as we reduced the labelled proportion, down to a tenth in some settings, the curve didn't collapse. That gives confidence to hospitals that can label only a small subset each year yet still want models that travel well."

The team's approach is also pragmatic. Feature-level supervision acts as a soft constraint that does not depend on precise pixel-wise labels, which are notoriously noisy under domain shift. The asymmetric tasks (mixed-patch localisation and random patch rotation prediction) are simple to implement, with one linear head each, and computationally light, yet they diversify the two learners enough to improve pseudo-label quality over time.
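As an illustration of how light these heads can be, the sketch below pairs one linear layer with each auxiliary task. The feature dimension, the box parameterisation for the CutMix patch and the four-way rotation classes are assumptions made for the example, not the paper's exact design.

```python
import torch
import torch.nn as nn

# One linear head per self-supervised task (dimensions are assumptions
# for this sketch, not the paper's exact design).
feat_dim = 256
locate_head = nn.Linear(feat_dim, 4)  # model A: regress (x, y, w, h) of the CutMix patch
rotate_head = nn.Linear(feat_dim, 4)  # model B: classify 0/90/180/270 degree rotation

def rotate_batch(images):
    """Rotate each image in a (B, C, H, W) batch by a random multiple of
    90 degrees and return the rotation class as the prediction target."""
    k = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, int(r), dims=(-2, -1))
                           for img, r in zip(images, k)])
    return rotated, k
```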

The team also mapped out where DAC can be pushed further. Failure cases include fundus images where multiple blood vessels cross the disc, and scenes where the target almost melts into the background. Future work includes vessel-aware augmentation for fundus images and adaptive, multi-view representations that combine multi-scale and frequency-domain cues to sharpen boundaries in low-contrast settings.

"These ingredients are not limited to the three datasets we tested," noted Asst Prof Zhao. "Tumour imaging faces the same twin pressures—expensive annotations and centre-to-centre variation. DAC is immediately applicable there, especially where precise boundaries are clinically important."

While DAC is a training-time recipe rather than a brand-new network, its impact is practical: it makes better use of unlabelled, cross-centre data without assuming the world is independent and identically distributed. The method also plays well with existing backbones (ResNet-DeepLabv3+ in the current study) and standard optimisers, keeping the path to adoption short.

The team's findings are detailed in the paper "Dual-supervised Asymmetric Co-training for Semi-supervised Medical Domain Generalization", published in IEEE Transactions on Multimedia. The researchers report consistent improvements across the Fundus, Polyp and spinal cord grey matter (SCGM) benchmarks, faster training than a leading co-training baseline, and no extra cost at deployment.

"Above all, generalisation is the point," said Asst Prof Zhao. "Hospitals want models that behave when the scanner is different, the patient is different, the lighting is different. By supervising not just the labels we can see, but the features that hold across domains, we move one step closer to that goal."
