Researchers from Australia's national science agency CSIRO, Federation University Australia and RMIT University have developed a method to improve the detection of audio deepfakes.
The new technique, Rehearsal with Auxiliary-Informed Sampling (RAIS), is designed for audio deepfake detection, a growing cybercrime threat used to bypass voice-based biometric authentication systems, impersonate individuals and spread disinformation. RAIS determines whether an audio clip is real or artificially generated (a 'deepfake') and maintains that performance over time as attack types evolve.
Earlier this year in Italy, an AI-cloned voice of the country's Defence Minister was used to request a €1M 'ransom' from prominent business leaders, convincing some to pay. It is just one of many incidents highlighting the need for reliable audio deepfake detectors.
As deepfake audio technology advances rapidly, newer generation techniques often bear little resemblance to older ones.
"We want these detection systems to learn the new deepfakes without having to retrain the model from scratch. If you just fine-tune on the new samples, the model will forget the older deepfakes it knew before," said co-author Dr Kristen Moore from CSIRO's Data61.
"RAIS solves this by automatically selecting and storing a small, but diverse set of past examples, including hidden audio traits that humans may not even notice, to help the AI learn the new deepfake styles without forgetting the old ones," explained Dr Moore.
RAIS uses a smart selection process powered by a network that generates 'auxiliary labels' for each audio sample. These labels help identify a diverse and representative set of audio samples to retain and rehearse. By incorporating extra labels beyond simple 'fake' or 'real' tags, RAIS ensures a richer mix of training data, improving its ability to remember and adapt over time.
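The selection idea described above can be sketched in code. This is a minimal illustration under stated assumptions, not CSIRO's implementation: the function name, the round-robin grouping strategy, and the data layout are all hypothetical, standing in for the paper's learned auxiliary-label network.

```python
import random
from collections import defaultdict

def select_rehearsal_samples(samples, aux_labels, buffer_size, seed=0):
    """Illustrative sketch of auxiliary-informed rehearsal sampling:
    instead of keeping only the most recent examples, group past
    samples by their auxiliary label and fill the memory buffer
    round-robin across groups, so the retained set stays diverse."""
    rng = random.Random(seed)

    # Group samples by auxiliary label (e.g. a hidden audio trait).
    groups = defaultdict(list)
    for sample, label in zip(samples, aux_labels):
        groups[label].append(sample)

    # Shuffle within each group so selection is not order-biased.
    for members in groups.values():
        rng.shuffle(members)

    # Fill the buffer one group at a time until it is full
    # or every group is exhausted.
    buffer = []
    group_lists = list(groups.values())
    i = 0
    while len(buffer) < buffer_size and any(group_lists):
        members = group_lists[i % len(group_lists)]
        if members:
            buffer.append(members.pop())
        i += 1
    return buffer
```

Because each auxiliary-label group contributes in turn, a small buffer still covers many distinct audio characteristics, which is the intuition behind rehearsing a "diverse and representative set" rather than a recency-biased one.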
Outperforming other methods, RAIS achieves the lowest average error rate of 1.95 per cent across a sequence of five experiences. It remains effective with a small memory buffer and is designed to maintain accuracy as attacks become more sophisticated.
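In audio deepfake benchmarks, the reported error rate is commonly the equal error rate (EER), the operating point where the false-acceptance and false-rejection rates meet. Assuming that convention applies here, the following minimal sketch shows how an EER can be computed from raw detector scores (a threshold sweep, not the paper's evaluation code):

```python
def equal_error_rate(scores, labels):
    """Sweep thresholds over detector scores and return the error
    rate at the point where the false-acceptance rate (fakes-as-real
    direction flipped: here, reals flagged fake) and false-rejection
    rate (fakes missed) are closest.
    scores: higher means 'more likely fake'; labels: 1 = fake, 0 = real."""
    n_real = labels.count(0)
    n_fake = labels.count(1)
    best_gap, best_eer = None, None
    for t in sorted(set(scores)):
        # False positives: real clips scored at or above the threshold.
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        # False negatives: fake clips scored below the threshold.
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        far = fp / max(1, n_real)
        frr = fn / max(1, n_fake)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

A perfectly separating detector yields an EER of 0; the 1.95 per cent figure quoted above would mean the two error rates cross at roughly 0.0195.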
The code is available on GitHub.
"Audio deepfakes are evolving rapidly, and traditional detection methods can't keep up," said Falih Gozi Febrinanto, a recent PhD graduate of Federation University Australia.
"RAIS helps the model retain what it has learned and adapt to new attacks. Overall, it reduces the risk of forgetting and enhances its ability to detect deepfakes."
"Our approach not only boosts detection performance, but also makes continual learning practical for real-world applications. By capturing the full diversity of audio signals, RAIS sets a new standard for efficiency and reliability," said Dr Moore.
Read and download the full paper: Rehearsal with Auxiliary-Informed Sampling for Audio Deepfake Detection.