In a collaboration between Tianjin University and the Chinese University of Hong Kong, researchers led by Xiangbin Teng used behavioral and brain-activity measures to explore whether people can distinguish AI-generated speech from human speech. The researchers also assessed whether brief training improves this ability. This work is published in eNeuro.
Thirty participants listened to sentences spoken by humans or by AI-generated voices and judged, before and after a short training session, whether each speaker was human or AI. The researchers found that participants were poor at discriminating between the two types of speakers, and that training helped only minimally. On a neural level, however, training made the brain's responses to human and AI speech more distinct. What might that mean? Says Teng, "The auditory brain system seems to start picking up subtle acoustic differences, even if people can't reliably turn that into a behavioral decision yet. That's encouraging: it suggests training can help, and it's a promising starting point for building better ways to distinguish deepfake speech from real human speech. Humans are still adapting to AI-generated content, so poor performance doesn't mean the signals aren't there; it may mean we're not yet using the right cues."