Physiological-Physical Feature Fusion Detects Voice Spoofing

Higher Education Press

Biometric speech recognition systems are often subject to spoofing attacks, the most common of which are speech synthesis and voice conversion attacks. Such attacks can cause the system to incorrectly accept spoofed speech as genuine, which compromises its security. Researchers have made many efforts to address this problem, but existing voice spoofing detection methods consider only the physical features of speech, resulting in poor detection performance.

To address this problem, a research team led by Junxiao XUE published new research on 15 April 2023 in Frontiers of Computer Science, a journal co-published by Higher Education Press and Springer Nature.

The team proposes a voice spoofing detection method based on physiological-physical feature fusion. The method includes a feature extractor, a densely connected convolutional neural network with squeeze-and-excitation blocks (SE-DenseNet), and a feature fusion strategy. Compared to existing methods, the tandem decision cost function (t-DCF) and equal error rate (EER) scores improved by 5% and 7%, respectively.

Specifically, physiological features in the audio are first extracted by a pre-trained convolutional network. SE-DenseNet is then used to extract the physical features. Such a densely connected model has high parameter efficiency, and the squeeze-and-excitation blocks enhance the efficiency of feature transmission. Finally, the two sets of features are integrated in the classification network for voice spoofing detection.
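As a rough illustration of the two ideas in this pipeline, the sketch below shows a single squeeze-and-excitation step and a simple fusion of two feature vectors using NumPy. It is not the team's implementation: the function names, the tensor shapes, the random weights, and the choice of concatenation as the fusion strategy are all assumptions made for the example, since the article does not specify them.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping real values into the range 0 to 1."""
    return 1.0 / (1.0 + np.exp(-x))


def squeeze_excite(feature_map, w1, w2):
    """Reweight the channels of a (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r), where r is a
    hypothetical reduction ratio chosen for this example.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = feature_map.mean(axis=(1, 2))
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = np.maximum(w1 @ squeezed, 0.0)
    gates = sigmoid(w2 @ hidden)
    # Scale every channel of the input by its gate.
    return feature_map * gates[:, None, None]


def fuse(physiological, physical):
    """Assumed fusion strategy: concatenate the two feature vectors."""
    return np.concatenate([physiological, physical], axis=-1)


rng = np.random.default_rng(0)
channels, reduction = 8, 4

# Stand-in for an intermediate feature map inside the physical-feature branch.
feature_map = rng.standard_normal((channels, 5, 6))
w1 = rng.standard_normal((channels // reduction, channels))
w2 = rng.standard_normal((channels, channels // reduction))

recalibrated = squeeze_excite(feature_map, w1, w2)
physical_vec = recalibrated.mean(axis=(1, 2))

# Stand-in for the output of the pre-trained physiological feature extractor.
physiological_vec = rng.standard_normal(16)

fused = fuse(physiological_vec, physical_vec)
print(recalibrated.shape, fused.shape)
```

In a trained network the two weight matrices would be learned, and the fused vector would be passed to the classification layers rather than printed.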

The team compared the proposed model with some of the best single systems. The experiments show that the proposed model performs better on both EER and t-DCF. To validate the effectiveness of the face features, they also evaluated several baseline models with face features added. The baselines showed different degrees of performance improvement when combined with the face features, indicating that the face features are also useful for the baseline models.
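For readers unfamiliar with the EER metric mentioned above, the following minimal sketch computes it from two lists of detector scores. It assumes that higher scores mean "more likely genuine"; the function name and the sample scores are made up for illustration and are not taken from the study. The t-DCF is not shown because it also depends on the error rates and cost parameters of the speaker verification system being protected.

```python
import numpy as np


def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    Assumes higher scores indicate genuine speech.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))

    # False rejection rate: genuine trials scoring below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scoring at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is the operating point where the two error rates meet.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


# Hypothetical scores: one genuine trial and one spoofed trial are misranked.
eer = equal_error_rate([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.7, 0.2])
print(eer)
```

A lower EER means the detector separates genuine and spoofed speech more cleanly; a detector whose scores are perfectly separated has an EER of zero.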

Future work can attempt to extract more accurate face features and study more effective feature fusion strategies to detect spoofing attacks.
