Peishun Liu
Voice recognition, also known as speaker recognition, is a biometric identity authentication technology. Its theoretical basis is that each voice has unique features that can effectively distinguish one speaker from another. Compared with other biometric technologies, a voice cannot be lost or forgotten, requires nothing to be memorized, is easy to use, and does not involve privacy, so users readily accept it. Voice recognition can be widely used in security verification, control, and other areas, especially for identity recognition in telecommuting scenarios. In this paper, a GAN-based timbre conversion system is studied and implemented, and the forged audio it generates is used to successfully attack speaker recognition systems. First, a GAN is trained on the VCC2016 voice data set to obtain a model that can convert audio files between different timbres. The model extracts the key features of the audio files in the data set, including Linear Prediction Cepstral Coefficients (LPCC) and Mel-Frequency Cepstral Coefficients (MFCC). The characteristic parameters of the different audio features are obtained by means of the short-time Fourier transform; the GAN learns the regularities in these data and can then fit the audio features to generate the required forged audio files. Verification against the main existing speaker recognition algorithms shows that the proposed method can effectively attack them, which demonstrates that current recognition systems have security defects and that their security urgently needs to be improved.
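As an illustration of the short-time Fourier transform step mentioned above, the following is a minimal sketch using only the Python standard library. It is not the feature-extraction code used in the paper: the function names `hann` and `stft_magnitude`, the frame length, the hop size, and the synthetic 1000 Hz test tone are all assumptions chosen for illustration. A real pipeline would typically use an FFT library and would go on to derive MFCC or LPCC features from these spectra.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (periodic form)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum per windowed frame (naive DFT).

    Hypothetical helper for illustration; frame_len and hop are
    assumed values, not parameters taken from the paper.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Synthetic test tone (assumed values): 1000 Hz sampled at 8000 Hz.
sample_rate = 8000
tone_hz = 1000.0
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = stft_magnitude(signal)
# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` describes the frequency content of one short, overlapping slice of the signal; time-frequency representations of this kind are the usual starting point for cepstral features such as MFCC.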