A Study on Speech Disentanglement Framework based on Adversarial Learning for Speaker Recognition
Author | Yoohwan Kwon, Soo-Whan Chung, Hong-Goo Kang |
Publication | The Journal of the Acoustical Society of Korea |
Volume | 39 |
Issue | 5 |
Page | 447-453 |
Month | September |
Year | 2020 |
Link | [Paper] |
ABSTRACT
In this paper, we propose a system that extracts effective speaker representations from a speech signal using a deep learning method. Since a speech signal contains identity-unrelated information such as text content, emotion, and background noise, we train the model so that the extracted features represent only speaker-related information and not speaker-unrelated information. Specifically, we propose an auto-encoder-based disentanglement method that outputs both speaker-related and speaker-unrelated embeddings using effective loss functions. To further improve reconstruction performance in the decoding process, we also introduce a discriminator, as popularly used in the Generative Adversarial Network (GAN) framework. Because a stronger decoder helps preserve speaker information and encourages disentanglement, this leads to improved speaker verification performance. Experimental results demonstrate the effectiveness of the proposed method through an improved Equal Error Rate (EER) on the VoxCeleb1 benchmark dataset.
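To make the disentanglement idea concrete, below is a minimal PyTorch-style sketch of an auto-encoder that splits an input feature into a speaker-related embedding and a speaker-unrelated (residual) embedding, reconstructs the input from both, and scores the reconstruction with a GAN-style discriminator. This is an illustration only, not the paper's actual architecture or loss functions: the module names (`Encoder`, `Decoder`, `Discriminator`), feature/embedding dimensions, layer configurations, and loss choices are all assumptions.

```python
# Illustrative sketch (assumed architecture, not the authors' exact model):
# two encoders produce speaker-related and speaker-unrelated embeddings,
# a decoder reconstructs the input from both, and a discriminator judges
# real vs. reconstructed features in GAN fashion.
import torch
import torch.nn as nn

FEAT_DIM = 80   # e.g., mel-spectrogram bins (assumed)
EMB_DIM = 256   # embedding size (assumed)

class Encoder(nn.Module):
    def __init__(self, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(FEAT_DIM, 512, 5, padding=2), nn.ReLU(),
            nn.Conv1d(512, 512, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # temporal pooling -> utterance-level vector
        )
        self.fc = nn.Linear(512, out_dim)

    def forward(self, x):              # x: (batch, FEAT_DIM, time)
        return self.fc(self.net(x).squeeze(-1))

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(2 * EMB_DIM, 512)
        self.net = nn.Sequential(
            nn.Conv1d(512, 512, 3, padding=1), nn.ReLU(),
            nn.Conv1d(512, FEAT_DIM, 3, padding=1),
        )

    def forward(self, spk, res, time_steps):
        h = self.fc(torch.cat([spk, res], dim=-1))      # fuse the two embeddings
        h = h.unsqueeze(-1).expand(-1, -1, time_steps)  # broadcast over time
        return self.net(h)

class Discriminator(nn.Module):
    """GAN discriminator judging real vs. reconstructed features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(FEAT_DIM, 256, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv1d(256, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x).mean(dim=(1, 2))             # one score per utterance

# Toy forward pass on random features
x = torch.randn(4, FEAT_DIM, 100)
spk_enc, res_enc, dec, disc = Encoder(EMB_DIM), Encoder(EMB_DIM), Decoder(), Discriminator()
spk_emb = spk_enc(x)                                    # speaker-related embedding
res_emb = res_enc(x)                                    # speaker-unrelated embedding
x_hat = dec(spk_emb, res_emb, x.size(-1))               # reconstruction
recon_loss = nn.functional.l1_loss(x_hat, x)
adv_loss = nn.functional.binary_cross_entropy_with_logits(
    disc(x_hat), torch.ones(x.size(0)))                 # generator-side adversarial term
print(recon_loss.item(), adv_loss.item())
```

In a setup like this, the speaker embedding `spk_emb` would be what is fed to the speaker verification back-end, while the reconstruction and adversarial terms push the decoder (and hence the encoders) to retain enough information to rebuild the input, which is the intuition the abstract gives for why better decoding supports both disentanglement and verification performance.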