HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders
INTERSPEECH
[Abstract] This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments. Unl...
INTERSPEECH
[Abstract] We introduce Multi-level feature Fusion-based Periodicity Analysis Model (MF-PAM), a novel deep learning-based pitch estimation model that accurate...
International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[Abstract] The goal of this work is zero-shot text-to-speech synthesis, with speaking styles and voices learnt from facial characteristics. Inspired by the na...
International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[Abstract] Multi-lingual speech recognition aims to distinguish linguistic expressions in different languages and integrate acoustic processing simultaneously...
International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[Abstract] Enhancing speech quality is an indispensable yet difficult task as it is often complicated by a range of degradation factors. In addition to additi...
International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[Abstract] We propose a new single channel source separation method based on score-matching of a stochastic differential equation (SDE). We craft a tailored c...
INTERSPEECH (Best Paper Finalist)
[Abstract] In this paper, we propose a novel end-to-end user-defined keyword spotting method that utilizes linguistically corresponding patterns between speec...
INTERSPEECH
[Abstract] The first spoofing-aware speaker verification (SASV) challenge aims to integrate research efforts in speaker verification and anti-spoofing. We ext...
Odyssey
[Abstract] Deep learning has brought impressive progress in the study of both automatic speaker verification (ASV) and spoofing countermeasures (CM). Although...
International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[Abstract] Modern neural speech enhancement models usually include various forms of phase information in their training loss terms, either explicitly or impli...
INTERSPEECH
[Abstract] In this work, we present a novel audio-visual dataset for active speaker detection in the wild. A speaker is considered active when his or her face...
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
[Abstract] In this paper, we address the problem of separating individual speech signals from videos using audio-visual neural processing. Most conventional a...
IEEE Spoken Language Technology Workshop (SLT)
[Abstract] The goal of this work is to synchronise audio and video of a talking face using deep neural network models. Existing works have trained networks on...
The Journal of the Acoustical Society of Korea
[Abstract] In this paper, we propose a system to extract effective speaker representations from a speech signal using a deep learning method. Based on the fac...
INTERSPEECH
[Abstract] Many approaches can derive information about a single speaker’s identity from the speech by learning to recognize consistent characteristics of aco...
INTERSPEECH
[Abstract] In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges ...
INTERSPEECH (Best Paper Award)
[Abstract] The objective of this paper is to separate a target speaker’s speech from a mixture of two speakers using a deep audio-visual speech separation net...
INTERSPEECH
[Abstract] The goal of this work is to train discriminative cross-modal embeddings without access to manually annotated data. Recent advances in self-supervis...
IEEE Journal of Selected Topics in Signal Processing
[Abstract] This paper proposes a new strategy for learning effective cross-modal joint embeddings using self-supervision. We set up the problem as one of cros...
International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[Abstract] In this paper, we propose an effective active learning query strategy for an automatic speech recognition system with the aim of reducing the train...
International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[Abstract] This paper proposes a new strategy for learning powerful cross-modal embeddings for audio-to-video synchronisation. Here, we set up the problem as ...
The Journal of the Acoustical Society of America
[Abstract] In this letter, a generic search grid generation algorithm for far-field source localization (SL) is proposed. Since conventional uniform regular g...
2017 Hands-free Speech Communications and Microphone Arrays (HSCMA)
[Abstract] This paper presents a 3-D search grid allocation algorithm for source localization and beam steering. Data-driven beam-steering systems can benefit...