HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

Author	Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang
Publication	INTERSPEECH
Year	2023
Link	[Paper] [arXiv]

ABSTRACT

This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments. Unlike conventional approaches that employ cascading frameworks to remove undesirable noise first and then restore missing signal components, our model performs these tasks in parallel using two heterogeneous decoder networks. Based on the U-Net style encoder-decoder framework, we attach an additional decoder so that each decoder network performs noise suppression or restoration separately. We carefully design each decoder architecture to operate appropriately depending on its objectives. Additionally, we improve performance by leveraging a learnable weighting factor, aggregating the two decoder output waveforms. Experimental results with objective metrics across various environments clearly demonstrate the effectiveness of our approach over single-decoder or multi-stage systems for general speech restoration tasks.
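The abstract does not spell out how the learnable weighting factor combines the two decoder waveforms. As a rough illustration only, the sketch below assumes a single scalar weight, squashed by a sigmoid into (0, 1), that forms a convex combination of the suppression and restoration outputs and is fitted by gradient descent on mean squared error. The function names `fuse` and `fit_logit`, the scalar parameterisation, and the loss are all assumptions made for this example, not the formulation used in the paper.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse(suppressed, restored, logit):
    """Convex combination of two equal-length waveforms.

    Assumption: one scalar weight alpha = sigmoid(logit) shared by all samples.
    """
    if len(suppressed) != len(restored):
        raise ValueError("decoder outputs must have the same length")
    alpha = sigmoid(logit)
    return [alpha * s + (1.0 - alpha) * r for s, r in zip(suppressed, restored)]


def fit_logit(suppressed, restored, target, logit=0.0, lr=0.5, steps=500):
    """Fit the weight by plain gradient descent on mean squared error."""
    n = len(target)
    for _ in range(steps):
        alpha = sigmoid(logit)
        fused = fuse(suppressed, restored, logit)
        # d(MSE)/d(alpha) = (2 / n) * sum((fused - target) * (s - r))
        grad_alpha = (2.0 / n) * sum(
            (f - t) * (s - r)
            for f, t, s, r in zip(fused, target, suppressed, restored)
        )
        # Chain rule through the sigmoid: d(alpha)/d(logit) = alpha * (1 - alpha)
        logit -= lr * grad_alpha * alpha * (1.0 - alpha)
    return logit


# Toy stand-ins for the two decoder outputs (hypothetical values).
suppressed = [1.0, -1.0, 0.5, 2.0]
restored = [0.0, 0.5, -0.5, 1.0]
# A target built with a known mixing weight of 0.8, so the fit can be checked.
target = [0.8 * s + 0.2 * r for s, r in zip(suppressed, restored)]

learned_alpha = sigmoid(fit_logit(suppressed, restored, target))
```

In a real model the weight would more likely be a trainable parameter optimised jointly with the encoder and both decoders; the toy fit here only shows that such a weight can be recovered from data.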

