Posts by Category

Conference

Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation

INTERSPEECH, 2024

HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

INTERSPEECH, 2023

MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion

INTERSPEECH, 2023

Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

An Empirical Study on Speech Restoration Guided by Self-supervised Speech Representation

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Diffusion-based Generative Speech Source Separation

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting

INTERSPEECH, 2022

SASV 2022: The First Spoofing-Aware Speaker Verification Challenge

INTERSPEECH, 2022

Baseline Systems for the First Spoofing-Aware Speaker Verification Challenge: Score and Embedding Fusion

Odyssey, 2022

Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

Look Who’s Talking: Active Speaker Detection in the Wild

INTERSPEECH, 2021

Looking Into Your Speech: Learning Cross-Modal Affinity for Audio-Visual Speech Separation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

End-To-End Lip Synchronisation Based on Pattern Classification

IEEE Spoken Language Technology Workshop (SLT), 2021

MIRNet: Learning Multiple Identities Representations in Overlapped Speech

INTERSPEECH, 2020

Intra-Class Variation Reduction of Speaker Representation in Disentanglement Framework

INTERSPEECH, 2020

FaceFilter: Audio-Visual Speech Separation using Still Images

INTERSPEECH, 2020

Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision

INTERSPEECH, 2020

Gradient-based Active Learning Query Strategy for End-to-end Speech Recognition

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

Perfect Match: Improved Cross-modal Embeddings for Audio-visual Synchronisation

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

A Study on Search Grid Points for Data-Driven 3-D Beamsteering

Hands-free Speech Communications and Microphone Arrays (HSCMA), 2017

Audiovisual

Look Who’s Talking: Active Speaker Detection in the Wild

INTERSPEECH, 2021

Looking Into Your Speech: Learning Cross-Modal Affinity for Audio-Visual Speech Separation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

End-To-End Lip Synchronisation Based on Pattern Classification

IEEE Spoken Language Technology Workshop (SLT), 2021

FaceFilter: Audio-Visual Speech Separation using Still Images

INTERSPEECH, 2020

Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision

INTERSPEECH, 2020

Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval

Journal of Selected Topics in Signal Processing, 2020

Perfect Match: Improved Cross-modal Embeddings for Audio-visual Synchronisation

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

Journal

A Study on Speech Disentanglement Framework based on Adversarial Learning for Speaker Recognition

The Journal of the Acoustical Society of Korea, 2020

Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval

Journal of Selected Topics in Signal Processing, 2020

Generic Uniform Search Grid Generation Algorithm for Far-field Source Localization

The Journal of the Acoustical Society of America, 2018

Speech separation

Diffusion-based Generative Speech Source Separation

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

MIRNet: Learning Multiple Identities Representations in Overlapped Speech

INTERSPEECH, 2020

FaceFilter: Audio-Visual Speech Separation using Still Images

INTERSPEECH, 2020

Source localization

Generic Uniform Search Grid Generation Algorithm for Far-field Source Localization

The Journal of the Acoustical Society of America, 2018

A Study on Search Grid Points for Data-Driven 3-D Beamsteering

Hands-free Speech Communications and Microphone Arrays (HSCMA), 2017

Multimodal Learning

Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval

Journal of Selected Topics in Signal Processing, 2020

Perfect Match: Improved Cross-modal Embeddings for Audio-visual Synchronisation

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

Speaker representation

MIRNet: Learning Multiple Identities Representations in Overlapped Speech

INTERSPEECH, 2020

Intra-Class Variation Reduction of Speaker Representation in Disentanglement Framework

INTERSPEECH, 2020

Speech disentanglement

A Study on Speech Disentanglement Framework based on Adversarial Learning for Speaker Recognition

The Journal of the Acoustical Society of Korea, 2020

Intra-Class Variation Reduction of Speaker Representation in Disentanglement Framework

INTERSPEECH, 2020

Speech enhancement

HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

INTERSPEECH, 2023

Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

Speech restoration

HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

INTERSPEECH, 2023

An Empirical Study on Speech Restoration Guided by Self-supervised Speech Representation

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Microphone array

A Study on Search Grid Points for Data-Driven 3-D Beamsteering

Hands-free Speech Communications and Microphone Arrays (HSCMA), 2017

Multi-channel

Generic Uniform Search Grid Generation Algorithm for Far-field Source Localization

The Journal of the Acoustical Society of America, 2018

Active learning

Gradient-based Active Learning Query Strategy for End-to-end Speech Recognition

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

Speech recognition

Gradient-based Active Learning Query Strategy for End-to-end Speech Recognition

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

Cross-modal retrieval

Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision

INTERSPEECH, 2020

Speaker recognition

A Study on Speech Disentanglement Framework based on Adversarial Learning for Speaker Recognition

The Journal of the Acoustical Society of Korea, 2020

Synchronisation

End-To-End Lip Synchronisation Based on Pattern Classification

IEEE Spoken Language Technology Workshop (SLT), 2021

Separation

Looking Into Your Speech: Learning Cross-Modal Affinity for Audio-Visual Speech Separation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Active speaker detection

Look Who’s Talking: Active Speaker Detection in the Wild

INTERSPEECH, 2021

Phase reconstruction

Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

Anti-spoofing

Baseline Systems for the First Spoofing-Aware Speaker Verification Challenge: Score and Embedding Fusion

Odyssey, 2022

Speaker Verification

Baseline Systems for the First Spoofing-Aware Speaker Verification Challenge: Score and Embedding Fusion

Odyssey, 2022

SASV Challenge

SASV 2022: The First Spoofing-Aware Speaker Verification Challenge

INTERSPEECH, 2022

Crossmodal

Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting

INTERSPEECH, 2022

Keyword spotting

Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting

INTERSPEECH, 2022

Diffusion

Diffusion-based Generative Speech Source Separation

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Self-supervised learning

An Empirical Study on Speech Restoration Guided by Self-supervised Speech Representation

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Multi-lingual

MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Mixture-of-experts

MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Audio-visual

Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Speech synthesis

Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Pitch estimation

MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion

INTERSPEECH, 2023

Periodicity

MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion

INTERSPEECH, 2023

Speech generation

Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation

INTERSPEECH, 2024

Acoustic scene transfer

Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation

INTERSPEECH, 2024
