NTU Speech Processing & Machine Learning Lab

university

https://twitter.com/ntu_spml

AI & ML interests

Speech Processing, Self-Supervised Learning, ASR, TTS, Voice Conversion, Spoken Question Answering

Recent Activity

dlion168 submitted a paper 9 days ago

Speaker Identity in Non-Verbal Vocalizations: Conditional Distillation and Mixture of Experts Approach

vectominist authored a paper 26 days ago

End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Pseudo Whisper Pre-training

vectominist authored a paper 26 days ago

USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding

submitted a paper to Daily Papers 9 days ago

Speaker Identity in Non-Verbal Vocalizations: Conditional Distillation and Mixture of Experts Approach

Paper • 2606.21215 • Published 16 days ago

authored 2 papers 26 days ago

End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Pseudo Whisper Pre-training

Paper • 2005.01972 • Published May 5, 2020

USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding

Paper • 2606.06444 • Published about 1 month ago • 3

authored a paper 3 months ago

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation

Paper • 2603.19195 • Published Mar 19 • 4

submitted a paper to Daily Papers 3 months ago

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation

Paper • 2603.19195 • Published Mar 19 • 4

submitted a paper to Daily Papers 6 months ago

On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation

Paper • 2601.06329 • Published Jan 9 • 2

authored a paper 6 months ago

Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning

Paper • 2512.19687 • Published Dec 22, 2025 • 3

authored 10 papers 9 months ago

Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models

Paper • 2408.07665 • Published Aug 14, 2024

EMO-Debias: Benchmarking Gender Debiasing Techniques in Multi-Label Speech Emotion Recognition

Paper • 2506.04652 • Published Jun 5, 2025 • 1

Multi-Distillation from Speech and Music Representation Models

Paper • 2506.07237 • Published Jun 8, 2025

MMMOS: Multi-domain Multi-axis Audio Quality Assessment

Paper • 2507.04094 • Published Jul 5, 2025

ASTAR-NTU solution to AudioMOS Challenge 2025 Track1

Paper • 2507.09904 • Published Jul 14, 2025

Fake-Mamba: Real-Time Speech Deepfake Detection Using Bidirectional Mamba as Self-Attention's Alternative

Paper • 2508.09294 • Published Aug 12, 2025

Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning

Paper • 2505.16220 • Published May 22, 2025 • 1

Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems

Paper • 2509.13989 • Published Sep 17, 2025 • 3

CO-VADA: A Confidence-Oriented Voice Augmentation Debiasing Approach for Fair Speech Emotion Recognition

Paper • 2506.06071 • Published Jun 6, 2025 • 1

MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model

Paper • 2509.20706 • Published Sep 25, 2025 • 3

authored 3 papers 10 months ago

Codec-SUPERB: An In-Depth Analysis of Sound Codec Models

Paper • 2402.13071 • Published Feb 20, 2024 • 1

Towards audio language modeling -- an overview

Paper • 2402.13236 • Published Feb 20, 2024

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Paper • 2411.05361 • Published Nov 8, 2024 • 5