Dynamic-SUPERB

community

Recent Activity

BoJack authored a paper 17 days ago

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

BoJack authored a paper 17 days ago

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

BoJack authored a paper 17 days ago

Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception

authored 4 papers 17 days ago

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Paper • 2312.15185 • Published Dec 23, 2023

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Paper • 2505.13032 • Published May 19, 2025 • 4

Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception

Paper • 2510.12720 • Published Oct 14, 2025 • 2

MMAE: A Massive Multitask Audio Editing Benchmark

Paper • 2606.07229 • Published 20 days ago • 45

submitted a paper to Daily Papers 17 days ago

MMAE: A Massive Multitask Audio Editing Benchmark

Paper • 2606.07229 • Published 20 days ago • 45

authored 3 papers 3 months ago

An Empirical Recipe for Universal Phone Recognition

Paper • 2603.29042 • Published Mar 30 • 5

[b]=[d]-[t]+[p]: Self-supervised Speech Models Discover Phonological Vector Arithmetic

Paper • 2602.18899 • Published Mar 12 • 1

Self-Supervised Speech Models Encode Phonetic Context via Position-dependent Orthogonal Subspaces

Paper • 2603.12642 • Published Mar 13 • 1

authored a paper 3 months ago

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation

Paper • 2603.19195 • Published Mar 19 • 4

submitted a paper to Daily Papers 3 months ago

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation

Paper • 2603.19195 • Published Mar 19 • 4

submitted a paper to Daily Papers 3 months ago

Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models

Paper • 2603.14636 • Published Mar 15 • 4

authored 3 papers 3 months ago

A Preliminary Exploration with GPT-4o Voice Mode

Paper • 2502.09940 • Published Feb 14, 2025

MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models

Paper • 2603.09714 • Published Mar 10

Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models

Paper • 2603.14636 • Published Mar 15 • 4

authored 2 papers 4 months ago

BioME: A Resource-Efficient Bioacoustic Foundational Model for IoT Applications

Paper • 2602.09970 • Published Feb 10 • 1

RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness

Paper • 2302.09437 • Published Feb 18, 2023

authored 4 papers 5 months ago

Wav2Gloss: Generating Interlinear Glossed Text from Speech

Paper • 2403.13169 • Published Mar 19, 2024

TiDAL: Learning Training Dynamics for Active Learning

Paper • 2210.06788 • Published Oct 13, 2022

On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models

Paper • 2406.09282 • Published Jun 13, 2024

ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration

Paper • 2409.09506 • Published Sep 14, 2024 • 4