XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs Paper • 2502.19737 • Published Feb 27, 2025
AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking Paper • 2601.17645 • Published Jan 25
SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models Paper • 2509.15661 • Published Sep 19, 2025
Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations Paper • 2509.15655 • Published Sep 19, 2025
BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data Paper • 2510.10159 • Published Oct 11, 2025
Learning Representations for New Sound Classes With Continual Self-Supervised Learning Paper • 2205.07390 • Published May 15, 2022
Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue Paper • 2409.04927 • Published Sep 7, 2024
Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis Paper • 2407.09732 • Published Jul 13, 2024
Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation Paper • 2408.11849 • Published Aug 13, 2024