Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models Paper • 2603.14636 • Published 13 days ago • 1
MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models Paper • 2603.09714 • Published 18 days ago
On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation Paper • 2601.06329 • Published Jan 9 • 2
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models Paper • 2510.16917 • Published Oct 19, 2025 • 20
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations Paper • 2510.16893 • Published Oct 19, 2025 • 18
Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting Paper • 2309.15649 • Published Sep 27, 2023 • 1
Conditional Modeling Based Automatic Video Summarization Paper • 2311.12159 • Published Nov 20, 2023 • 1
Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue Paper • 2312.15316 • Published Dec 23, 2023
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification Paper • 2312.14378 • Published Dec 22, 2023
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators Paper • 2402.06894 • Published Feb 10, 2024 • 1
Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data Paper • 2409.20007 • Published Sep 30, 2024 • 1
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks Paper • 2411.05361 • Published Nov 8, 2024 • 5
Towards Neural Scaling Laws for Time Series Foundation Models Paper • 2410.12360 • Published Oct 16, 2024
Plan2Align: Predictive Planning Based Test-Time Preference Alignment in Paragraph-Level Machine Translation Paper • 2502.20795 • Published Feb 28, 2025
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition Paper • 2409.09785 • Published Sep 15, 2024
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment Paper • 2507.02768 • Published Jul 3, 2025 • 19