Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios Paper • 2605.28618 • Published 7 days ago • 27
Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer Paper • 2605.30940 • Published 5 days ago • 31
SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue Paper • 2605.30993 • Published 5 days ago • 50
WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training Paper • 2604.14932 • Published Apr 16 • 11
Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models Paper • 2604.14920 • Published Apr 16 • 3
WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models Paper • 2502.14727 • Published Feb 20, 2025 • 2
RULER-Bench: Probing Rule-based Reasoning Abilities of Next-level Video Generation Models for Vision Foundation Intelligence Paper • 2512.02622 • Published Dec 2, 2025 • 10
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators Paper • 2505.09558 • Published May 14, 2025 • 10
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29, 2024 • 50