X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding Paper • 2606.02482 • Published Jun 1 • 36
Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling Paper • 2604.23586 • Published Apr 26 • 7
SpaceVista: All-Scale Visual Spatial Reasoning from mm to km Paper • 2510.09606 • Published Oct 10, 2025 • 18
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published Mar 6, 2025 • 72
Article From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages Steveeeeeeen • Feb 11, 2025 • 34
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis Paper • 2502.04128 • Published Feb 6, 2025 • 28
Llasa Collection TTS foundation model compatible with Llama framework (160k hours tokenized speech data released) • 12 items • Updated 7 days ago • 22