FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies Paper • 2605.27284 • Published May 26 • 9
SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue Paper • 2605.30993 • Published 29 days ago • 59
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published 30 days ago • 146
FineVLA: Fine-Grained Instruction Alignment For VLA Collection This is the collection of FineVLA, including the RoboFine-Bench, RoboFine-VLM, and FineVLA-policyLA • 3 items • Updated 22 days ago • 1
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting Paper • 2504.20630 • Published Apr 29, 2025 • 9