Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published 30 days ago • 146
Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation Paper • 2602.16705 • Published Feb 18 • 26
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published Feb 11 • 201
GEBench: Benchmarking Image Generation Models as GUI Environments Paper • 2602.09007 • Published Feb 9 • 39
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14, 2025 • 146
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14, 2025 • 146
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14, 2025 • 146 • 5
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Paper • 2507.05255 • Published Jul 7, 2025 • 75
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Paper • 2507.05255 • Published Jul 7, 2025 • 75