StableVLA: Towards Robust Vision-Language-Action Models without Extra Data Paper • 2605.18287 • Published 2 days ago
HumanNet: Scaling Human-centric Video Learning to One Million Hours Paper • 2605.06747 • Published 13 days ago
Enhancing Spatial Understanding in Image Generation via Reward Modeling Paper • 2602.24233 • Published Feb 27
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head Paper • 2601.07832 • Published Jan 12