PhysisForcing: Physics Reinforced World Simulator for Robotic Manipulation • Paper • 2606.28128 • Published 4 days ago • 30
HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining • Paper • 2606.20521 • Published 12 days ago • 14
Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving • Paper • 2605.23163 • Published May 25 • 17
StableVLA: Towards Robust Vision-Language-Action Models without Extra Data • Paper • 2605.18287 • Published May 18 • 15
HumanNet: Scaling Human-centric Video Learning to One Million Hours • Paper • 2605.06747 • Published May 7 • 55
Enhancing Spatial Understanding in Image Generation via Reward Modeling • Paper • 2602.24233 • Published Feb 27 • 60
iFSQ: Improving FSQ for Image Generation with 1 Line of Code • Paper • 2601.17124 • Published Jan 23 • 34
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance • Paper • 2503.10391 • Published Mar 13, 2025 • 12
Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models • Paper • 2601.07287 • Published Jan 12 • 6
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head • Paper • 2601.07832 • Published Jan 12 • 53