DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving • Paper • arXiv:2601.01528 • Published 22 days ago • 19 upvotes
LTX-2: Efficient Joint Audio-Visual Foundation Model • Paper • arXiv:2601.03233 • Published 20 days ago • 134 upvotes
LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry • Paper • arXiv:2512.19629 • Published Dec 22, 2025 • 26 upvotes
Towards Scalable Pre-training of Visual Tokenizers for Generation • Paper • arXiv:2512.13687 • Published Dec 15, 2025 • 102 upvotes
Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 • Article • Published Aug 21, 2024 • 42 upvotes (see the packing sketch after this list)
Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge • Paper • arXiv:2512.10071 • Published Dec 10, 2025 • 18 upvotes
Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation • Paper • arXiv:2512.10949 • Published Dec 11, 2025 • 46 upvotes
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing • Paper • arXiv:2512.02589 • Published Dec 2, 2025 • 71 upvotes
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation • Paper • arXiv:2512.04678 • Published Dec 4, 2025 • 41 upvotes
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach • Paper • arXiv:2512.02834 • Published Dec 2, 2025 • 41 upvotes
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors • Paper • arXiv:2505.24625 • Published May 30, 2025 • 9 upvotes
F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions • Paper • arXiv:2509.06951 • Published Sep 8, 2025 • 32 upvotes
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control • Paper • arXiv:2508.21112 • Published Aug 28, 2025 • 77 upvotes
EO-Robotics • Collection • EmbodiedOneVision is a unified framework for multimodal embodied reasoning and robot control, featuring interleaved vision-text-action pretraining. • 8 items • Updated Dec 7, 2025 • 8 upvotes
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation • Paper • arXiv:2508.05635 • Published Aug 7, 2025 • 73 upvotes
Hume: Introducing System-2 Thinking in Visual-Language-Action Model • Paper • arXiv:2505.21432 • Published May 27, 2025 • 4 upvotes
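
For the packing article listed above, here is a minimal sketch of padding-free sequence packing with Flash Attention 2 using the `DataCollatorWithFlattening` collator from recent `transformers` releases. The model name, dataset, and training hyperparameters are illustrative placeholders, not details taken from the article itself.

```python
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorWithFlattening,
    Trainer,
    TrainingArguments,
)

# Placeholder model: any causal LM with Flash Attention 2 support works.
model_name = "meta-llama/Llama-3.2-1B"

tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize(batch):
    # No padding or truncation here: the collator flattens examples
    # into one packed sequence, so padding would only waste compute.
    return tokenizer(batch["text"])

# Placeholder dataset with a "text" column.
dataset = load_dataset("stas/openwebtext-10k", split="train")
dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    # Packing relies on FA2's variable-length attention kernels.
    attn_implementation="flash_attention_2",
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="packed-run", per_device_train_batch_size=8),
    train_dataset=dataset,
    # Concatenates the examples in each batch and emits position_ids
    # instead of an attention mask; FA2 uses the position resets to keep
    # sequences from attending across their boundaries.
    data_collator=DataCollatorWithFlattening(),
)
trainer.train()
```

The gist of this setup is that a batch becomes a single packed sequence with boundaries tracked via `position_ids` rather than padding tokens and masks, so no FLOPs are spent on pad positions.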