Geo-Align: Video Generation Alignment via Metric Geometry Reward Paper • 2605.23903 • Published May 22 • 10
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer Paper • 2605.15178 • Published May 14 • 91
Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video Paper • 2605.15182 • Published May 14 • 39
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference Paper • 2603.25730 • Published Mar 26 • 53
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG Paper • 2603.23497 • Published Mar 24 • 92
VINO: A Unified Visual Generator with Interleaved OmniModal Context Paper • 2601.02358 • Published Jan 5 • 30
Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published Dec 26, 2025 • 61
BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation Paper • 2509.25077 • Published Sep 29, 2025 • 15
Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation Paper • 2509.15185 • Published Sep 18, 2025 • 29
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network Paper • 2002.10200 • Published Feb 24, 2020
DeepVerse: 4D Autoregressive Video Generation as a World Model Paper • 2506.01103 • Published Jun 1, 2025 • 1
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool Paper • 2509.05296 • Published Sep 5, 2025 • 8
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Paper • 2509.12201 • Published Sep 15, 2025 • 107