MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation Paper • 2510.18692 • Published Oct 21, 2025 • 41
UltraGen: High-Resolution Video Generation with Hierarchical Attention Paper • 2510.18775 • Published Oct 21, 2025 • 18
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Paper • 2506.09985 • Published Jun 11, 2025 • 31
ATLAS: Learning to Optimally Memorize the Context at Test Time Paper • 2505.23735 • Published May 29, 2025 • 23
Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks Paper • 2505.11881 • Published May 17, 2025 • 4
Efficient Generative Model Training via Embedded Representation Warmup Paper • 2504.10188 • Published Apr 14, 2025 • 12
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7, 2025 • 153
VideoRoPE: What Makes for Good Video Rotary Position Embedding? Paper • 2502.05173 • Published Feb 7, 2025 • 64
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4, 2025 • 66
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published Jan 16, 2025 • 35
Learning and Leveraging World Models in Visual Representation Learning Paper • 2403.00504 • Published Mar 1, 2024 • 33
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis Paper • 2401.17093 • Published Jan 30, 2024 • 20