Wolf: Captioning Everything with a World Summarization Framework Paper • 2407.18908 • Published Jul 26, 2024 • 32
Mixture of Nested Experts: Adaptive Processing of Visual Tokens Paper • 2407.19985 • Published Jul 29, 2024 • 37
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO Paper • 2506.07464 • Published Jun 9, 2025 • 14
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models Paper • 2510.05034 • Published Oct 6, 2025 • 50
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Paper • 2510.20579 • Published Oct 23, 2025 • 56
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning Paper • 2510.23473 • Published Oct 27, 2025 • 85
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published Oct 30, 2025 • 34
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published Nov 19, 2025 • 77
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models Paper • 2511.16668 • Published Nov 20, 2025 • 55
In-Video Instructions: Visual Signals as Generative Control Paper • 2511.19401 • Published Nov 24, 2025 • 32
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision Paper • 2512.01342 • Published Dec 1, 2025 • 18
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation Paper • 2512.04678 • Published Dec 4, 2025 • 41
Evaluating Gemini Robotics Policies in a Veo World Simulator Paper • 2512.10675 • Published Dec 11, 2025 • 19
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning Paper • 2512.13874 • Published Dec 15, 2025 • 17
End-to-End Training for Autoregressive Video Diffusion via Self-Resampling Paper • 2512.15702 • Published Dec 17, 2025 • 15
LongVideoAgent: Multi-Agent Reasoning with Long Videos Paper • 2512.20618 • Published Dec 23, 2025 • 54
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations Paper • 2512.21004 • Published Dec 24, 2025 • 13
Inference-time Physics Alignment of Video Generative Models with Latent World Models Paper • 2601.10553 • Published 13 days ago • 12
Rethinking Video Generation Model for the Embodied World Paper • 2601.15282 • Published 7 days ago • 42