Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models Paper • 2601.19834 • Published 2 days ago • 23
The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation Paper • 2601.17737 • Published 4 days ago • 50
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 16 days ago • 143
SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL Paper • 2601.09136 • Published 15 days ago • 38
DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation Paper • 2512.21252 • Published Dec 24, 2025 • 35
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation Paper • 2512.23705 • Published about 1 month ago • 45
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published Nov 27, 2025 • 234
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published Dec 9, 2025 • 119
Video models are zero-shot learners and reasoners Paper • 2509.20328 • Published Sep 24, 2025 • 100 • 5
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8, 2025 • 288
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion Paper • 2512.17504 • Published Dec 19, 2025 • 97
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published Dec 18, 2025 • 95
Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding Paper • 2512.17532 • Published Dec 19, 2025 • 67