AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation Paper • 2605.13724 • Published 1 day ago • 64
OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation Paper • 2605.12480 • Published 2 days ago • 4
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published 2 days ago • 140
SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation Paper • 2605.08043 • Published 6 days ago • 9
Flow-OPD: On-Policy Distillation for Flow Matching Models Paper • 2605.08063 • Published 6 days ago • 90
SkillFlow:Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents Paper • 2604.17308 • Published 25 days ago • 22
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 29 days ago • 159
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments Paper • 2604.14144 • Published 29 days ago • 63
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning Paper • 2604.05404 • Published Apr 7 • 42
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? Paper • 2603.24472 • Published Mar 25 • 55
Gen-Searcher: Reinforcing Agentic Search for Image Generation Paper • 2603.28767 • Published Mar 30 • 58
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published Mar 26 • 155
GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents Paper • 2603.24329 • Published Mar 25 • 28
mSFT: Addressing Dataset Mixtures Overfiting Heterogeneously in Multi-task SFT Paper • 2603.21606 • Published Mar 23 • 39
Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation Paper • 2602.01756 • Published Feb 2 • 23
VimRAG: Navigating Massive Visual Context in Retrieval-Augmented Generation via Multimodal Memory Graph Paper • 2602.12735 • Published Feb 13 • 8
Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models Paper • 2602.10224 • Published Feb 10 • 19
GEBench: Benchmarking Image Generation Models as GUI Environments Paper • 2602.09007 • Published Feb 9 • 39
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published Feb 2 • 118