DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes Paper • 2605.28421 • Published 1 day ago • 29
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation Paper • 2605.25874 • Published 3 days ago • 96
ACC: Compiling Agent Trajectories for Long-Context Training Paper • 2605.21850 • Published 7 days ago • 59
MMSkills: Towards Multimodal Skills for General Visual Agents Paper • 2605.13527 • Published 14 days ago • 118
AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents Paper • 2604.02947 • Published Apr 3 • 19
Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence? Paper • 2604.03016 • Published Apr 3 • 37
PixelSmile: Toward Fine-Grained Facial Expression Editing Paper • 2603.25728 • Published Mar 26 • 117
EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing Paper • 2603.19224 • Published Mar 19 • 18
OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens Paper • 2603.02138 • Published Mar 2 • 151
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 242
OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training Paper • 2501.08197 • Published Jan 14, 2025 • 9