VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction Paper • 2602.13294 • Published Feb 9
Beyond Closed-Pool Video Retrieval: A Benchmark and Agent Framework for Real-World Video Search and Moment Localization Paper • 2602.10159 • Published Feb 10
EvolveCoder: Evolving Test Cases via Adversarial Verification for Code Reinforcement Learning Paper • 2603.12698 • Published Mar 13
SWE-QA-Pro: A Representative Benchmark and Scalable Training Recipe for Repository-Level Code Understanding Paper • 2603.16124 • Published Mar 17
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis Paper • 2603.20278 • Published Mar 17
ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks Paper • 2603.27862 • Published Mar 29
SWE-Next: Scalable Real-World Software Engineering Tasks for Agents Paper • 2603.20691 • Published Mar 21
MMEB-V3: Measuring the Performance Gaps of Omni-Modality Embedding Models Paper • 2604.23321 • Published 19 days ago
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction Paper • 2605.05242 • Published 11 days ago
WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors Paper • 2605.10434 • Published 3 days ago
Watch Before You Answer: Learning from Visually Grounded Post-Training Paper • 2604.05117 • Published Apr 6
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems Paper • 2412.07067 • Published Dec 10, 2024
TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting Paper • 2504.09588 • Published Apr 13, 2025
Context Forcing: Consistent Autoregressive Video Generation with Long Context Paper • 2602.06028 • Published Feb 5
VisCoder2: Building Multi-Language Visualization Coding Agents Paper • 2510.23642 • Published Oct 24, 2025