OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration Paper • 2605.28805 • Published 4 days ago • 9
HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents Paper • 2605.17873 • Published 13 days ago • 11
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 11 days ago • 204
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 19 days ago • 195
Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 24 days ago • 231
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published 28 days ago • 166
Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment Paper • 2604.19548 • Published Apr 21 • 16
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published Apr 9 • 291
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published Apr 8 • 189
GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers Paper • 2604.02648 • Published Apr 3 • 47
ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners? Paper • 2603.25823 • Published Mar 26 • 43
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 343