EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions Paper • 2606.23654 • Published 4 days ago • 75
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 10 days ago • 204
MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection Paper • 2605.30288 • Published 28 days ago • 23
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression Paper • 2604.19572 • Published Apr 21 • 23