FutureSim: Replaying World Events to Evaluate Adaptive Agents • Paper • 2605.15188 • Published 6 days ago • 6
Training AI Co-Scientists Using Rubric Rewards • Paper • 2512.23707 • Published Dec 29, 2025 • 21
Scaling Open-Ended Reasoning to Predict the Future • Paper • 2512.25070 • Published Dec 31, 2025 • 20
mini-coder • Collection • Small models for agentic SWE research: https://ricardodominguez.github.io/blogs/minicoder.html • 3 items • Updated Mar 2 • 2
Answer Matching Outperforms Multiple Choice for Language Model Evaluation • Paper • 2507.02856 • Published Jul 3, 2025 • 9