Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning Paper • 2606.25524 • Published 5 days ago
RExBench: Can coding agents autonomously implement AI research extensions? Paper • 2506.22598 • Published Jun 27, 2025