The Verification Horizon: No Silver Bullet for Coding Agent Rewards Paper • 2606.26300 • Published 6 days ago • 42
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening Paper • 2605.19597 • Published May 19 • 21