view article Article AutoBench Goes Scientific: Rigorous Validation for a Dynamic, Open-Source LLM Benchmark Oct 29 • 4
view article Article AutoBench Third Run: Revolutionizing LLM Evaluation with Record-Breaking Scale, Accuracy, and a New Home at autobench.org Aug 20 • 6
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers Paper • 2504.20752 • Published Apr 29 • 92