BrokenMath Collection: The first benchmark for evaluating LLM sycophancy in mathematical reasoning. • 3 items • Updated Oct 10, 2025
Open Proof Corpus Collection: The Open Proof Corpus dataset and the fine-tuned judging model. • 2 items • Updated Oct 8, 2025