Reasoning Models Struggle to Control their Chains of Thought • Paper • 2603.05706 • Published 8 days ago • 26
FrenchBench Evaluation datasets • Collection • These datasets are used to evaluate models' French-language performance with https://github.com/EleutherAI/lm-evaluation-harness (from the CroissantLLM paper) • 11 items • Updated Jun 7, 2024 • 8