Add GSM8K evaluation result
#112, opened by burtenshaw (HF Staff)
Evaluation Results
This PR adds structured evaluation results using the new .eval_results/ format.
What This Enables
- Model Page: Results appear on the model page with benchmark links
- Leaderboards: Scores are aggregated into benchmark dataset leaderboards
- Verification: Support for cryptographic verification of evaluation runs
Format Details
Results are stored as YAML files in the .eval_results/ folder of the model repository. See the Eval Results Documentation for the full specification.
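As a rough illustration of the layout described above, the following sketch writes a single YAML entry into a .eval_results/ folder. Everything specific here is an assumption made for illustration: the field names (dataset, id, task_id, value), the file name gsm8k.yaml, the dataset id openai/gsm8k, and the placeholder score of 0.0 are not taken from this PR or from the specification. Consult the Eval Results Documentation for the actual schema.

```python
from pathlib import Path
import tempfile


def render_eval_result(dataset_id: str, task_id: str, value: float) -> str:
    """Render one result entry as YAML text.

    The keys used here are hypothetical and only show the general shape of a
    structured result; they are not the official .eval_results/ schema.
    """
    return (
        "- dataset:\n"
        f"    id: {dataset_id}\n"
        f"    task_id: {task_id}\n"
        f"  value: {value}\n"
    )


def write_eval_result(repo_dir: Path, filename: str, yaml_text: str) -> Path:
    """Write the YAML text to <repo_dir>/.eval_results/<filename>."""
    out_dir = repo_dir / ".eval_results"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    out_path.write_text(yaml_text, encoding="utf-8")
    return out_path


if __name__ == "__main__":
    # A temporary directory stands in for a local clone of the model repo.
    with tempfile.TemporaryDirectory() as tmp:
        # 0.0 is a placeholder, not a real GSM8K score.
        text = render_eval_result("openai/gsm8k", "gsm8k", 0.0)
        path = write_eval_result(Path(tmp), "gsm8k.yaml", text)
        print(path.read_text(encoding="utf-8"))
```

Keeping the rendering step separate from the file write makes it easy to swap in the real field names once they are confirmed against the specification.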
Generated by community-evals
Great discussion! For anyone wanting to quickly test this, Crazyrouter offers API access to this model. No infrastructure setup needed — just an API key and the standard OpenAI SDK.
