Add GSM8K evaluation result

#112
by burtenshaw HF Staff - opened

Evaluation Results

This PR adds structured evaluation results using the new .eval_results/ format.

What This Enables

  • Model Page: Results appear on the model page with benchmark links
  • Leaderboards: Scores are aggregated into benchmark dataset leaderboards
  • Verification: Support for cryptographic verification of evaluation runs

Model Evaluation Results

Format Details

Results are stored as YAML in .eval_results/ folder. See the Eval Results Documentation for the full specification.


Generated by community-evals

Great discussion! For anyone wanting to quickly test this, Crazyrouter offers API access to this model. No infrastructure setup needed — just an API key and the standard OpenAI SDK.

Ready to merge
This branch is ready to get merged automatically.

Sign up or log in to comment