# Results Index

This page is a quick index of the generated evaluation outputs.
## Community challenge eval

- Report (Markdown): `docs/hf_hub_community_challenge_report.md`
- Report (JSON): `docs/hf_hub_community_challenge_report.json`
- Inputs: `scripts/hf_hub_community_challenges.txt`
- Generator: `scripts/score_hf_hub_community_challenges.py`
## Community coverage eval

- Report (Markdown): `docs/hf_hub_community_coverage_report.md`
- Report (JSON): `docs/hf_hub_community_coverage_report.json`
- Inputs: `scripts/hf_hub_community_coverage_prompts.json`
- Generator: `scripts/score_hf_hub_community_coverage.py`
## Prompt/card A/B eval (community)

- Summary:
  - `docs/hf_hub_prompt_ab/prompt_ab_summary.md`
  - `docs/hf_hub_prompt_ab/prompt_ab_summary.json`
  - `docs/hf_hub_prompt_ab/prompt_ab_summary.csv`
- Visuals (if matplotlib is available):
  - `docs/hf_hub_prompt_ab/prompt_ab_composite_<model>.png`
  - `docs/hf_hub_prompt_ab/prompt_ab_scatter_tokens_vs_challenge.png`
- Generator: `scripts/eval_hf_hub_prompt_ab.py`
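The JSON reports above can also be consumed programmatically. A minimal sketch (it assumes only that each file is valid JSON; the actual schema is whatever the generator script emits, so the helper reports the top-level shape rather than specific fields):

```python
import json
from pathlib import Path


def describe_report(path: str) -> str:
    """Describe the top-level shape of a generated JSON report
    without assuming any particular schema."""
    p = Path(path)
    if not p.exists():
        return f"not generated yet: {path}"
    data = json.loads(p.read_text())
    if isinstance(data, dict):
        return "keys: " + ", ".join(sorted(data))
    if isinstance(data, list):
        return f"entries: {len(data)}"
    return type(data).__name__


if __name__ == "__main__":
    # Path taken from the index above.
    print(describe_report("docs/hf_hub_prompt_ab/prompt_ab_summary.json"))
```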
## Tool routing eval

- Batch summary:
  - `docs/tool_routing_eval/tool_routing_batch_summary.md`
  - `docs/tool_routing_eval/tool_routing_batch_summary.json`
  - `docs/tool_routing_eval/tool_routing_batch_summary.csv`
- Per-model reports: `docs/tool_routing_eval/tool_routing_*.md` (+ `.json`)
- Inputs:
  - `scripts/tool_routing_challenges.txt`
  - `scripts/tool_routing_expected.json`
- Generators:
  - `scripts/score_tool_routing_confusion.py`
  - `scripts/run_tool_routing_batch.py`
## Tool description A/B eval

- Summary:
  - `docs/tool_description_eval/tool_description_ab_summary.md`
  - `docs/tool_description_eval/tool_description_ab_summary.json`
  - `docs/tool_description_eval/tool_description_ab_summary.csv`
- Detailed/pairwise:
  - `docs/tool_description_eval/tool_description_ab_detailed.json`
  - `docs/tool_description_eval/tool_description_ab_pairwise.json`
  - `docs/tool_description_eval/tool_description_ab_pairwise.csv`
  - `docs/tool_description_eval/tool_description_ab_ranking.json`
- Visuals:
  - `docs/tool_description_eval/heat_first_call_ok.png`
  - `docs/tool_description_eval/heat_avg_score.png`
  - `docs/tool_description_eval/heat_avg_calls.png`
  - `docs/tool_description_eval/scatter_calls_vs_first_ok.png`
- Interpretation: `docs/tool_description_eval/tool_description_interpretation.md`
- Inputs:
  - `scripts/hf_hub_community_challenges.txt`
  - `scripts/tool_description_variants.json`
- Generators:
  - `scripts/eval_tool_description_ab.py`
  - `scripts/plot_tool_description_eval.py`
---

## One-command regeneration

```bash
scripts/run_all_evals.sh
```

Optional environment overrides:

```bash
MODELS=gpt-oss,gpt-5-mini ROUTER_AGENT=hf_hub_community scripts/run_all_evals.sh
```
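After a regeneration run, a quick sanity check is to confirm the expected artifacts exist. A minimal sketch using only the report paths listed in this index (the helper is hypothetical, not part of the scripts above):

```python
from pathlib import Path

# One representative report per eval, taken from the index above.
REPORTS = [
    "docs/hf_hub_community_challenge_report.md",
    "docs/hf_hub_community_coverage_report.md",
    "docs/hf_hub_prompt_ab/prompt_ab_summary.md",
    "docs/tool_routing_eval/tool_routing_batch_summary.md",
    "docs/tool_description_eval/tool_description_ab_summary.md",
]


def missing_reports(root: str = ".") -> list[str]:
    """Return the index entries whose files are absent under *root*."""
    base = Path(root)
    return [p for p in REPORTS if not (base / p).exists()]


if __name__ == "__main__":
    for p in missing_reports():
        print(f"missing: {p}")
```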