# Results Index

This page is the quick index to generated evaluation outputs.

## Community challenge eval

- Report (markdown): `docs/hf_hub_community_challenge_report.md`
- Report (json): `docs/hf_hub_community_challenge_report.json`
- Inputs: `scripts/hf_hub_community_challenges.txt`
- Generator: `scripts/score_hf_hub_community_challenges.py`

## Community coverage eval

- Report (markdown): `docs/hf_hub_community_coverage_report.md`
- Report (json): `docs/hf_hub_community_coverage_report.json`
- Inputs: `scripts/hf_hub_community_coverage_prompts.json`
- Generator: `scripts/score_hf_hub_community_coverage.py`

## Prompt/card A/B eval (community)

- Summary:
  - `docs/hf_hub_prompt_ab/prompt_ab_summary.md`
  - `docs/hf_hub_prompt_ab/prompt_ab_summary.json`
  - `docs/hf_hub_prompt_ab/prompt_ab_summary.csv`
- Visuals (if matplotlib is available):
  - `docs/hf_hub_prompt_ab/prompt_ab_composite_.png`
  - `docs/hf_hub_prompt_ab/prompt_ab_scatter_tokens_vs_challenge.png`
- Generator:
  - `scripts/eval_hf_hub_prompt_ab.py`

## Tool routing eval

- Batch summary:
  - `docs/tool_routing_eval/tool_routing_batch_summary.md`
  - `docs/tool_routing_eval/tool_routing_batch_summary.json`
  - `docs/tool_routing_eval/tool_routing_batch_summary.csv`
- Per-model reports: `docs/tool_routing_eval/tool_routing_*.md` (+ `.json`)
- Inputs:
  - `scripts/tool_routing_challenges.txt`
  - `scripts/tool_routing_expected.json`
- Generators:
  - `scripts/score_tool_routing_confusion.py`
  - `scripts/run_tool_routing_batch.py`

## Tool description A/B eval

- Summary:
  - `docs/tool_description_eval/tool_description_ab_summary.md`
  - `docs/tool_description_eval/tool_description_ab_summary.json`
  - `docs/tool_description_eval/tool_description_ab_summary.csv`
- Detailed/pairwise:
  - `docs/tool_description_eval/tool_description_ab_detailed.json`
  - `docs/tool_description_eval/tool_description_ab_pairwise.json`
  - `docs/tool_description_eval/tool_description_ab_pairwise.csv`
  - `docs/tool_description_eval/tool_description_ab_ranking.json`
- Visuals:
  - `docs/tool_description_eval/heat_first_call_ok.png`
  - `docs/tool_description_eval/heat_avg_score.png`
  - `docs/tool_description_eval/heat_avg_calls.png`
  - `docs/tool_description_eval/scatter_calls_vs_first_ok.png`
  - `docs/tool_description_eval/tool_description_interpretation.md`
- Inputs:
  - `scripts/hf_hub_community_challenges.txt`
  - `scripts/tool_description_variants.json`
- Generators:
  - `scripts/eval_tool_description_ab.py`
  - `scripts/plot_tool_description_eval.py`

---

## One-command regeneration

```bash
scripts/run_all_evals.sh
```

Optional environment overrides:

```bash
MODELS=gpt-oss,gpt-5-mini ROUTER_AGENT=hf_hub_community scripts/run_all_evals.sh
```
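After regeneration, it can be useful to spot-check that the headline reports indexed above were actually written. This is a minimal sketch, not a script shipped with the repo; the paths are taken verbatim from the sections on this page.

```shell
# Spot-check the headline report for each eval section (hypothetical helper,
# run from the repo root after scripts/run_all_evals.sh).
report=""
for f in \
  docs/hf_hub_community_challenge_report.md \
  docs/hf_hub_community_coverage_report.md \
  docs/hf_hub_prompt_ab/prompt_ab_summary.md \
  docs/tool_routing_eval/tool_routing_batch_summary.md \
  docs/tool_description_eval/tool_description_ab_summary.md
do
  # Mark each indexed report as present or missing.
  if [ -f "$f" ]; then status="OK     "; else status="MISSING"; fi
  report="${report}${status} ${f}
"
done
printf '%s' "$report"
```

Any `MISSING` line usually means the corresponding generator was skipped or failed; re-run the matching `scripts/` entry for that section.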