# Results Index
This page is a quick index of the generated evaluation outputs.
## Community challenge eval
- Report (markdown): `docs/hf_hub_community_challenge_report.md`
- Report (json): `docs/hf_hub_community_challenge_report.json`
- Inputs: `scripts/hf_hub_community_challenges.txt`
- Generator: `scripts/score_hf_hub_community_challenges.py`
## Community coverage eval
- Report (markdown): `docs/hf_hub_community_coverage_report.md`
- Report (json): `docs/hf_hub_community_coverage_report.json`
- Inputs: `scripts/hf_hub_community_coverage_prompts.json`
- Generator: `scripts/score_hf_hub_community_coverage.py`
## Prompt/card A/B eval (community)
- Summary:
  - `docs/hf_hub_prompt_ab/prompt_ab_summary.md`
  - `docs/hf_hub_prompt_ab/prompt_ab_summary.json`
  - `docs/hf_hub_prompt_ab/prompt_ab_summary.csv`
- Visuals (if matplotlib is available):
  - `docs/hf_hub_prompt_ab/prompt_ab_composite_<model>.png`
  - `docs/hf_hub_prompt_ab/prompt_ab_scatter_tokens_vs_challenge.png`
- Generator:
  - `scripts/eval_hf_hub_prompt_ab.py`
## Tool routing eval
- Batch summary:
  - `docs/tool_routing_eval/tool_routing_batch_summary.md`
  - `docs/tool_routing_eval/tool_routing_batch_summary.json`
  - `docs/tool_routing_eval/tool_routing_batch_summary.csv`
- Per-model reports: `docs/tool_routing_eval/tool_routing_*.md` (+ `.json`)
- Inputs:
  - `scripts/tool_routing_challenges.txt`
  - `scripts/tool_routing_expected.json`
- Generators:
  - `scripts/score_tool_routing_confusion.py`
  - `scripts/run_tool_routing_batch.py`
## Tool description A/B eval
- Summary:
  - `docs/tool_description_eval/tool_description_ab_summary.md`
  - `docs/tool_description_eval/tool_description_ab_summary.json`
  - `docs/tool_description_eval/tool_description_ab_summary.csv`
- Detailed/pairwise:
  - `docs/tool_description_eval/tool_description_ab_detailed.json`
  - `docs/tool_description_eval/tool_description_ab_pairwise.json`
  - `docs/tool_description_eval/tool_description_ab_pairwise.csv`
  - `docs/tool_description_eval/tool_description_ab_ranking.json`
- Visuals:
  - `docs/tool_description_eval/heat_first_call_ok.png`
  - `docs/tool_description_eval/heat_avg_score.png`
  - `docs/tool_description_eval/heat_avg_calls.png`
  - `docs/tool_description_eval/scatter_calls_vs_first_ok.png`
  - `docs/tool_description_eval/tool_description_interpretation.md`
- Inputs:
  - `scripts/hf_hub_community_challenges.txt`
  - `scripts/tool_description_variants.json`
- Generators:
  - `scripts/eval_tool_description_ab.py`
  - `scripts/plot_tool_description_eval.py`
---
## One-command regeneration
```bash
scripts/run_all_evals.sh
```
Optional environment overrides:
```bash
MODELS=gpt-oss,gpt-5-mini ROUTER_AGENT=hf_hub_community scripts/run_all_evals.sh
```