# Workspace Guide ("What lives where")

This is the single orientation page for new contributors.

## 1) Production surface

Use these when you want real user-facing behavior:

- **Community agent/tooling**
  - Card: `.fast-agent/tool-cards/hf_hub_community.md`
  - Backend function tool: `.fast-agent/tool-cards/hf_api_tool.py`
  - Focus: Hub users/orgs/discussions/collections/activity API workflows

- **Papers search agent/tooling**
  - Card: `.fast-agent/tool-cards/hf_paper_search.md`
  - Backend function tool: `.fast-agent/tool-cards/hf_papers_tool.py`
  - Focus: `/api/daily_papers` filtering and retrieval
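
The papers tool's core job, filtering `/api/daily_papers` results, can be sketched in plain Python. This is a minimal sketch only: the record fields (`title`, `publishedAt`) and the `filter_papers` helper are assumptions for illustration, not the actual schema or implementation in `hf_papers_tool.py`.

```python
from datetime import date

# Hypothetical shape of a /api/daily_papers record; field names are
# assumptions for illustration, not the endpoint's actual schema.
SAMPLE_PAPERS = [
    {"title": "Scaling Laws Revisited", "publishedAt": "2024-05-01"},
    {"title": "Efficient RLHF", "publishedAt": "2024-05-02"},
]

def filter_papers(papers, keyword=None, since=None):
    """Keep papers whose title contains `keyword` and that were
    published on or after `since` (an ISO date string)."""
    out = []
    for p in papers:
        if keyword and keyword.lower() not in p["title"].lower():
            continue
        if since and date.fromisoformat(p["publishedAt"]) < date.fromisoformat(since):
            continue
        out.append(p)
    return out

print(filter_papers(SAMPLE_PAPERS, keyword="rlhf"))
```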

---

## 2) Eval inputs (challenge sets)

- `scripts/hf_hub_community_challenges.txt`
- `scripts/hf_hub_community_coverage_prompts.json`
- `scripts/tool_routing_challenges.txt`
- `scripts/tool_routing_expected.json`
- `scripts/tool_description_variants.json`

These are the canonical prompt sets/configs used for reproducible scoring.
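
To show how a prompt set pairs with its expected-answer config, here is a small sketch using in-memory stand-ins. The real formats of `scripts/tool_routing_challenges.txt` and `scripts/tool_routing_expected.json` may differ; the prompt texts and the one-prompt-per-line / prompt-to-tool-name assumption below are illustrative only.

```python
import io
import json

# Stand-ins for the on-disk files; the assumed format is one prompt per
# line (.txt) and a prompt -> expected-tool mapping (.json).
challenges_txt = io.StringIO(
    "List the newest discussions for org X\n"
    "Find today's papers about diffusion models\n"
)
expected_json = io.StringIO(json.dumps({
    "List the newest discussions for org X": "hf_api_tool",
    "Find today's papers about diffusion models": "hf_papers_tool",
}))

prompts = [line.strip() for line in challenges_txt if line.strip()]
expected = json.load(expected_json)

for prompt in prompts:
    print(prompt, "->", expected[prompt])
```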

---

## 3) Eval execution scripts

- `scripts/score_hf_hub_community_challenges.py`
  - Runs + scores the community challenge pack.

- `scripts/score_hf_hub_community_coverage.py`
  - Runs + scores endpoint-coverage prompts that avoid overlap with the core challenge pack.

- `scripts/score_tool_routing_confusion.py`
  - Scores tool-routing quality for a single model.

- `scripts/run_tool_routing_batch.py`
  - Runs the routing eval across many models + creates an aggregate summary.

- `scripts/eval_tool_description_ab.py`
  - A/B tests tool-description variants across models.

- `scripts/eval_hf_hub_prompt_ab.py`
  - A/B compares prompt/card variants using both challenge and coverage packs, with summary plots.

- `scripts/plot_tool_description_eval.py`
  - Generates plots from the A/B summary CSV.
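
The comparison step behind a routing score can be sketched as below. The actual `score_tool_routing_confusion.py` also drives the model and writes reports; the function name, dict shapes, and sample prompts here are assumptions for illustration, not taken from the script.

```python
def routing_accuracy(predictions, expected):
    """Fraction of prompts routed to the expected tool.

    Both arguments map prompt text -> tool name.
    """
    hits = sum(1 for prompt, tool in predictions.items()
               if expected.get(prompt) == tool)
    return hits / len(predictions) if predictions else 0.0

expected = {"prompt A": "hf_api_tool", "prompt B": "hf_papers_tool"}
predictions = {"prompt A": "hf_api_tool", "prompt B": "hf_api_tool"}
print(routing_accuracy(predictions, expected))  # -> 0.5
```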

---

## 4) Eval outputs (results)

- Community challenge reports:
  - `docs/hf_hub_community_challenge_report.md`
  - `docs/hf_hub_community_challenge_report.json`

- Tool routing results:
  - `docs/tool_routing_eval/`

- Tool description A/B outputs:
  - `docs/tool_description_eval/`

---

## 5) Instructions / context docs

- `docs/hf_hub_community_challenge_pack.md`
- `docs/tool_description_eval_setup.md`
- `docs/tool_description_eval/tool_description_interpretation.md`
- `bench.md`

---

## 6) Suggested newcomer workflow

1. Read this file + top-level `README.md`.
2. Run one production query for each agent.
3. Run one scoring script (community or routing).
4. Inspect the generated markdown report in `docs/`.
5. Only then edit tool cards or script logic.


---

## 7) Results at a glance

- `docs/RESULTS.md` is the index page for all generated reports and plots.