# Workspace Guide ("What lives where")
This is the single orientation page for new contributors.
## 1) Production surface
Use these when you want the real, user-facing behavior:
- **Community agent/tooling**
- Card: `.fast-agent/tool-cards/hf_hub_community.md`
- Backend function tool: `.fast-agent/tool-cards/hf_api_tool.py`
- Focus: Hub users/orgs/discussions/collections/activity API workflows
- **Papers search agent/tooling**
- Card: `.fast-agent/tool-cards/hf_paper_search.md`
- Backend function tool: `.fast-agent/tool-cards/hf_papers_tool.py`
- Focus: `/api/daily_papers` filtering and retrieval
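To make the papers workflow concrete, here is a minimal sketch of building a `/api/daily_papers` request URL. The host constant and the `date`/`limit` query parameter names are assumptions for illustration; the actual parameters used by `hf_papers_tool.py` may differ.

```python
from urllib.parse import urlencode

HF_API_BASE = "https://huggingface.co/api"  # assumption: public Hub API host

def daily_papers_url(date=None, limit=None):
    """Build a /api/daily_papers request URL with optional filters.

    `date` (YYYY-MM-DD) and `limit` are assumed parameter names,
    shown only to illustrate the filtering idea.
    """
    params = {}
    if date:
        params["date"] = date
    if limit:
        params["limit"] = limit
    query = f"?{urlencode(params)}" if params else ""
    return f"{HF_API_BASE}/daily_papers{query}"

print(daily_papers_url(date="2024-05-01"))
```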
---
## 2) Eval inputs (challenge sets)
- `scripts/hf_hub_community_challenges.txt`
- `scripts/hf_hub_community_coverage_prompts.json`
- `scripts/tool_routing_challenges.txt`
- `scripts/tool_routing_expected.json`
- `scripts/tool_description_variants.json`
These are the canonical prompt sets/configs used for reproducible scoring.
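As a sketch of how a challenge `.txt` might pair with its `expected.json`, here is a hypothetical miniature version of the routing pack. The on-disk formats (one prompt per line; an index-to-tool JSON mapping) and the tool names are assumptions for illustration, not the actual file contents.

```python
import json

# Hypothetical miniature versions of the two routing files;
# the real formats may differ (assumed, for illustration only).
challenges_txt = """\
List the most upvoted papers from yesterday.
Show recent discussions in the transformers repo.
"""
expected_json = json.dumps({
    "0": "hf_papers_tool",
    "1": "hf_api_tool",
})

# One prompt per non-empty line, paired with the expected tool by index.
prompts = [line for line in challenges_txt.splitlines() if line.strip()]
expected = json.loads(expected_json)
pairs = [(p, expected[str(i)]) for i, p in enumerate(prompts)]

for prompt, tool in pairs:
    print(f"{tool:15s} <- {prompt}")
```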
---
## 3) Eval execution scripts
- `scripts/score_hf_hub_community_challenges.py`
- Runs + scores the community challenge pack.
- `scripts/score_hf_hub_community_coverage.py`
- Runs + scores endpoint-coverage prompts that avoid overlap with the core challenge pack.
- `scripts/score_tool_routing_confusion.py`
- Scores tool-routing quality for a single model.
- `scripts/run_tool_routing_batch.py`
  - Runs the routing eval across many models + writes an aggregate summary.
- `scripts/eval_tool_description_ab.py`
- A/B tests tool-description variants across models.
- `scripts/eval_hf_hub_prompt_ab.py`
- A/B compares prompt/card variants using both challenge and coverage packs, with summary plots.
- `scripts/plot_tool_description_eval.py`
  - Generates plots from the A/B summary CSV.
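The A/B scripts above boil down to scoring each variant on the same prompts and comparing aggregates. Here is a minimal sketch of that aggregation step with made-up per-prompt scores; the real scripts write their results to a summary CSV, and the variant names here are placeholders.

```python
from statistics import mean

# Hypothetical per-prompt pass/fail scores for two description variants
# (assumed data, for illustration only).
scores = {
    "variant_a": [1.0, 0.0, 1.0, 1.0],
    "variant_b": [1.0, 1.0, 1.0, 1.0],
}

# Mean score per variant, the winner, and the gap between them.
summary = {name: mean(vals) for name, vals in scores.items()}
best = max(summary, key=summary.get)
delta = abs(summary["variant_a"] - summary["variant_b"])

print(f"best={best} delta={delta:.2f}")
```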
---
## 4) Eval outputs (results)
- Community challenge reports:
- `docs/hf_hub_community_challenge_report.md`
- `docs/hf_hub_community_challenge_report.json`
- Tool routing results:
- `docs/tool_routing_eval/`
- Tool description A/B outputs:
- `docs/tool_description_eval/`
---
## 5) Instructions / context docs
- `docs/hf_hub_community_challenge_pack.md`
- `docs/tool_description_eval_setup.md`
- `docs/tool_description_eval/tool_description_interpretation.md`
- `bench.md`
---
## 6) Suggested newcomer workflow
1. Read this file and the top-level `README.md`.
2. Run one production query against each agent.
3. Run one scoring script (community or routing).
4. Inspect the generated markdown report in `docs/`.
5. Only then edit tool cards or script logic.
---
## 7) Results at a glance
- `docs/RESULTS.md` is the index page for all generated reports and plots.