Workspace Guide ("What lives where")
This is the single orientation page for new contributors.
1) Production surface
Use these when you want real user-facing behavior:
Community agent/tooling
- Card: .fast-agent/tool-cards/hf_hub_community.md
- Backend function tool: .fast-agent/tool-cards/hf_api_tool.py
- Focus: Hub users/orgs/discussions/collections/activity API workflows
Papers search agent/tooling
- Card: .fast-agent/tool-cards/hf_paper_search.md
- Backend function tool: .fast-agent/tool-cards/hf_papers_tool.py
- Focus: /api/daily_papers filtering and retrieval
2) Eval inputs (challenge sets)
- scripts/hf_hub_community_challenges.txt
- scripts/hf_hub_community_coverage_prompts.json
- scripts/tool_routing_challenges.txt
- scripts/tool_routing_expected.json
- scripts/tool_description_variants.json
These are the canonical prompt sets/configs used for reproducible scoring.
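Because every scoring run depends on these files, it can help to confirm they are all present before starting. The following is a minimal sketch, not part of the repository: the helper name `missing_eval_inputs` is hypothetical, and it assumes the script is run from the repository root (or given the root explicitly).

```python
from pathlib import Path

# Paths copied from the list of eval inputs in this section.
EVAL_INPUTS = [
    "scripts/hf_hub_community_challenges.txt",
    "scripts/hf_hub_community_coverage_prompts.json",
    "scripts/tool_routing_challenges.txt",
    "scripts/tool_routing_expected.json",
    "scripts/tool_description_variants.json",
]


def missing_eval_inputs(repo_root="."):
    """Return the listed eval input paths that are not files under repo_root.

    Hypothetical helper: repo_root is assumed to be the repository checkout.
    """
    root = Path(repo_root)
    return [rel for rel in EVAL_INPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_eval_inputs()
    if missing:
        print("Missing eval inputs:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All eval inputs found.")
```

An empty result means all five inputs were found; anything else lists the paths to restore before running a scoring script.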
3) Eval execution scripts
- scripts/score_hf_hub_community_challenges.py: Runs + scores the community challenge pack.
- scripts/score_hf_hub_community_coverage.py: Runs + scores endpoint-coverage prompts that avoid overlap with the core challenge pack.
- scripts/score_tool_routing_confusion.py: Scores tool-routing quality for a single model.
- scripts/run_tool_routing_batch.py: Runs routing eval across many models + creates aggregate summary.
- scripts/eval_tool_description_ab.py: A/B tests tool-description variants across models.
- scripts/eval_hf_hub_prompt_ab.py: A/B compares prompt/card variants using both challenge and coverage packs, with summary plots.
- scripts/plot_tool_description_eval.py: Generates plots from A/B summary CSV.
4) Eval outputs (results)
Community challenge reports:
- docs/hf_hub_community_challenge_report.md
- docs/hf_hub_community_challenge_report.json
Tool routing results:
- docs/tool_routing_eval/
Tool description A/B outputs:
- docs/tool_description_eval/
5) Instructions / context docs
- docs/hf_hub_community_challenge_pack.md
- docs/tool_description_eval_setup.md
- docs/tool_description_eval/tool_description_interpretation.md
- bench.md
6) Suggested newcomer workflow
- Read this file + top-level README.md.
- Run one production query for each agent.
- Run one scoring script (community or routing).
- Inspect generated markdown report in docs/.
- Only then edit tool cards or script logic.
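For the "inspect generated markdown report" step, one rough way to find what a scoring run just wrote is to look for the most recently modified markdown file under docs/. This is only a sketch: the helper name `newest_markdown_report` is hypothetical, and modification time is a heuristic, since docs/ also holds hand-written setup docs that may have been edited more recently.

```python
from pathlib import Path


def newest_markdown_report(docs_dir="docs"):
    """Return the most recently modified .md file under docs_dir, or None.

    Hypothetical helper: assumes reports are written somewhere below docs_dir.
    """
    root = Path(docs_dir)
    if not root.is_dir():
        return None
    # Search subdirectories too, e.g. docs/tool_routing_eval/.
    reports = [p for p in root.rglob("*.md") if p.is_file()]
    if not reports:
        return None
    return max(reports, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    report = newest_markdown_report()
    print(report if report else "No markdown reports found under docs/")
```

Run it from the repository root right after a scoring script finishes, then open the printed path.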
7) Results at a glance
docs/RESULTS.md is the index page for all generated reports and plots.