
Workspace Guide ("What lives where")

This is the single orientation page for new contributors.

1) Production surface

Use these when you want real user-facing behavior:

  • Community agent/tooling

    • Card: .fast-agent/tool-cards/hf_hub_community.md
    • Backend function tool: .fast-agent/tool-cards/hf_api_tool.py
    • Focus: Hub users/orgs/discussions/collections/activity API workflows
  • Papers search agent/tooling

    • Card: .fast-agent/tool-cards/hf_paper_search.md
    • Backend function tool: .fast-agent/tool-cards/hf_papers_tool.py
    • Focus: /api/daily_papers filtering and retrieval
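The daily-papers endpoint above can be queried directly for a quick sanity check. A minimal sketch follows; the base URL comes from this guide, but the `date` query parameter and the nested `paper.title` response field are assumptions about the endpoint's shape, not taken from `hf_papers_tool.py`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://huggingface.co/api/daily_papers"


def build_url(date=None):
    # `date` (ISO string, e.g. "2024-05-01") is an assumed filter parameter.
    return f"{BASE_URL}?{urlencode({'date': date})}" if date else BASE_URL


def paper_titles(entries):
    # Assumes each response entry nests a "paper" object with a "title" field.
    return [e.get("paper", {}).get("title") for e in entries]


def fetch_daily_papers(date=None):
    # Live call; requires network access.
    with urlopen(build_url(date)) as resp:
        return json.load(resp)
```

The real tool card/backend in `.fast-agent/tool-cards/` remains the authoritative implementation; this is only for poking at the raw API.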

2) Eval inputs (challenge sets)

  • scripts/hf_hub_community_challenges.txt
  • scripts/hf_hub_community_coverage_prompts.json
  • scripts/tool_routing_challenges.txt
  • scripts/tool_routing_expected.json
  • scripts/tool_description_variants.json

These are the canonical prompt sets/configs used for reproducible scoring.
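A minimal loader sketch for these inputs, assuming the `.txt` packs hold one prompt per non-empty line; each JSON file's actual schema is defined by the script that consumes it.

```python
import json
from pathlib import Path


def load_prompt_pack(path):
    # Assumed layout: one challenge prompt per non-empty line.
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def load_eval_config(path):
    # e.g. scripts/tool_routing_expected.json; schema is script-defined.
    return json.loads(Path(path).read_text())
```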


3) Eval execution scripts

  • scripts/score_hf_hub_community_challenges.py

    • Runs + scores the community challenge pack.
  • scripts/score_hf_hub_community_coverage.py

    • Runs + scores endpoint-coverage prompts that avoid overlap with the core challenge pack.
  • scripts/score_tool_routing_confusion.py

    • Scores tool-routing quality for a single model.
  • scripts/run_tool_routing_batch.py

    • Runs the routing eval across many models + writes an aggregate summary.
  • scripts/eval_tool_description_ab.py

    • A/B tests tool-description variants across models.
  • scripts/eval_hf_hub_prompt_ab.py

    • A/B compares prompt/card variants using both challenge and coverage packs, with summary plots.
  • scripts/plot_tool_description_eval.py

    • Generates plots from A/B summary CSV.
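All of the above are plain Python entry points, so a batch of runs can be scripted. The sketch below shows one way to do that; the script paths are from this guide, but no CLI flags are shown because their argument parsing is not documented here.

```python
import subprocess
import sys


def run_eval_script(path, *args):
    """Run one eval script with the current interpreter and return its exit code.

    `args` is a hypothetical pass-through for whatever flags the script accepts.
    """
    return subprocess.run([sys.executable, path, *args]).returncode


# Example (assumes the repo checkout is the working directory):
# run_eval_script("scripts/score_hf_hub_community_challenges.py")
```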

4) Eval outputs (results)

  • Community challenge reports:

    • docs/hf_hub_community_challenge_report.md
    • docs/hf_hub_community_challenge_report.json
  • Tool routing results:

    • docs/tool_routing_eval/
  • Tool description A/B outputs:

    • docs/tool_description_eval/

5) Instructions / context docs

  • docs/hf_hub_community_challenge_pack.md
  • docs/tool_description_eval_setup.md
  • docs/tool_description_eval/tool_description_interpretation.md
  • bench.md

6) Suggested newcomer workflow

  1. Read this file + top-level README.md.
  2. Run one production query for each agent.
  3. Run one scoring script (community or routing).
  4. Inspect the generated markdown report in docs/.
  5. Only then edit tool cards or script logic.

7) Results at a glance

  • docs/RESULTS.md is the index page for all generated reports and plots.