evalstate HF Staff
sync: promote hf_hub_community prompt v3 + add prompt/coverage harness

AGENTS.md

Repository Orientation

This repository has two main purposes:

  1. Production-facing agents/tools for Hugging Face workflows
  2. Evaluation harnesses (prompts, runners, scoring, reports, plots)

1) Production Surface

Use these for real user-facing behavior:

  • Hub Community agent/tooling

    • Card: .fast-agent/tool-cards/hf_hub_community.md
    • Tool backend: .fast-agent/tool-cards/hf_api_tool.py
    • Focus: users/orgs/followers/discussions/collections/recent activity workflows
  • Daily Papers search agent/tooling

    • Card: .fast-agent/tool-cards/hf_paper_search.md
    • Tool backend: .fast-agent/tool-cards/hf_papers_tool.py
    • Focus: /api/daily_papers retrieval + filtering
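The retrieval + filtering step can be sketched roughly as follows. This is a minimal sketch, not the actual hf_papers_tool.py implementation: the `paper`/`title`/`summary` field names reflect the public /api/daily_papers payload, and `keyword_filter` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://huggingface.co/api/daily_papers"

def fetch_daily_papers(date=None, timeout=10.0):
    """Fetch the daily papers feed; `date` (YYYY-MM-DD) is optional."""
    query = f"?{urlencode({'date': date})}" if date else ""
    with urlopen(API_URL + query, timeout=timeout) as resp:
        return json.load(resp)

def keyword_filter(entries, keyword):
    """Keep entries whose title or summary mentions `keyword` (case-insensitive)."""
    kw = keyword.lower()
    hits = []
    for entry in entries:
        paper = entry.get("paper", {})
        text = f"{paper.get('title', '')} {paper.get('summary', '')}".lower()
        if kw in text:
            hits.append(entry)
    return hits
```

The production card/backend defines the real query surface; this only shows the shape of the workflow.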

2) Evaluation Inputs

Canonical challenge/config files:

  • scripts/hf_hub_community_challenges.txt
  • scripts/tool_routing_challenges.txt
  • scripts/tool_routing_expected.json
  • scripts/tool_description_variants.json
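The exact file formats live in the scripts themselves. Assuming tool_routing_challenges.txt holds one challenge per line and tool_routing_expected.json maps each challenge to an expected tool name (both assumptions, not confirmed by this doc), a quick consistency check between the two inputs could look like:

```python
from pathlib import Path

def load_challenges(path: Path) -> list[str]:
    """One non-empty challenge per line; '#'-prefixed lines skipped (assumed convention)."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.startswith("#")]

def missing_expected(challenges: list[str], expected: dict[str, str]) -> list[str]:
    """Return challenges that have no expected-tool entry in the JSON map."""
    return [c for c in challenges if c not in expected]
```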

3) Evaluation Runners / Scorers

  • scripts/score_hf_hub_community_challenges.py

    • Runs + scores the HF Hub community challenge pack
  • scripts/score_tool_routing_confusion.py

    • Scores routing/confusion quality for one model
  • scripts/run_tool_routing_batch.py

    • Batch wrapper for routing eval across multiple models
  • scripts/eval_tool_description_ab.py

    • A/B evaluation of tool description variants
  • scripts/plot_tool_description_eval.py

    • Plot/interpretation generation from summary outputs
  • scripts/run_all_evals.sh

    • Convenience orchestrator for the full evaluation flow
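The batch wrapper's pattern — run the same scorer across several models and collect one summary per model — can be sketched generically. The scorer below is a stand-in, not the actual score_tool_routing_confusion.py interface:

```python
from typing import Callable

def run_batch(models: list[str], score_model: Callable[[str], dict]) -> dict[str, dict]:
    """Run one scorer pass per model, keyed by model id."""
    return {model: score_model(model) for model in models}

def headline_metric(results: dict[str, dict], key: str = "accuracy") -> dict[str, float]:
    """Pull a single headline number per model for a top-level index."""
    return {model: summary[key] for model, summary in results.items()}
```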

4) Evaluation Outputs

  • Community challenge reports:

    • docs/hf_hub_community_challenge_report.md
    • docs/hf_hub_community_challenge_report.json
  • Routing evaluation outputs:

    • docs/tool_routing_eval/
  • Tool-description A/B outputs:

    • docs/tool_description_eval/

Top-level result index:

  • docs/RESULTS.md

5) Key Context Docs

  • README.md (quick start + layout)
  • docs/SPACE.md (workspace map)
  • docs/hf_hub_community_challenge_pack.md
  • docs/tool_description_eval_setup.md
  • docs/tool_description_eval/tool_description_interpretation.md
  • bench.md

Suggested First Steps for New Contributors

  1. Read README.md and docs/SPACE.md
  2. Run one production query for each tool card
  3. Run one eval script
  4. Open generated report(s) in docs/
  5. Only then edit cards/scripts, with that context in hand

Space Deployment / Sync (HF CLI)

This project is hosted on Hugging Face Spaces at:

  • https://huggingface.co/spaces/evalstate/hf-papers/

When publishing card/script updates, use the hf CLI (not ad-hoc manual edits) to keep deployment reproducible.

Typical flow:

  1. Authenticate:
    • hf auth login
  2. Work in the local repo and validate changes.
  3. Push updates to the Space repo with hf CLI workflows (e.g., clone/upload/commit via hf commands) targeting:
    • spaces/evalstate/hf-papers
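The push step can also be expressed via the huggingface_hub Python API, which the hf CLI wraps. A sketch of syncing the tool cards to the Space (assumes `hf auth login` has already stored a write token; the commit message is an example):

```python
from huggingface_hub import HfApi

api = HfApi()  # picks up the token stored by `hf auth login`
api.upload_folder(
    folder_path=".fast-agent/tool-cards",
    path_in_repo=".fast-agent/tool-cards",
    repo_id="evalstate/hf-papers",
    repo_type="space",  # target the Space repo, not a model/dataset repo
    commit_message="sync tool cards",
)
```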

Keep production card changes (.fast-agent/tool-cards/) and related eval/report updates in sync when publishing.