# AGENTS.md

## Repository Orientation

This repository has two main purposes:

- Production-facing agents/tools for Hugging Face workflows
- Evaluation harnesses (prompts, runners, scoring, reports, plots)
## 1) Production Surface

Use these for real user-facing behavior:

**Hub Community agent/tooling**

- Card: `.fast-agent/tool-cards/hf_hub_community.md`
- Tool backend: `.fast-agent/tool-cards/hf_api_tool.py`
- Focus: users/orgs/followers/discussions/collections/recent-activity workflows

**Daily Papers search agent/tooling**

- Card: `.fast-agent/tool-cards/hf_paper_search.md`
- Tool backend: `.fast-agent/tool-cards/hf_papers_tool.py`
- Focus: `/api/daily_papers` retrieval + filtering
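The daily-papers retrieval workflow can be sketched roughly as below. The endpoint path comes from the card's focus line; the `limit` query parameter and the `title` response field are assumptions for illustration, not confirmed by this repo — the authoritative logic lives in `.fast-agent/tool-cards/hf_papers_tool.py`.

```python
import json
import urllib.request

DAILY_PAPERS_URL = "https://huggingface.co/api/daily_papers"

def fetch_daily_papers(limit=10):
    """Fetch recent daily-papers entries (network call; `limit` param assumed)."""
    with urllib.request.urlopen(f"{DAILY_PAPERS_URL}?limit={limit}", timeout=30) as resp:
        return json.load(resp)

def filter_by_keyword(entries, keyword):
    """Keep entries whose (assumed) `title` field contains the keyword."""
    kw = keyword.lower()
    return [e for e in entries if kw in str(e.get("title", "")).lower()]
```

The fetch/filter split keeps the filtering step testable without a network connection.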
## 2) Evaluation Inputs

Canonical challenge/config files:

- `scripts/hf_hub_community_challenges.txt`
- `scripts/tool_routing_challenges.txt`
- `scripts/tool_routing_expected.json`
- `scripts/tool_description_variants.json`
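A minimal sketch of how the routing inputs might pair up, assuming the `.txt` pack holds one challenge prompt per non-empty line and `tool_routing_expected.json` maps prompts to expected tool names — the actual schemas are defined by the files themselves:

```python
import json
from pathlib import Path

def load_routing_inputs(challenges_path, expected_path):
    """Pair each challenge prompt with its expected tool (schema assumed)."""
    challenges = [
        line.strip()
        for line in Path(challenges_path).read_text().splitlines()
        if line.strip()
    ]
    expected = json.loads(Path(expected_path).read_text())
    # Prompts missing from the expected map pair with None.
    return [(c, expected.get(c)) for c in challenges]
```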
## 3) Evaluation Runners / Scorers

- `scripts/score_hf_hub_community_challenges.py` - runs + scores the HF Hub community challenge pack
- `scripts/score_tool_routing_confusion.py` - scores routing/confusion quality for one model
- `scripts/run_tool_routing_batch.py` - batch wrapper for the routing eval across multiple models
- `scripts/eval_tool_description_ab.py` - A/B evaluation of tool-description variants
- `scripts/plot_tool_description_eval.py` - plot/interpretation generation from summary outputs
- `scripts/run_all_evals.sh` - convenience orchestrator for the full evaluation flow
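The batch wrapper's job reduces to one scorer invocation per model. The sketch below illustrates that shape; the `--model` flag is a placeholder assumption — check `scripts/run_tool_routing_batch.py` for the real interface.

```python
import subprocess
import sys

def build_commands(models, scorer="scripts/score_tool_routing_confusion.py"):
    """One scorer invocation per model (argument shape is assumed)."""
    return [[sys.executable, scorer, "--model", m] for m in models]

def run_batch(models):
    """Run the per-model scorer commands, failing fast on a non-zero exit."""
    for cmd in build_commands(models):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution makes the wrapper unit-testable without launching any subprocess.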
## 4) Evaluation Outputs

Community challenge reports:

- `docs/hf_hub_community_challenge_report.md`
- `docs/hf_hub_community_challenge_report.json`

Routing evaluation outputs:

- `docs/tool_routing_eval/`

Tool-description A/B outputs:

- `docs/tool_description_eval/`

Top-level result index:

- `docs/RESULTS.md`
## 5) Key Context Docs

- `README.md` (quick start + layout)
- `docs/SPACE.md` (workspace map)
- `docs/hf_hub_community_challenge_pack.md`
- `docs/tool_description_eval_setup.md`
- `docs/tool_description_eval/tool_description_interpretation.md`
- `bench.md`
## Suggested First Steps for New Contributors

- Read `README.md` and `docs/SPACE.md`
- Run one production query for each tool card
- Run one eval script
- Open the generated report(s) in `docs/`
- Then edit cards/scripts with that context
## Space Deployment / Sync (HF CLI)

This project is hosted on Hugging Face Spaces at:
https://huggingface.co/spaces/evalstate/hf-papers/

When publishing card/script updates, use the `hf` CLI (not ad-hoc manual edits) to keep deployment reproducible.

Typical flow:

- Authenticate: `hf auth login`
- Work in the local repo and validate changes.
- Push updates to the Space repo with `hf` CLI workflows (e.g., clone/upload/commit via `hf` commands) targeting `spaces/evalstate/hf-papers`.

Keep production card changes (`.fast-agent/tool-cards/`) and related eval/report updates in sync when publishing.
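One way to keep the push step scriptable is to express it as an `hf upload` invocation built in code. The subcommand and flag names below reflect my understanding of the current `hf` CLI and should be verified against `hf upload --help` before relying on them:

```python
def hf_upload_command(local_path, path_in_repo):
    """Build an `hf upload` invocation targeting the Space repo.

    Assumed CLI shape: `hf upload <repo_id> <local> <remote> --repo-type space`;
    verify against `hf upload --help`.
    """
    return [
        "hf", "upload",
        "evalstate/hf-papers",
        local_path,
        path_in_repo,
        "--repo-type", "space",
    ]
```

Using the same relative path locally and in the repo (e.g. a card under `.fast-agent/tool-cards/`) keeps the Space layout mirroring the working tree.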