# AGENTS.md

## Repository Orientation

This repository has **two main purposes**:

1. **Production-facing agents/tools** for Hugging Face workflows
2. **Evaluation harnesses** (prompts, runners, scoring, reports, plots)

---

## 1) Production Surface

Use these for real user-facing behavior:

- **Hub Community agent/tooling**
  - Card: `.fast-agent/tool-cards/hf_hub_community.md`
  - Tool backend: `.fast-agent/tool-cards/hf_api_tool.py`
  - Focus: users/orgs/followers/discussions/collections/recent-activity workflows
- **Daily Papers search agent/tooling**
  - Card: `.fast-agent/tool-cards/hf_paper_search.md`
  - Tool backend: `.fast-agent/tool-cards/hf_papers_tool.py`
  - Focus: `/api/daily_papers` retrieval + filtering

---

## 2) Evaluation Inputs

Canonical challenge/config files:

- `scripts/hf_hub_community_challenges.txt`
- `scripts/tool_routing_challenges.txt`
- `scripts/tool_routing_expected.json`
- `scripts/tool_description_variants.json`

---

## 3) Evaluation Runners / Scorers

- `scripts/score_hf_hub_community_challenges.py`
  - Runs and scores the HF Hub community challenge pack
- `scripts/score_tool_routing_confusion.py`
  - Scores routing/confusion quality for one model
- `scripts/run_tool_routing_batch.py`
  - Batch wrapper for the routing eval across multiple models
- `scripts/eval_tool_description_ab.py`
  - A/B evaluation of tool-description variants
- `scripts/plot_tool_description_eval.py`
  - Plot/interpretation generation from summary outputs
- `scripts/run_all_evals.sh`
  - Convenience orchestrator for the full evaluation flow

---

## 4) Evaluation Outputs

- Community challenge reports:
  - `docs/hf_hub_community_challenge_report.md`
  - `docs/hf_hub_community_challenge_report.json`
- Routing evaluation outputs:
  - `docs/tool_routing_eval/`
- Tool-description A/B outputs:
  - `docs/tool_description_eval/`

Top-level result index:

- `docs/RESULTS.md`

---

## 5) Key Context Docs

- `README.md` (quick start + layout)
- `docs/SPACE.md` (workspace map)
- `docs/hf_hub_community_challenge_pack.md`
- `docs/tool_description_eval_setup.md`
- `docs/tool_description_eval/tool_description_interpretation.md`
- `bench.md`

---

## Suggested First Steps for New Contributors

1. Read `README.md` and `docs/SPACE.md`
2. Run one production query for each tool card
3. Run one eval script
4. Open the generated report(s) in `docs/`
5. Then edit cards/scripts with that context

---

## Space Deployment / Sync (HF CLI)

This project is hosted on Hugging Face Spaces at:

- `https://huggingface.co/spaces/evalstate/hf-papers/`

When publishing card/script updates, use the `hf` CLI (not ad-hoc manual edits) to keep deployment reproducible. Typical flow:

1. Authenticate:
   - `hf auth login`
2. Work in the local repo and validate changes.
3. Push updates to the Space repo with `hf` CLI workflows (e.g., clone/upload/commit via `hf` commands) targeting:
   - `spaces/evalstate/hf-papers`

Keep production card changes (`.fast-agent/tool-cards/`) and related eval/report updates in sync when publishing.
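The publish flow above can be sketched as a small shell helper. This is a minimal sketch, not the project's actual deploy script: the `publish` function and `DRY_RUN` switch are hypothetical conveniences, and the `hf upload <repo_id> <local_path> <path_in_repo> --repo-type space` invocation assumes the standard `hf` CLI upload subcommand.

```shell
# Hedged sketch of the Space publish flow. Run `hf auth login` once beforehand.
SPACE_ID="evalstate/hf-papers"   # target Space repo (from this doc)
DRY_RUN="${DRY_RUN:-1}"          # default to printing commands for review

# publish <path>: mirror a local directory into the Space at the same path.
publish() {
  if [ "$DRY_RUN" = "1" ]; then
    # Print the command instead of running it, so changes can be reviewed.
    echo "hf upload $SPACE_ID $1 $1 --repo-type space"
  else
    hf upload "$SPACE_ID" "$1" "$1" --repo-type space
  fi
}

# Keep cards and eval/report docs in sync, per the note above.
publish .fast-agent/tool-cards
publish docs
```

Setting `DRY_RUN=0` performs the real uploads; keeping the dry-run default makes it safe to rehearse a publish before touching the live Space.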