| # AGENTS.md |
|
|
| ## Repository Orientation |
|
|
| This repository has **two main purposes**: |
|
|
| 1. **Production-facing agents/tools** for Hugging Face workflows |
| 2. **Evaluation harnesses** (prompts, runners, scoring, reports, plots) |
|
|
| --- |
|
|
| ## 1) Production Surface |
|
|
| Use these for real user-facing behavior: |
|
|
| - **Hub Community agent/tooling** |
| - Card: `.fast-agent/tool-cards/hf_hub_community.md` |
| - Tool backend: `.fast-agent/tool-cards/hf_api_tool.py` |
| - Focus: users/orgs/followers/discussions/collections/recent activity workflows |
|
|
| - **Daily Papers search agent/tooling** |
| - Card: `.fast-agent/tool-cards/hf_paper_search.md` |
| - Tool backend: `.fast-agent/tool-cards/hf_papers_tool.py` |
| - Focus: `/api/daily_papers` retrieval + filtering |
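The retrieval-plus-filtering step can be sketched as below. This is a minimal illustration, not the actual tool backend: the `date` query parameter and the entry shape (`entry["paper"]["title"]` / `"summary"`) are assumptions about the public feed, so verify them against `hf_papers_tool.py` before relying on them.

```python
import json
import urllib.request

API_URL = "https://huggingface.co/api/daily_papers"


def fetch_daily_papers(date=None):
    """Fetch the daily papers feed; `date` (YYYY-MM-DD) is an assumed query param."""
    url = API_URL + (f"?date={date}" if date else "")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def filter_papers(entries, keyword):
    """Keep titles whose title/summary mention `keyword` (case-insensitive).

    The field layout is an assumption about the feed shape, hence the .get fallbacks.
    """
    keyword = keyword.lower()
    hits = []
    for entry in entries:
        paper = entry.get("paper", {})
        text = f'{paper.get("title", "")} {paper.get("summary", "")}'.lower()
        if keyword in text:
            hits.append(paper.get("title", ""))
    return hits


if __name__ == "__main__":
    # Network call; run manually to inspect today's feed.
    print(filter_papers(fetch_daily_papers(), "diffusion"))
```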
|
|
| --- |
|
|
| ## 2) Evaluation Inputs |
|
|
| Canonical challenge/config files: |
|
|
| - `scripts/hf_hub_community_challenges.txt` |
| - `scripts/tool_routing_challenges.txt` |
| - `scripts/tool_routing_expected.json` |
| - `scripts/tool_description_variants.json` |
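Loading these inputs is straightforward; the sketch below assumes one challenge per line in the `.txt` packs (with blanks and `#` comments skipped) and a JSON object in the `.json` files. Those format assumptions are mine, not documented here, so check the files themselves first.

```python
import json
from pathlib import Path


def load_challenges(path):
    """Read a challenge pack: assumes one prompt per line, skipping
    blank lines and `#` comments (an assumption about the .txt format)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]


def load_expected(path):
    """Read expected routing labels; assumes a plain JSON object (shape unverified)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```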
|
|
| --- |
|
|
| ## 3) Evaluation Runners / Scorers |
|
|
| - `scripts/score_hf_hub_community_challenges.py` |
| - Runs + scores the HF Hub community challenge pack |
|
|
| - `scripts/score_tool_routing_confusion.py` |
| - Scores routing/confusion quality for one model |
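The core metric such a scorer computes can be sketched as follows; this is an illustrative version, not the script's actual logic, and the tool names in the usage example are only placeholders matching the card names above.

```python
from collections import Counter


def score_routing(pairs):
    """Score tool-routing decisions.

    `pairs` is a list of (expected_tool, predicted_tool) tuples. Returns
    overall accuracy plus a confusion Counter keyed by (expected, predicted)
    for the mismatches. Illustrative only; see the repo script for the real metric.
    """
    confusion = Counter()
    correct = 0
    for expected, predicted in pairs:
        if expected == predicted:
            correct += 1
        else:
            confusion[(expected, predicted)] += 1
    accuracy = correct / len(pairs) if pairs else 0.0
    return accuracy, confusion
```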
|
|
| - `scripts/run_tool_routing_batch.py` |
| - Batch wrapper for routing eval across multiple models |
|
|
| - `scripts/eval_tool_description_ab.py` |
| - A/B evaluation of tool description variants |
|
|
| - `scripts/plot_tool_description_eval.py` |
  - Generates plots and interpretation notes from summary outputs
|
|
| - `scripts/run_all_evals.sh` |
| - Convenience orchestrator for the full evaluation flow |
|
|
| --- |
|
|
| ## 4) Evaluation Outputs |
|
|
| - Community challenge reports: |
| - `docs/hf_hub_community_challenge_report.md` |
| - `docs/hf_hub_community_challenge_report.json` |
|
|
| - Routing evaluation outputs: |
| - `docs/tool_routing_eval/` |
|
|
| - Tool-description A/B outputs: |
| - `docs/tool_description_eval/` |
|
|
Top-level results index:
|
|
| - `docs/RESULTS.md` |
|
|
| --- |
|
|
| ## 5) Key Context Docs |
|
|
| - `README.md` (quick start + layout) |
| - `docs/SPACE.md` (workspace map) |
| - `docs/hf_hub_community_challenge_pack.md` |
| - `docs/tool_description_eval_setup.md` |
| - `docs/tool_description_eval/tool_description_interpretation.md` |
| - `bench.md` |
|
|
| --- |
|
|
| ## Suggested First Steps for New Contributors |
|
|
| 1. Read `README.md` and `docs/SPACE.md` |
| 2. Run one production query for each tool card |
| 3. Run one eval script |
| 4. Open generated report(s) in `docs/` |
| 5. Then edit cards/scripts with context |
|
|
| --- |
|
|
| ## Space Deployment / Sync (HF CLI) |
|
|
| This project is hosted on Hugging Face Spaces at: |
|
|
| - `https://huggingface.co/spaces/evalstate/hf-papers/` |
|
|
| When publishing card/script updates, use the `hf` CLI (not ad-hoc manual edits) to keep deployment reproducible. |
|
|
| Typical flow: |
|
|
| 1. Authenticate: |
| - `hf auth login` |
| 2. Work in the local repo and validate changes. |
3. Push updates to the Space repo with the `hf` CLI (e.g., clone, upload, and commit via `hf` subcommands), targeting:
| - `spaces/evalstate/hf-papers` |
|
|
| Keep production card changes (`.fast-agent/tool-cards/`) and related eval/report updates in sync when publishing. |
|
|