# AGENTS.md
## Repository Orientation
This repository has **two main purposes**:
1. **Production-facing agents/tools** for Hugging Face workflows
2. **Evaluation harnesses** (prompts, runners, scoring, reports, plots)
---
## 1) Production Surface
Use these when changing real user-facing behavior:
- **Hub Community agent/tooling**
- Card: `.fast-agent/tool-cards/hf_hub_community.md`
- Tool backend: `.fast-agent/tool-cards/hf_api_tool.py`
- Focus: users/orgs/followers/discussions/collections/recent activity workflows
- **Daily Papers search agent/tooling**
- Card: `.fast-agent/tool-cards/hf_paper_search.md`
- Tool backend: `.fast-agent/tool-cards/hf_papers_tool.py`
- Focus: `/api/daily_papers` retrieval + filtering
---
## 2) Evaluation Inputs
Canonical challenge/config files:
- `scripts/hf_hub_community_challenges.txt`
- `scripts/tool_routing_challenges.txt`
- `scripts/tool_routing_expected.json`
- `scripts/tool_description_variants.json`
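To show how the routing inputs might pair up, here is a minimal sketch; the one-challenge-per-line `.txt` format, the JSON schema, and the sample queries are all assumptions for illustration, not the actual file contents.

```python
# Hypothetical sketch of pairing routing challenges with expected tools.
# Sample data and schema are assumptions; inspect the real files in scripts/.
import json

challenges_txt = """\
find recent papers about speculative decoding
list the newest discussions for an org
"""
expected_json = """\
{
  "find recent papers about speculative decoding": "hf_paper_search",
  "list the newest discussions for an org": "hf_hub_community"
}
"""

# One challenge per non-empty line, looked up in the expected-routing map.
challenges = [line for line in challenges_txt.splitlines() if line.strip()]
expected = json.loads(expected_json)
pairs = [(c, expected.get(c)) for c in challenges]
```

A scorer would compare each model's chosen tool against the second element of each pair.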
---
## 3) Evaluation Runners / Scorers
- `scripts/score_hf_hub_community_challenges.py`
- Runs + scores the HF Hub community challenge pack
- `scripts/score_tool_routing_confusion.py`
- Scores routing/confusion quality for one model
- `scripts/run_tool_routing_batch.py`
- Batch wrapper for routing eval across multiple models
- `scripts/eval_tool_description_ab.py`
- A/B evaluation of tool description variants
- `scripts/plot_tool_description_eval.py`
- Plot/interpretation generation from summary outputs
- `scripts/run_all_evals.sh`
- Convenience orchestrator for the full evaluation flow
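The full flow can be sketched as an ordered pipeline; the script names are from this repo, but the assumption that `run_all_evals.sh` invokes them in listed order (and with no extra flags) is mine.

```python
# Sketch of the eval pipeline run_all_evals.sh is assumed to orchestrate.
# Step order and lack of CLI flags are assumptions; read the script itself.
import subprocess

EVAL_STEPS = [
    ["python", "scripts/score_hf_hub_community_challenges.py"],
    ["python", "scripts/run_tool_routing_batch.py"],
    ["python", "scripts/eval_tool_description_ab.py"],
    ["python", "scripts/plot_tool_description_eval.py"],
]


def run_all(dry_run: bool = True) -> list[str]:
    """Return each eval step in order; execute them when dry_run is False."""
    executed = []
    for cmd in EVAL_STEPS:
        executed.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return executed
```

`run_all(dry_run=False)` would execute the steps; the default just lists them, which is handy for checking the order before a real run.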
---
## 4) Evaluation Outputs
- Community challenge reports:
- `docs/hf_hub_community_challenge_report.md`
- `docs/hf_hub_community_challenge_report.json`
- Routing evaluation outputs:
- `docs/tool_routing_eval/`
- Tool-description A/B outputs:
- `docs/tool_description_eval/`
Top-level result index:
- `docs/RESULTS.md`
---
## 5) Key Context Docs
- `README.md` (quick start + layout)
- `docs/SPACE.md` (workspace map)
- `docs/hf_hub_community_challenge_pack.md`
- `docs/tool_description_eval_setup.md`
- `docs/tool_description_eval/tool_description_interpretation.md`
- `bench.md`
---
## Suggested First Steps for New Contributors
1. Read `README.md` and `docs/SPACE.md`
2. Run one production query for each tool card
3. Run one eval script
4. Open generated report(s) in `docs/`
5. Only then edit cards/scripts, with that context in hand
---
## Space Deployment / Sync (HF CLI)
This project is hosted on Hugging Face Spaces at:
- `https://huggingface.co/spaces/evalstate/hf-papers/`
When publishing card/script updates, use the `hf` CLI (not ad-hoc manual edits) to keep deployment reproducible.
Typical flow:
1. Authenticate:
- `hf auth login`
2. Work in the local repo and validate changes.
3. Push updates to the Space repo via the `hf` CLI (e.g., `hf upload`) targeting:
- `spaces/evalstate/hf-papers`
Keep production card changes (`.fast-agent/tool-cards/`) and related eval/report updates in sync when publishing.
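The push step above can be sketched as follows; the positional/flag layout of `hf upload` shown here is an assumption, so check `hf upload --help` before running.

```python
# Hypothetical sketch: build the `hf upload` command that syncs the local
# checkout to the Space. Argument layout is assumed; verify with --help.
def sync_space_cmd(local_dir: str = ".") -> list[str]:
    """Build the `hf` CLI command that pushes local changes to the Space."""
    return [
        "hf", "upload",
        "evalstate/hf-papers",  # Space repo id (from the URL above)
        local_dir, ".",         # local path -> path in repo
        "--repo-type", "space",
        "--commit-message", "sync: update tool cards and eval reports",
    ]
```

After `hf auth login`, something like `subprocess.run(sync_space_cmd(), check=True)` (or running the equivalent command in a shell) would perform the push.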