# AGENTS.md

## Repository Orientation

This repository has **two main purposes**:

1. **Production-facing agents/tools** for Hugging Face workflows
2. **Evaluation harnesses** (prompts, runners, scoring, reports, plots)

---

## 1) Production Surface

Use these for real user-facing behavior:

- **Hub Community agent/tooling**
  - Card: `.fast-agent/tool-cards/hf_hub_community.md`
  - Tool backend: `.fast-agent/tool-cards/hf_api_tool.py`
  - Focus: users/orgs/followers/discussions/collections/recent-activity workflows
- **Daily Papers search agent/tooling**
  - Card: `.fast-agent/tool-cards/hf_paper_search.md`
  - Tool backend: `.fast-agent/tool-cards/hf_papers_tool.py`
  - Focus: `/api/daily_papers` retrieval + filtering

---

## 2) Evaluation Inputs

Canonical challenge/config files:

- `scripts/hf_hub_community_challenges.txt`
- `scripts/tool_routing_challenges.txt`
- `scripts/tool_routing_expected.json`
- `scripts/tool_description_variants.json`

---

## 3) Evaluation Runners / Scorers

- `scripts/score_hf_hub_community_challenges.py`
  - Runs and scores the HF Hub community challenge pack
- `scripts/score_tool_routing_confusion.py`
  - Scores routing/confusion quality for one model
- `scripts/run_tool_routing_batch.py`
  - Batch wrapper for the routing eval across multiple models
- `scripts/eval_tool_description_ab.py`
  - A/B evaluation of tool-description variants
- `scripts/plot_tool_description_eval.py`
  - Plot/interpretation generation from summary outputs
- `scripts/run_all_evals.sh`
  - Convenience orchestrator for the full evaluation flow

---

## 4) Evaluation Outputs

- Community challenge reports:
  - `docs/hf_hub_community_challenge_report.md`
  - `docs/hf_hub_community_challenge_report.json`
- Routing evaluation outputs:
  - `docs/tool_routing_eval/`
- Tool-description A/B outputs:
  - `docs/tool_description_eval/`

Top-level result index:

- `docs/RESULTS.md`

---

## 5) Key Context Docs

- `README.md` (quick start + layout)
- `docs/SPACE.md` (workspace map)
- `docs/hf_hub_community_challenge_pack.md`
- `docs/tool_description_eval_setup.md`
- `docs/tool_description_eval/tool_description_interpretation.md`
- `bench.md`

---

## Suggested First Steps for New Contributors

1. Read `README.md` and `docs/SPACE.md`
2. Run one production query for each tool card
3. Run one eval script
4. Open the generated report(s) in `docs/`
5. Then edit cards/scripts with that context

---

## Space Deployment / Sync (HF CLI)

This project is hosted on Hugging Face Spaces at:

- `https://huggingface.co/spaces/evalstate/hf-papers/`

When publishing card/script updates, use the `hf` CLI (not ad-hoc manual edits) to keep deployment reproducible. Typical flow:

1. Authenticate:
   - `hf auth login`
2. Work in the local repo and validate changes.
3. Push updates to the Space repo with `hf` CLI workflows (e.g., clone/upload/commit via `hf` commands) targeting:
   - `spaces/evalstate/hf-papers`

Keep production card changes (`.fast-agent/tool-cards/`) and related eval/report updates in sync when publishing.
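The publish flow above can be sketched as a small shell helper. This is a minimal sketch, not the project's actual deploy script: the `publish` function and `DRY_RUN` switch are hypothetical conveniences, and the `hf upload <repo_id> <local_path> <path_in_repo> --repo-type space` invocation assumes the standard `hf` CLI upload subcommand.

```shell
# Hedged sketch of the Space publish flow. Run `hf auth login` once beforehand.
SPACE_ID="evalstate/hf-papers"   # target Space repo (from this doc)
DRY_RUN="${DRY_RUN:-1}"          # default to printing commands for review

# publish <path>: mirror a local directory into the Space at the same path.
publish() {
  if [ "$DRY_RUN" = "1" ]; then
    # Print the command instead of running it, so changes can be reviewed.
    echo "hf upload $SPACE_ID $1 $1 --repo-type space"
  else
    hf upload "$SPACE_ID" "$1" "$1" --repo-type space
  fi
}

# Keep cards and eval/report docs in sync, per the note above.
publish .fast-agent/tool-cards
publish docs
```

Setting `DRY_RUN=0` performs the real uploads; keeping the dry-run default makes it safe to rehearse a publish before touching the live Space.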