| # AGENTS.md |
|
|
| ## Repository Orientation |
|
|
| This repository has **two main purposes**: |
|
|
| 1. **Production-facing agents/tools** for Hugging Face workflows |
| 2. **Evaluation harnesses** (prompts, runners, scoring, reports, plots) |
|
|
| --- |
|
|
| ## 1) Production Surface |
|
|
| Use these for real user-facing behavior: |
|
|
| - **Hub Community agent/tooling** |
| - Card: `.fast-agent/tool-cards/hf_hub_community.md` |
| - Tool backend: `.fast-agent/tool-cards/hf_api_tool.py` |
| - Focus: users/orgs/followers/discussions/collections/recent activity workflows |
|
|
| - **Daily Papers search agent/tooling** |
| - Card: `.fast-agent/tool-cards/hf_paper_search.md` |
| - Tool backend: `.fast-agent/tool-cards/hf_papers_tool.py` |
| - Focus: `/api/daily_papers` retrieval + filtering |
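The retrieval-plus-filtering step can be sketched as below. This is a minimal illustration, not the actual tool backend: the `date` query parameter and the entry shape (`entry["paper"]["title"]` / `"summary"`) are assumptions about the public feed, so verify them against `hf_papers_tool.py` before relying on them.

```python
import json
import urllib.request

API_URL = "https://huggingface.co/api/daily_papers"


def fetch_daily_papers(date=None):
    """Fetch the daily papers feed; `date` (YYYY-MM-DD) is an assumed query param."""
    url = API_URL + (f"?date={date}" if date else "")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def filter_papers(entries, keyword):
    """Keep titles whose title/summary mention `keyword` (case-insensitive).

    The field layout is an assumption about the feed shape, hence the .get fallbacks.
    """
    keyword = keyword.lower()
    hits = []
    for entry in entries:
        paper = entry.get("paper", {})
        text = f'{paper.get("title", "")} {paper.get("summary", "")}'.lower()
        if keyword in text:
            hits.append(paper.get("title", ""))
    return hits


if __name__ == "__main__":
    # Network call; run manually to inspect today's feed.
    print(filter_papers(fetch_daily_papers(), "diffusion"))
```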
|
|
| --- |
|
|
| ## 2) Evaluation Inputs |
|
|
| Canonical challenge/config files: |
|
|
| - `scripts/hf_hub_community_challenges.txt` |
| - `scripts/tool_routing_challenges.txt` |
| - `scripts/tool_routing_expected.json` |
| - `scripts/tool_description_variants.json` |
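Loading these inputs is straightforward; the sketch below assumes one challenge per line in the `.txt` packs (with blanks and `#` comments skipped) and a JSON object in the `.json` files. Those format assumptions are mine, not documented here, so check the files themselves first.

```python
import json
from pathlib import Path


def load_challenges(path):
    """Read a challenge pack: assumes one prompt per line, skipping
    blank lines and `#` comments (an assumption about the .txt format)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]


def load_expected(path):
    """Read expected routing labels; assumes a plain JSON object (shape unverified)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```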
|
|
| --- |
|
|
| ## 3) Evaluation Runners / Scorers |
|
|
| - `scripts/score_hf_hub_community_challenges.py` |
| - Runs + scores the HF Hub community challenge pack |
|
|
| - `scripts/score_tool_routing_confusion.py` |
| - Scores routing/confusion quality for one model |
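The core metric such a scorer computes can be sketched as follows; this is an illustrative version, not the script's actual logic, and the tool names in the usage example are only placeholders matching the card names above.

```python
from collections import Counter


def score_routing(pairs):
    """Score tool-routing decisions.

    `pairs` is a list of (expected_tool, predicted_tool) tuples. Returns
    overall accuracy plus a confusion Counter keyed by (expected, predicted)
    for the mismatches. Illustrative only; see the repo script for the real metric.
    """
    confusion = Counter()
    correct = 0
    for expected, predicted in pairs:
        if expected == predicted:
            correct += 1
        else:
            confusion[(expected, predicted)] += 1
    accuracy = correct / len(pairs) if pairs else 0.0
    return accuracy, confusion
```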
|
|
| - `scripts/run_tool_routing_batch.py` |
| - Batch wrapper for routing eval across multiple models |
|
|
| - `scripts/eval_tool_description_ab.py` |
| - A/B evaluation of tool description variants |
|
|
| - `scripts/plot_tool_description_eval.py` |
  - Generates plots and interpretation notes from summary outputs
|
|
| - `scripts/run_all_evals.sh` |
| - Convenience orchestrator for the full evaluation flow |
|
|
| --- |
|
|
| ## 4) Evaluation Outputs |
|
|
| - Community challenge reports: |
| - `docs/hf_hub_community_challenge_report.md` |
| - `docs/hf_hub_community_challenge_report.json` |
|
|
| - Routing evaluation outputs: |
| - `docs/tool_routing_eval/` |
|
|
| - Tool-description A/B outputs: |
| - `docs/tool_description_eval/` |
|
|
Top-level results index:
|
|
| - `docs/RESULTS.md` |
|
|
| --- |
|
|
| ## 5) Key Context Docs |
|
|
| - `README.md` (quick start + layout) |
| - `docs/SPACE.md` (workspace map) |
| - `docs/hf_hub_community_challenge_pack.md` |
| - `docs/tool_description_eval_setup.md` |
| - `docs/tool_description_eval/tool_description_interpretation.md` |
| - `bench.md` |
|
|
| --- |
|
|
| ## Suggested First Steps for New Contributors |
|
|
| 1. Read `README.md` and `docs/SPACE.md` |
| 2. Run one production query for each tool card |
| 3. Run one eval script |
| 4. Open generated report(s) in `docs/` |
| 5. Then edit cards/scripts with context |
|
|
| --- |
|
|
| ## Space Deployment / Sync (HF CLI) |
|
|
| This project is hosted on Hugging Face Spaces at: |
|
|
| - `https://huggingface.co/spaces/evalstate/hf-papers/` |
|
|
| When publishing card/script updates, use the `hf` CLI (not ad-hoc manual edits) to keep deployment reproducible. |
|
|
| Typical flow: |
|
|
| 1. Authenticate: |
| - `hf auth login` |
| 2. Work in the local repo and validate changes. |
3. Push updates to the Space repo with the `hf` CLI (e.g., clone, upload, and commit via `hf` subcommands), targeting:
| - `spaces/evalstate/hf-papers` |
|
|
| Keep production card changes (`.fast-agent/tool-cards/`) and related eval/report updates in sync when publishing. |
|
|