# Workspace Guide ("What lives where")
This is the single orientation page for new contributors.
## 1) Production surface
Use these when you want the real, user-facing behavior:
- **Community agent/tooling**
- Card: `.fast-agent/tool-cards/hf_hub_community.md`
- Backend function tool: `.fast-agent/tool-cards/hf_api_tool.py`
- Focus: Hub users/orgs/discussions/collections/activity API workflows
- **Papers search agent/tooling**
- Card: `.fast-agent/tool-cards/hf_paper_search.md`
- Backend function tool: `.fast-agent/tool-cards/hf_papers_tool.py`
- Focus: `/api/daily_papers` filtering and retrieval
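To make the papers workflow concrete, here is a minimal sketch of building a `/api/daily_papers` request URL. The host constant and the `date`/`limit` query parameter names are assumptions for illustration; the actual parameters used by `hf_papers_tool.py` may differ.

```python
from urllib.parse import urlencode

HF_API_BASE = "https://huggingface.co/api"  # assumption: public Hub API host

def daily_papers_url(date=None, limit=None):
    """Build a /api/daily_papers request URL with optional filters.

    `date` (YYYY-MM-DD) and `limit` are assumed parameter names,
    shown only to illustrate the filtering idea.
    """
    params = {}
    if date:
        params["date"] = date
    if limit:
        params["limit"] = limit
    query = f"?{urlencode(params)}" if params else ""
    return f"{HF_API_BASE}/daily_papers{query}"

print(daily_papers_url(date="2024-05-01"))
```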
---
## 2) Eval inputs (challenge sets)
- `scripts/hf_hub_community_challenges.txt`
- `scripts/hf_hub_community_coverage_prompts.json`
- `scripts/tool_routing_challenges.txt`
- `scripts/tool_routing_expected.json`
- `scripts/tool_description_variants.json`
These are the canonical prompt sets/configs used for reproducible scoring.
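As a sketch of how a challenge `.txt` might pair with its `expected.json`, here is a hypothetical miniature version of the routing pack. The on-disk formats (one prompt per line; an index-to-tool JSON mapping) and the tool names are assumptions for illustration, not the actual file contents.

```python
import json

# Hypothetical miniature versions of the two routing files;
# the real formats may differ (assumed, for illustration only).
challenges_txt = """\
List the most upvoted papers from yesterday.
Show recent discussions in the transformers repo.
"""
expected_json = json.dumps({
    "0": "hf_papers_tool",
    "1": "hf_api_tool",
})

# One prompt per non-empty line, paired with the expected tool by index.
prompts = [line for line in challenges_txt.splitlines() if line.strip()]
expected = json.loads(expected_json)
pairs = [(p, expected[str(i)]) for i, p in enumerate(prompts)]

for prompt, tool in pairs:
    print(f"{tool:15s} <- {prompt}")
```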
---
## 3) Eval execution scripts
- `scripts/score_hf_hub_community_challenges.py`
- Runs + scores the community challenge pack.
- `scripts/score_hf_hub_community_coverage.py`
- Runs + scores endpoint-coverage prompts that avoid overlap with the core challenge pack.
- `scripts/score_tool_routing_confusion.py`
- Scores tool-routing quality for a single model.
- `scripts/run_tool_routing_batch.py`
  - Runs the routing eval across many models + writes an aggregate summary.
- `scripts/eval_tool_description_ab.py`
- A/B tests tool-description variants across models.
- `scripts/eval_hf_hub_prompt_ab.py`
- A/B compares prompt/card variants using both challenge and coverage packs, with summary plots.
- `scripts/plot_tool_description_eval.py`
  - Generates plots from the A/B summary CSV.
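The A/B scripts above boil down to scoring each variant on the same prompts and comparing aggregates. Here is a minimal sketch of that aggregation step with made-up per-prompt scores; the real scripts write their results to a summary CSV, and the variant names here are placeholders.

```python
from statistics import mean

# Hypothetical per-prompt pass/fail scores for two description variants
# (assumed data, for illustration only).
scores = {
    "variant_a": [1.0, 0.0, 1.0, 1.0],
    "variant_b": [1.0, 1.0, 1.0, 1.0],
}

# Mean score per variant, the winner, and the gap between them.
summary = {name: mean(vals) for name, vals in scores.items()}
best = max(summary, key=summary.get)
delta = abs(summary["variant_a"] - summary["variant_b"])

print(f"best={best} delta={delta:.2f}")
```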
---
## 4) Eval outputs (results)
- Community challenge reports:
- `docs/hf_hub_community_challenge_report.md`
- `docs/hf_hub_community_challenge_report.json`
- Tool routing results:
- `docs/tool_routing_eval/`
- Tool description A/B outputs:
- `docs/tool_description_eval/`
---
## 5) Instructions / context docs
- `docs/hf_hub_community_challenge_pack.md`
- `docs/tool_description_eval_setup.md`
- `docs/tool_description_eval/tool_description_interpretation.md`
- `bench.md`
---
## 6) Suggested newcomer workflow
1. Read this file and the top-level `README.md`.
2. Run one production query against each agent.
3. Run one scoring script (community or routing).
4. Inspect the generated markdown report in `docs/`.
5. Only then edit tool cards or script logic.
---
## 7) Results at a glance
- `docs/RESULTS.md` is the index page for all generated reports and plots.