hf-hub-query / _monty_codegen_shared.md

evalstate HF Staff

Deploy hf-hub-query with runtime capabilities helper and budget prompt fix

c830f69 verified 3 days ago


	## Runtime rules for generated code
	- No imports.
	- Helper functions are already in scope.
	- All helper/API calls are async: always use `await`.
	- `max_calls` is the total external-call budget for the whole generated program, not a generic helper argument.
	- The outer wrapper is an exact contract. Use this exact skeleton and only change the body:
	```py
	async def solve(query, max_calls):
	    ...

	await solve(query, max_calls)
	```
	- Do not modify that wrapper shape:
	  - no `async def solve(query, max_calls=100):`
	  - no `async def solve(q, max_calls):`
	  - no `async def solve(query, *, max_calls):`
	  - no `await solve(query, max_calls or 100)`
	  - no `await solve(query, max_calls if ... else ...)`
	  - no `budget = max_calls` followed by `await solve(query, budget)`
	- The runtime supplies `max_calls`; generated code must not invent defaults or fallbacks for it.
	- At the tool-call layer, normally omit `max_calls` and `timeout_sec` so the runtime defaults apply. Do not invent small explicit tool-call budgets like `10` or `20` for ordinary requests.
	- Use helper functions first. Use raw `call_api('/api/...')` only if no helper fits.
	- `call_api` must receive a raw path starting with `/api/...`; never call helper names through `call_api`.
	- Raw `call_api(...)` endpoints must match the runtime allowlist exactly. Do not invent hyphen/underscore variants or guessed path shapes.
	- `call_api(...)` returns `{ok, status, url, data, error}`. Always check `resp["ok"]` before reading `resp["data"]`. Do not read `resp["items"]` or `resp["meta"]` directly from `call_api(...)`.
	- `call_api(...)` only accepts `endpoint`, `params`, `method`, and `json_body`. Do not guess extra kwargs.
	- Use `call_api(...)` only for endpoint families that do not already have a helper, such as `/api/daily_papers` or tag metadata endpoints.
	- For daily papers, use the exact raw endpoint string `/api/daily_papers` (underscore), not `/api/daily-papers`.
	- For questions about supported helpers, fields, limits, raw API affordances, or runtime capabilities, use `hf_runtime_capabilities(...)` instead of hand-authoring a static answer from memory.
	- Keep final displayed results compact, but do not artificially shrink intermediate helper coverage unless the user explicitly asked for a sample.
	- Prefer canonical snake_case keys in generated code and in JSON output.
	- When returning a structured dict that includes your own coverage metadata, use the exact top-level keys `results` and `coverage` unless the user explicitly requested different key names.
	- Omit unavailable optional fields instead of emitting `null` placeholders unless the user explicitly asked for a fixed schema with nulls.
	- If the user asks for specific fields or says "return only", return exactly that final shape from `solve(...)`.
	- For current-user prompts (`my`, `me`), use helpers with `username=None` first. Only ask for identity if that fails.
	- When a current-user helper response has `ok=false`, return that helper response directly instead of flattening it into an empty result.
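
The `call_api(...)` rules above can be sketched as follows. This is a minimal illustration meant to run outside the runtime (hence the `asyncio` import, which generated code must not use). `fake_call_api` and `paper_titles` are hypothetical names, and the payload is invented sample data; only the `{ok, status, url, data, error}` envelope and the `/api/daily_papers` path come from the rules above.

```python
import asyncio

# Hypothetical stand-in for the runtime-provided call_api. Inside generated
# code the real helper is already in scope and must not be redefined.
async def fake_call_api(endpoint, params=None, method="GET", json_body=None):
    if endpoint == "/api/daily_papers":
        return {"ok": True, "status": 200, "url": endpoint,
                "data": [{"title": "Example paper"}], "error": None}
    # Any other path (including hyphen variants) is treated as not allowed.
    return {"ok": False, "status": 404, "url": endpoint,
            "data": None, "error": "endpoint not in allowlist"}

async def paper_titles(call_api, endpoint):
    resp = await call_api(endpoint)
    # Check ok first; on failure hand the envelope back unchanged.
    if not resp["ok"]:
        return resp
    # Only now read data; never resp["items"] or resp["meta"].
    return [row["title"] for row in (resp["data"] or []) if row.get("title")]

good = asyncio.run(paper_titles(fake_call_api, "/api/daily_papers"))
bad = asyncio.run(paper_titles(fake_call_api, "/api/daily-papers"))
```

Returning the failed envelope as-is keeps `status` and `error` visible to the caller instead of collapsing the failure into an empty list.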

	## Common helper signature traps
	These are high-priority rules. Do not guess helper arguments.

	- `hf_repo_search(...)` uses `limit`, not `return_limit`, and does not accept `count_only`.
	- `hf_trending(...)` uses `limit`, not `return_limit`.
	- `hf_repo_discussions(...)` uses `limit`, not `return_limit`.
	- `hf_user_graph(...)`, `hf_user_likes(...)`, `hf_org_members(...)`, `hf_recent_activity(...)`, and `hf_collection_items(...)` use `return_limit`.
	- For "how many models/datasets/spaces does org/user X have?" prefer `hf_org_overview(...)` or `hf_user_summary(...)["item"]["overview"]` instead of trying to count with `hf_repo_search(...)`.
	- Never invent helper args such as `count_only=True` for helpers that do not document it.
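
One way to keep the `limit` / `return_limit` split straight is a small lookup table. The sketch below is illustrative only: `LIMIT_KWARG` and `limit_kwargs` are hypothetical names that do not exist in the runtime, and the table covers only the helpers named in the list above.

```python
# Which keyword bounds the returned rows, per the rules listed above.
LIMIT_KWARG = {
    "hf_repo_search": "limit",
    "hf_trending": "limit",
    "hf_repo_discussions": "limit",
    "hf_user_graph": "return_limit",
    "hf_user_likes": "return_limit",
    "hf_org_members": "return_limit",
    "hf_recent_activity": "return_limit",
    "hf_collection_items": "return_limit",
}

def limit_kwargs(helper_name, n):
    """Return {keyword: n} using the keyword that the helper accepts."""
    if helper_name not in LIMIT_KWARG:
        # Refuse to guess for helpers that are not in the table.
        raise ValueError(f"no documented limit keyword for {helper_name!r}")
    return {LIMIT_KWARG[helper_name]: n}
```

For example, `limit_kwargs("hf_trending", 5)` yields `{"limit": 5}`, while `limit_kwargs("hf_user_graph", 5)` yields `{"return_limit": 5}`.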

	## Helper result shape
	All helpers return:
	```py
	{
	    "ok": bool,
	    "item": dict | None,
	    "items": list[dict],
	    "meta": dict,
	    "error": str | None,
	}
	```

	Rules:
	- `items` is the canonical list field.
	- `item` is only a singleton convenience.
	- `meta` contains helper-owned execution, coverage, and limit information.
	- For metadata-oriented prompts, return the relevant `meta` fields instead of inferring coverage from list length alone.
	- For bounded list/sample helpers in raw mode, returning the helper envelope directly preserves helper-owned `meta` fields.
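
Because `item` is only a convenience and `items` is canonical, code that wants a single row usually has to consult both. A minimal sketch, assuming the envelope shape above; `first_item` is a hypothetical name, not a runtime helper, and the sample envelopes are invented:

```python
def first_item(resp):
    """Return the single row from a helper envelope, or None if unavailable."""
    if not resp.get("ok"):
        return None
    # Prefer the singleton convenience field when the helper filled it in.
    if resp.get("item") is not None:
        return resp["item"]
    # Otherwise fall back to the canonical list field.
    items = resp.get("items") or []
    return items[0] if items else None

# Invented sample envelopes for illustration.
single = {"ok": True, "item": {"repo_id": "a/b"}, "items": [], "meta": {}, "error": None}
listed = {"ok": True, "item": None, "items": [{"repo_id": "c/d"}], "meta": {}, "error": None}
failed = {"ok": False, "item": None, "items": [], "meta": {}, "error": "not found"}
```

Callers that need the error text should inspect `resp["error"]` themselves; this sketch deliberately returns `None` for every unusable envelope.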

	## Routing guide
	### Runtime self-description
	- Supported fields / helper signatures / limits / raw API affordances → `hf_runtime_capabilities(...)`

	### Repo questions
	- Exact `owner/name` details → `hf_repo_details(repo_type="auto", ...)`
	- Search/discovery/list/top repos → `hf_repo_search(...)`
	- True trending requests → `hf_trending(...)`
	- Repo discussions → `hf_repo_discussions(...)`
	- Specific discussion details / latest comment text → `hf_repo_discussion_details(...)`
	- Users who liked a specific repo → `hf_repo_likers(...)`

	### User questions
	- Profile / overview / "tell me about user X" → `hf_user_summary(...)`
	- Followers / following / graph samples → `hf_user_graph(...)`
	- Repos a user liked → `hf_user_likes(...)`
	- Recent actions / activity feed → `hf_recent_activity(feed_type="user", entity=...)`

	### Organization questions
	- Organization details and counts → `hf_org_overview(...)`
	- Organization members → `hf_org_members(...)`
	- Organization repos → `hf_repo_search(author="<org>", repo_types=[...])`
	- Organization or user collections → `hf_collections_search(owner="<org-or-user>", ...)`
	- Repos inside a known collection → `hf_collection_items(collection_id=...)`

	### Direction reminders
	- `hf_user_likes(...)` = user → repos
	- `hf_repo_likers(...)` = repo → users
	- `hf_user_graph(...)` = user/org → followers/following
	- If the author/org is already known, start with `hf_repo_search(author=...)` instead of semantic search.
	- For "most popular repo a user liked", use `hf_user_likes(sort="repoLikes" | "repoDownloads", ranking_window=40)` instead of fetching recent likes and re-ranking locally.

	## Common row keys
	Use these canonical keys unless the user explicitly wants different names.

	- Repo rows: `repo_id`, `repo_type`, `title`, `author`, `likes`, `downloads`, `created_at`, `last_modified`, `pipeline_tag`, `library_name`, `repo_url`, `tags`
	- User graph/member rows: `username`, `fullname`, `isPro`, `role`, `type`
	- Activity rows: `event_type`, `repo_id`, `repo_type`, `timestamp`
	- Collection rows: `collection_id`, `slug`, `title`, `owner`, `owner_type`, `description`, `last_updated`, `item_count`
	- `hf_user_summary(...)["item"]["overview"]`: `username`, `fullname`, `bio`, `websiteUrl`, `twitter`, `github`, `linkedin`, `bluesky`, `followers`, `following`, `likes`, `isPro`

	Common aliases in `fields=[...]` are tolerated by the runtime, but prefer the canonical names above in generated code.

	## Common repo fields
	- `repo_id`
	- `repo_type`
	- `title`
	- `author`
	- `likes`
	- `downloads`
	- `created_at`
	- `last_modified`
	- `pipeline_tag`
	- `repo_url`
	- model: `library_name`
	- dataset: `description`, `paperswithcode_id`
	- space: `sdk`, `models`, `datasets`, `subdomain`

	Common aliases tolerated in `fields=[...]`:
	- `repoId` → `repo_id`
	- `repoType` → `repo_type`
	- `repoUrl` → `repo_url`
	- `createdAt` → `created_at`
	- `lastModified` → `last_modified`

	## Common collection fields
	- `collection_id`
	- `slug`
	- `title`
	- `owner`
	- `owner_type`
	- `description`
	- `last_updated`
	- `item_count`

	Common aliases tolerated in `fields=[...]`:
	- `collectionId` → `collection_id`
	- `lastUpdated` → `last_updated`
	- `ownerType` → `owner_type`
	- `itemCount` → `item_count`
	- `author` → `owner`
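
The alias tables above amount to a simple rename step. The runtime already tolerates these aliases, so generated code does not need this; the sketch only makes the mapping concrete. `REPO_ALIASES`, `COLLECTION_ALIASES`, and `canonical_fields` are hypothetical names.

```python
# Alias -> canonical name, copied from the two alias lists above.
REPO_ALIASES = {
    "repoId": "repo_id",
    "repoType": "repo_type",
    "repoUrl": "repo_url",
    "createdAt": "created_at",
    "lastModified": "last_modified",
}
COLLECTION_ALIASES = {
    "collectionId": "collection_id",
    "lastUpdated": "last_updated",
    "ownerType": "owner_type",
    "itemCount": "item_count",
    "author": "owner",
}

def canonical_fields(fields, aliases):
    """Map aliases to canonical names, keeping order and dropping duplicates."""
    out = []
    for name in fields:
        canon = aliases.get(name, name)
        if canon not in out:
            out.append(canon)
    return out
```

Note that `author` is only an alias in the collection table; for repo rows it is already a canonical key and passes through unchanged.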

	## High-signal usage notes
	- `hf_repo_search(...)` defaults to models if no repo type is specified. For prompts like "what repos does <author/org> have", search across `repo_types=["model", "dataset", "space"]` unless the user asked for one type.
	- `hf_trending(...)` returns the Hub's ordered trending list. Use `trending_rank` / ordering, not a fabricated numeric trending score.
	- If the user explicitly asks for trending scores, say the upstream endpoint does not expose them and return the ordered repos instead.
	- `hf_user_summary(...)` is the fastest way to answer common profile prompts. Read profile/social fields from `summary["item"]["overview"]`.
	- For "how many models/datasets/spaces does user/org X have?" prompts, prefer the overview helpers (`hf_user_summary(...)["item"]["overview"]` or `hf_org_overview(...)`) over `hf_repo_search(..., limit=1)` or invented `count_only` args.
	- Use `hf_whoami()` when you need the explicit current username for joins, comparisons, or output labeling.
	- For overlap/comparison/ranking tasks, fetch a broad enough working set first and compute locally in code.
	- Avoid per-row hydration calls unless you truly need fields that are not already present in the current helper response.
	- For prompts that ask for both a sample and metadata, keep the sample compact and surface helper-owned `meta` fields explicitly.
	- For follower/member social-link lookups, first fetch usernames with `hf_user_graph(...)` or `hf_org_members(...)`, then fetch profile/social data with `hf_user_summary(username=...)`.
	- For fan-out tasks that require one helper call per follower/member/liker/repo/user, prefer bounded seed sets by default so ordinary requests stay fast and predictable.
	- If the user explicitly asks for exhaustive coverage (`all`, `scan all`, `entire`, `not just the first N`, `ensure more than the first 20`, etc.), do not silently cap the seed at a small sample such as 20 or 50.
	- For those explicit exhaustive requests, attempt a substantially broader seed scan first when the runtime budget permits.
	- For explicit exhaustive follower/member scans, prefer omitting `return_limit` or using a value large enough to cover the expected total. Do not choose arbitrary small caps like 50 or 100 if that would obviously prevent an exhaustive answer.
	- If the prompt says both `scan all` and `more than the first 20`, the `scan all` requirement wins. Do not satisfy that request with a bare sample of 50 unless you also mark the result as partial.
	- If exhaustive coverage is still not feasible within `max_calls` or timeout, say so clearly and return an explicit partial result with coverage metadata instead of presenting a bounded sample as if it were complete.
	- When you return a composed partial result, use the exact top-level keys `results` and `coverage` unless the user explicitly asked for a different schema. Do not rename `results` to `items`, `rows`, `liked_models`, or similar.
	- Do not use your own top-level transport wrapper named `meta` in raw mode; runtime already owns the outer `meta`.
	- Good coverage fields for partial fan-out results include: `partial`, `reason`, `seed_limit`, `seed_processed`, `seed_total`, `seed_more_available`, `per_entity_limit`, and `next_request_hint`.
	- If the user did not explicitly require exhaustiveness, a clear partial result with coverage metadata is better than failing with `Max API calls exceeded`.
	- If the user did explicitly require exhaustiveness and you cannot complete it, do not imply success. Report that the result is partial and include the relevant coverage/limit fields.
	- For explicit exhaustive follower/member prompts, if `meta.more_available` is true or `seed_processed < seed_total`, the final output must not be a bare list that looks complete. Include explicit partial/coverage information.
	- Use `hf_recent_activity(...)` for activity feeds instead of raw `call_api('/api/recent-activity', ...)`.
	- Use `hf_repo_search(author=..., repo_type="space", ...)` for Spaces by author; there is no separate spaces-by-author helper.
	- Use `hf_collections_search(owner=...)` for "what collections does this org/user have?" prompts.
	- `hf_collections_search(...)` is for finding/listing collections. It returns collection rows plus `item_count`, not the full repo rows inside each collection.
	- Use `hf_collection_items(collection_id=...)` for "what repos/models/datasets/spaces are in this collection?" prompts.
	- Do not guess raw collection item endpoints such as `/api/collections/.../items`.
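
The partial-result notes above can be condensed into one small builder. This is a sketch under stated assumptions: `build_partial_result` is a hypothetical name, the seed `meta` is assumed to carry the `total` and `more_available` keys used elsewhere in this document, and the `"fanout_budget"` reason string is just one plausible value.

```python
def build_partial_result(results, seed_meta, seed_limit, seed_processed, per_entity_limit):
    """Wrap fan-out output in the exact top-level keys `results` and `coverage`."""
    seed_total = seed_meta.get("total")
    more = bool(seed_meta.get("more_available"))
    # Partial if the helper reports more rows, or fewer seeds were processed than exist.
    partial = more or (seed_total is not None and seed_processed < seed_total)
    coverage = {
        "partial": partial,
        "seed_limit": seed_limit,
        "seed_processed": seed_processed,
        "seed_more_available": more,
        "per_entity_limit": per_entity_limit,
    }
    # Omit unavailable optional fields rather than emitting null placeholders.
    if seed_total is not None:
        coverage["seed_total"] = seed_total
    if partial:
        coverage["reason"] = "fanout_budget"
    return {"results": results, "coverage": coverage}

capped = build_partial_result({"alice": []}, {"total": 120, "more_available": True}, 20, 20, 3)
complete = build_partial_result({"bob": []}, {"total": 5, "more_available": False}, 20, 5, 3)
```

Deriving `partial` from the helper-owned `meta` rather than from list length keeps a bounded sample from being presented as complete.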

	## Helper API
	```py
	await hf_runtime_capabilities(section: str | None = None)

	await hf_org_overview(organization: str)

	await hf_org_members(
	    organization: str,
	    return_limit: int | None = None,
	    scan_limit: int | None = None,
	    count_only: bool = False,
	    where: dict | None = None,
	    fields: list[str] | None = None,
	)

	await hf_repo_search(
	    query: str | None = None,
	    repo_type: str | None = None,
	    repo_types: list[str] | None = None,
	    author: str | None = None,
	    filters: list[str] | None = None,
	    sort: str | None = None,
	    limit: int = 20,
	    where: dict | None = None,
	    fields: list[str] | None = None,
	    advanced: dict | None = None,
	)

	await hf_repo_details(
	    repo_id: str | None = None,
	    repo_ids: list[str] | None = None,
	    repo_type: str = "auto",
	    fields: list[str] | None = None,
	)

	await hf_trending(
	    repo_type: str = "model",
	    limit: int = 20,
	    where: dict | None = None,
	    fields: list[str] | None = None,
	)

	await hf_user_summary(
	    username: str | None = None,
	    include: list[str] | None = None,
	    sample_limit: int = 10,
	    activity_limit: int = 10,
	    graph_pro_only: bool | None = None,
	)

	await hf_user_graph(
	    username: str | None = None,
	    relation: str = "followers",
	    return_limit: int | None = None,
	    scan_limit: int | None = None,
	    count_only: bool = False,
	    pro_only: bool | None = None,
	    where: dict | None = None,
	    fields: list[str] | None = None,
	)

	await hf_repo_likers(
	    repo_id: str,
	    repo_type: str,
	    return_limit: int | None = None,
	    count_only: bool = False,
	    pro_only: bool | None = None,
	    where: dict | None = None,
	    fields: list[str] | None = None,
	)

	await hf_user_likes(
	    username: str | None = None,
	    repo_types: list[str] | None = None,
	    return_limit: int | None = None,
	    scan_limit: int | None = None,
	    count_only: bool = False,
	    where: dict | None = None,
	    fields: list[str] | None = None,
	    sort: str | None = None,
	    ranking_window: int | None = None,
	)

	await hf_recent_activity(
	    feed_type: str | None = None,
	    entity: str | None = None,
	    activity_types: list[str] | None = None,
	    repo_types: list[str] | None = None,
	    return_limit: int | None = None,
	    max_pages: int | None = None,
	    start_cursor: str | None = None,
	    count_only: bool = False,
	    where: dict | None = None,
	    fields: list[str] | None = None,
	)

	await hf_repo_discussions(repo_type: str, repo_id: str, limit: int = 20)
	await hf_repo_discussion_details(repo_type: str, repo_id: str, discussion_num: int)

	await hf_collections_search(
	    query: str | None = None,
	    owner: str | None = None,
	    return_limit: int = 20,
	    count_only: bool = False,
	    where: dict | None = None,
	    fields: list[str] | None = None,
	)

	await hf_collection_items(
	    collection_id: str,
	    repo_types: list[str] | None = None,
	    return_limit: int = 100,
	    count_only: bool = False,
	    where: dict | None = None,
	    fields: list[str] | None = None,
	)

	await hf_whoami()
	await call_api(endpoint: str, params: dict | None = None, method: str = "GET", json_body: dict | None = None)
	```

	## Minimal patterns
	```py
	# Exact repo details
	info = await hf_repo_details(
	    repo_id="black-forest-labs/FLUX.1-dev",
	    repo_type="auto",
	    fields=["repo_id", "repo_type", "author", "pipeline_tag", "library_name", "likes", "downloads", "repo_url"],
	)
	item = info["item"] or (info["items"][0] if info["items"] else None)
	return {
	    "repo_id": item["repo_id"],
	    "repo_type": item["repo_type"],
	    "author": item["author"],
	    "pipeline_tag": item.get("pipeline_tag"),
	    "library_name": item.get("library_name"),
	    "likes": item.get("likes"),
	    "downloads": item.get("downloads"),
	    "repo_url": item.get("repo_url"),
	}

	# Runtime capability / supported-field introspection
	caps = await hf_runtime_capabilities(section="fields")
	if not caps["ok"]:
	    return caps
	item = caps["item"] or (caps["items"][0] if caps["items"] else None)
	return item["content"]

	# Compact user summary
	summary = await hf_user_summary(
	    username="mishig",
	    include=["likes", "activity"],
	    sample_limit=10,
	    activity_limit=10,
	)
	item = summary["item"] or (summary["items"][0] if summary["items"] else None)
	return {
	    "total_followers": item["overview"]["followers"],
	    "total_following": item["overview"]["following"],
	    "latest_activity": item["activity"]["sample"],
	    "latest_likes": item["likes"]["sample"],
	}

	# Current user's pro followers and their recent liked repos
	followers = await hf_user_graph(
	    relation="followers",
	    pro_only=True,
	    fields=["username"],
	)
	if not followers["ok"]:
	    return followers
	result = {}
	for row in followers["items"]:
	    uname = row.get("username")
	    if not uname:
	        continue
	    likes = await hf_user_likes(
	        username=uname,
	        return_limit=3,
	        fields=["repo_id", "repo_type", "liked_at", "repo_url"],
	    )
	    repos = []
	    for item in likes["items"]:
	        repo = {}
	        for key in ["repo_id", "repo_type", "liked_at", "repo_url"]:
	            if item.get(key) is not None:
	                repo[key] = item[key]
	        if repo:
	            repos.append(repo)
	    if repos:
	        result[uname] = repos
	return result

	# Fan-out query with bounded partial coverage metadata
	followers = await hf_user_graph(
	    relation="followers",
	    return_limit=20,
	    fields=["username"],
	)
	if not followers["ok"]:
	    return followers
	result = {}
	processed = 0
	for row in followers["items"]:
	    uname = row.get("username")
	    if not uname:
	        continue
	    likes = await hf_user_likes(
	        username=uname,
	        repo_types=["model"],
	        return_limit=3,
	        fields=["repo_id", "repo_author", "liked_at"],
	    )
	    processed += 1
	    items = []
	    for item in likes["items"]:
	        liked = {}
	        for key in ["repo_id", "repo_author", "liked_at"]:
	            if item.get(key) is not None:
	                liked[key] = item[key]
	        if liked:
	            items.append(liked)
	    if items:
	        result[uname] = items
	return {
	    "results": result,
	    "coverage": {
	        "partial": bool(followers["meta"].get("more_available")),
	        "reason": "fanout_budget",
	        "seed_relation": "followers",
	        "seed_limit": 20,
	        "seed_processed": processed,
	        "seed_total": followers["meta"].get("total"),
	        "seed_more_available": followers["meta"].get("more_available"),
	        "per_entity_limit": 3,
	        "next_request_hint": "Ask for a smaller subset or a follow-up batch if you want more coverage.",
	    },
	}

	# Popularity-ranked likes with metadata
	likes = await hf_user_likes(
	    username="julien-c",
	    return_limit=1,
	    sort="repoLikes",
	    ranking_window=40,
	    fields=["repo_id", "repo_type", "repo_author", "likes", "repo_url", "liked_at"],
	)
	item = likes["item"] or (likes["items"][0] if likes["items"] else None)
	if item is None:
	    return {"error": "No liked repositories found"}
	repo = {}
	for key in ["repo_id", "repo_type", "repo_author", "likes", "repo_url", "liked_at"]:
	    if item.get(key) is not None:
	        repo[key] = item[key]
	return {
	    "repo": repo,
	    "metadata": {
	        "sort_applied": likes["meta"].get("sort_applied"),
	        "ranking_window": likes["meta"].get("ranking_window"),
	        "ranking_complete": likes["meta"].get("ranking_complete"),
	    },
	}

	# Recent activity with compact snake_case rows
	activity = await hf_recent_activity(
	    feed_type="user",
	    entity="mishig",
	    return_limit=15,
	    fields=["event_type", "repo_id", "repo_type", "timestamp"],
	)
	result = []
	for row in activity["items"]:
	    item = {}
	    for key in ["event_type", "repo_id", "repo_type", "timestamp"]:
	        if row.get(key) is not None:
	            item[key] = row[key]
	    if item:
	        result.append(item)
	return result

	# Repo discussions
	rows = await hf_repo_discussions(
	    repo_type="model",
	    repo_id="Qwen/Qwen3.5-35B-A3B",
	    limit=10,
	)
	return [
	    {
	        "num": row["num"],
	        "title": row["title"],
	        "author": row["author"],
	        "status": row["status"],
	    }
	    for row in rows["items"]
	]

	# Collections owned by an org or user
	collections = await hf_collections_search(
	    owner="Qwen",
	    return_limit=20,
	    fields=["collection_id", "title", "owner", "description", "last_updated", "item_count"],
	)
	return collections["items"]

	# Daily papers via the exact allowed raw endpoint
	resp = await call_api("/api/daily_papers")
	if not resp["ok"]:
	    return resp
	rows = []
	for item in resp.get("data") or []:
	    row = {}
	    if item.get("title") is not None:
	        row["title"] = item["title"]
	    if item.get("repo_id") is not None:
	        row["repo_id"] = item["repo_id"]
	    if row:
	        rows.append(row)
	return rows

	# Organization repo counts
	org = await hf_org_overview("unsloth")
	item = org["item"] or (org["items"][0] if org["items"] else None)
	return {
	    "organization": item["organization"],
	    "models": item.get("models"),
	    "datasets": item.get("datasets"),
	    "spaces": item.get("spaces"),
	}

	# Do any authors of the top trending spaces follow me?
	who = await hf_whoami()
	if not who["ok"]:
	    return who
	me = (who["item"] or (who["items"][0] if who["items"] else None)).get("username")
	spaces = await hf_trending(
	    repo_type="space",
	    limit=20,
	    fields=["repo_id", "author", "repo_url"],
	)
	authors = []
	seen = set()
	for row in spaces["items"]:
	    author = row.get("author")
	    if isinstance(author, str) and author and author not in seen:
	        seen.add(author)
	        authors.append(author)

	results = []
	processed = 0
	for author in authors[:20]:
	    graph = await hf_user_graph(
	        username=author,
	        relation="following",
	        return_limit=200,
	        fields=["username"],
	    )
	    processed += 1
	    if not graph["ok"]:
	        continue
	    if any(item.get("username") == me for item in graph["items"]):
	        results.append(author)

	return {
	    "results": results,
	    "coverage": {
	        "partial": False,
	        "reason": None,
	        "seed_relation": "trending_space_authors",
	        "seed_limit": 20,
	        "seed_processed": processed,
	        "seed_total": len(authors),
	        "seed_more_available": False,
	        "per_entity_limit": 200,
	    },
	}

	# Models inside an org's collections
	collections = await hf_collections_search(
	    owner="openai",
	    return_limit=20,
	    fields=["collection_id", "title"],
	)
	result = {}
	for coll in collections["items"]:
	    collection_id = coll.get("collection_id")
	    title = coll.get("title") or collection_id
	    if not collection_id:
	        continue
	    items = await hf_collection_items(
	        collection_id=collection_id,
	        repo_types=["model"],
	        fields=["repo_id", "repo_type", "repo_url"],
	    )
	    if items["items"]:
	        result[title] = items["items"]
	return result
	```