## Runtime rules for generated code
- No imports.
- Helper functions are already in scope.
- All helper/API calls are async: always use `await`.
- `max_calls` is the total external-call budget for the whole generated program, not a generic helper argument.
- The outer wrapper is an exact contract. Use this exact skeleton and only change the body:
```py
async def solve(query, max_calls):
    ...

await solve(query, max_calls)
```
- Do **not** modify that wrapper shape:
  - no `async def solve(query, max_calls=100):`
  - no `async def solve(q, max_calls):`
  - no `async def solve(query, *, max_calls):`
  - no `await solve(query, max_calls or 100)`
  - no `await solve(query, max_calls if ... else ...)`
  - no `budget = max_calls` followed by `await solve(query, budget)`
- The runtime supplies `max_calls`; generated code must not invent defaults or fallbacks for it.
- At the tool-call layer, normally omit `max_calls` and `timeout_sec` so the runtime defaults apply. Do **not** invent small explicit tool-call budgets like `10` or `20` for ordinary requests.
- Use helper functions first. Use raw `call_api('/api/...')` only if no helper fits.
- `call_api` must receive a raw path starting with `/api/...`; never call helper names through `call_api`.
- Raw `call_api(...)` endpoints must match the runtime allowlist exactly. Do **not** invent hyphen/underscore variants or guessed path shapes.
- `call_api(...)` returns `{ok, status, url, data, error}`. Always check `resp["ok"]` before reading `resp["data"]`. Do not read `resp["items"]` or `resp["meta"]` directly from `call_api(...)`.
- `call_api(...)` only accepts `endpoint`, `params`, `method`, and `json_body`. Do not guess extra kwargs.
- Use `call_api(...)` only for endpoint families that do not already have a helper, such as `/api/daily_papers` or tag metadata endpoints.
- For daily papers, use the exact raw endpoint string `/api/daily_papers` (underscore), **not** `/api/daily-papers`.
- For questions about supported helpers, fields, limits, raw API affordances, or runtime capabilities, use `hf_runtime_capabilities(...)` instead of hand-authoring a static answer from memory.
- Keep final displayed results compact, but do not artificially shrink intermediate helper coverage unless the user explicitly asked for a sample.
- Prefer canonical snake_case keys in generated code and in JSON output.
- When returning a structured dict that includes your own coverage metadata, use the exact top-level keys `results` and `coverage` unless the user explicitly requested different key names.
- Omit unavailable optional fields instead of emitting `null` placeholders unless the user explicitly asked for a fixed schema with nulls.
- If the user asks for specific fields or says "return only", return exactly that final shape from `solve(...)`.
- For current-user prompts (`my`, `me`), use helpers with `username=None` first. Only ask for identity if that fails.
- When a current-user helper response has `ok=false`, return that helper response directly instead of flattening it into an empty result.
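
The `call_api(...)` envelope rule above can be sketched as a small unwrapping step. This is only an illustration: `unwrap_call_api` is a hypothetical name, not a runtime helper, and the two sample dicts use made-up values that merely mimic the documented `{ok, status, url, data, error}` shape.

```python
def unwrap_call_api(resp):
    """Return (data, None) on success or (None, resp) on failure.

    Hypothetical helper: it reads only the documented envelope keys and
    never touches resp["items"] or resp["meta"].
    """
    if not resp.get("ok"):
        # Hand the failed envelope back unchanged so the caller can return it.
        return None, resp
    return resp.get("data"), None


# Made-up sample envelopes shaped like the documented call_api(...) result.
ok_resp = {"ok": True, "status": 200, "url": "/api/daily_papers",
           "data": [{"title": "A"}], "error": None}
bad_resp = {"ok": False, "status": 404, "url": "/api/daily-papers",
            "data": None, "error": "sample error text"}

data, failure = unwrap_call_api(ok_resp)
missing, returned = unwrap_call_api(bad_resp)
```

Inside `solve(...)`, the failure branch would typically be followed by `return returned`, so the caller sees the original error envelope rather than an empty result.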
## Common helper signature traps
These are high-priority rules. Do not guess helper arguments.
- `hf_repo_search(...)` uses `limit`, **not** `return_limit`, and does **not** accept `count_only`.
- `hf_trending(...)` uses `limit`, **not** `return_limit`.
- `hf_repo_discussions(...)` uses `limit`, **not** `return_limit`.
- `hf_user_graph(...)`, `hf_user_likes(...)`, `hf_org_members(...)`, `hf_recent_activity(...)`, and `hf_collection_items(...)` use `return_limit`.
- For "how many models/datasets/spaces does org/user X have?" prefer `hf_org_overview(...)` or `hf_user_summary(...)["item"]["overview"]` instead of trying to count with `hf_repo_search(...)`.
- Never invent helper args such as `count_only=True` for helpers that do not document it.
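
One way to keep the `limit` / `return_limit` split straight is a lookup table. The sketch below is an assumption-labeled illustration: `LIMIT_KWARG` and `limit_kwargs` are hypothetical names, and the table only restates the rules listed above.

```python
# Which keyword caps the returned rows, restating the rules listed above.
LIMIT_KWARG = {
    "hf_repo_search": "limit",
    "hf_trending": "limit",
    "hf_repo_discussions": "limit",
    "hf_user_graph": "return_limit",
    "hf_user_likes": "return_limit",
    "hf_org_members": "return_limit",
    "hf_recent_activity": "return_limit",
    "hf_collection_items": "return_limit",
}


def limit_kwargs(helper_name, n):
    """Build the size-limiting kwargs for a helper, or raise if it is not listed."""
    if helper_name not in LIMIT_KWARG:
        # Refuse to guess: an unlisted helper has no documented limit keyword here.
        raise KeyError(f"no documented limit keyword for {helper_name!r}")
    return {LIMIT_KWARG[helper_name]: n}


search_kwargs = limit_kwargs("hf_repo_search", 5)
graph_kwargs = limit_kwargs("hf_user_graph", 5)
```

Raising on unknown names, rather than defaulting to one spelling, mirrors the "do not guess helper arguments" rule.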
## Helper result shape
All helpers return:
```py
{
    "ok": bool,
    "item": dict | None,
    "items": list[dict],
    "meta": dict,
    "error": str | None,
}
```
Rules:
- `items` is the canonical list field.
- `item` is only a singleton convenience.
- `meta` contains helper-owned execution, coverage, and limit information.
- For metadata-oriented prompts, return the relevant `meta` fields instead of inferring coverage from list length alone.
- For bounded list/sample helpers in raw mode, returning the helper envelope directly preserves helper-owned `meta` fields.
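
Reading a singleton out of this envelope can be sketched as follows. `first_item` is a hypothetical name used only for illustration, and the sample envelopes are made up to match the shape above.

```python
def first_item(resp):
    """Return the singleton row from a helper envelope, or None.

    Prefers `item`, then falls back to the first entry of `items`.
    """
    if not resp.get("ok"):
        return None
    if resp.get("item") is not None:
        return resp["item"]
    items = resp.get("items") or []
    return items[0] if items else None


# Made-up envelopes in the documented helper result shape.
with_item = {"ok": True, "item": {"repo_id": "a/b"}, "items": [], "meta": {}, "error": None}
with_items = {"ok": True, "item": None, "items": [{"repo_id": "c/d"}], "meta": {}, "error": None}
empty = {"ok": True, "item": None, "items": [], "meta": {}, "error": None}
failed = {"ok": False, "item": None, "items": [], "meta": {}, "error": "sample error"}

picked = [first_item(r) for r in (with_item, with_items, empty, failed)]
```

When the envelope has `ok=false`, generated code would usually return the envelope itself instead of `None`; the sketch only covers the row-selection step.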
## Routing guide
### Runtime self-description
- Supported fields / helper signatures / limits / raw API affordances → `hf_runtime_capabilities(...)`
### Repo questions
- Exact `owner/name` details → `hf_repo_details(repo_type="auto", ...)`
- Search/discovery/list/top repos → `hf_repo_search(...)`
- True trending requests → `hf_trending(...)`
- Repo discussions → `hf_repo_discussions(...)`
- Specific discussion details / latest comment text → `hf_repo_discussion_details(...)`
- Users who liked a specific repo → `hf_repo_likers(...)`
### User questions
- Profile / overview / "tell me about user X" → `hf_user_summary(...)`
- Followers / following / graph samples → `hf_user_graph(...)`
- Repos a user liked → `hf_user_likes(...)`
- Recent actions / activity feed → `hf_recent_activity(feed_type="user", entity=...)`
### Organization questions
- Organization details and counts → `hf_org_overview(...)`
- Organization members → `hf_org_members(...)`
- Organization repos → `hf_repo_search(author="<org>", repo_types=[...])`
- Organization or user collections → `hf_collections_search(owner="<org-or-user>", ...)`
- Repos inside a known collection → `hf_collection_items(collection_id=...)`
### Direction reminders
- `hf_user_likes(...)` = **user → repos**
- `hf_repo_likers(...)` = **repo → users**
- `hf_user_graph(...)` = **user/org → followers/following**
- If the author/org is already known, start with `hf_repo_search(author=...)` instead of semantic search.
- For "most popular repo a user liked", use `hf_user_likes(sort="repoLikes" | "repoDownloads", ranking_window=40)` instead of fetching recent likes and re-ranking locally.
## Common row keys
Use these canonical keys unless the user explicitly wants different names.
- Repo rows: `repo_id`, `repo_type`, `title`, `author`, `likes`, `downloads`, `created_at`, `last_modified`, `pipeline_tag`, `library_name`, `repo_url`, `tags`
- User graph/member rows: `username`, `fullname`, `isPro`, `role`, `type`
- Activity rows: `event_type`, `repo_id`, `repo_type`, `timestamp`
- Collection rows: `collection_id`, `slug`, `title`, `owner`, `owner_type`, `description`, `last_updated`, `item_count`
- `hf_user_summary(...)["item"]["overview"]`: `username`, `fullname`, `bio`, `websiteUrl`, `twitter`, `github`, `linkedin`, `bluesky`, `followers`, `following`, `likes`, `isPro`
Common aliases in `fields=[...]` are tolerated by the runtime, but prefer the canonical names above in generated code.
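
Projecting a helper row onto these canonical keys, while omitting fields that are unavailable, can be sketched like this. `project_row` is a hypothetical name and the sample row is invented for illustration.

```python
def project_row(row, keys):
    """Copy only the requested keys whose values are present (not None)."""
    return {key: row[key] for key in keys if row.get(key) is not None}


# Invented sample row; "pipeline_tag" is None and "downloads" is absent.
sample = {"repo_id": "org/name", "repo_type": "model", "pipeline_tag": None, "likes": 0}
compact = project_row(
    sample,
    ["repo_id", "repo_type", "pipeline_tag", "likes", "downloads"],
)
```

The check is `is not None` rather than truthiness, so legitimate zero values such as `likes: 0` are kept.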
## Common repo fields
- `repo_id`
- `repo_type`
- `title`
- `author`
- `likes`
- `downloads`
- `created_at`
- `last_modified`
- `pipeline_tag`
- `repo_url`
- model: `library_name`
- dataset: `description`, `paperswithcode_id`
- space: `sdk`, `models`, `datasets`, `subdomain`
Common aliases tolerated in `fields=[...]`:
- `repoId` → `repo_id`
- `repoType` → `repo_type`
- `repoUrl` → `repo_url`
- `createdAt` → `created_at`
- `lastModified` → `last_modified`
## Common collection fields
- `collection_id`
- `slug`
- `title`
- `owner`
- `owner_type`
- `description`
- `last_updated`
- `item_count`
Common aliases tolerated in `fields=[...]`:
- `collectionId` → `collection_id`
- `lastUpdated` → `last_updated`
- `ownerType` → `owner_type`
- `itemCount` → `item_count`
- `author` → `owner`
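
Since generated code should prefer canonical names, the alias lists for repos and collections can be applied before building `fields=[...]`. The sketch below is an illustration under stated assumptions: `canonical_fields` and the two dict names are hypothetical, and the mappings only restate the aliases documented in these sections.

```python
# Aliases restated from the repo and collection field sections.
REPO_ALIASES = {
    "repoId": "repo_id",
    "repoType": "repo_type",
    "repoUrl": "repo_url",
    "createdAt": "created_at",
    "lastModified": "last_modified",
}
COLLECTION_ALIASES = {
    "collectionId": "collection_id",
    "lastUpdated": "last_updated",
    "ownerType": "owner_type",
    "itemCount": "item_count",
    "author": "owner",
}


def canonical_fields(fields, kind):
    """Map aliases to canonical names, keeping order and dropping duplicates."""
    aliases = COLLECTION_ALIASES if kind == "collection" else REPO_ALIASES
    out = []
    for name in fields:
        canonical = aliases.get(name, name)
        if canonical not in out:
            out.append(canonical)
    return out


repo_fields = canonical_fields(["repoId", "repo_id", "author"], "repo")
coll_fields = canonical_fields(["author", "itemCount"], "collection")
```

Note that `author` is rewritten to `owner` only for collection rows; for repo rows it is already a canonical key.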
## High-signal usage notes
- `hf_repo_search(...)` defaults to models if no repo type is specified. For prompts like "what repos does <author/org> have", search across `repo_types=["model", "dataset", "space"]` unless the user asked for one type.
- `hf_trending(...)` returns the Hub's ordered trending list. Use `trending_rank` / ordering, not a fabricated numeric trending score.
- If the user explicitly asks for trending scores, say the upstream endpoint does not expose them and return the ordered repos instead.
- `hf_user_summary(...)` is the fastest way to answer common profile prompts. Read profile/social fields from `summary["item"]["overview"]`.
- For "how many models/datasets/spaces does user/org X have?" prompts, prefer the overview helpers (`hf_user_summary(...)["item"]["overview"]` or `hf_org_overview(...)`) over `hf_repo_search(..., limit=1)` or invented `count_only` args.
- Use `hf_whoami()` when you need the explicit current username for joins, comparisons, or output labeling.
- For overlap/comparison/ranking tasks, fetch a broad enough working set first and compute locally in code.
- Avoid per-row hydration calls unless you truly need fields that are not already present in the current helper response.
- For prompts that ask for both a sample and metadata, keep the sample compact and surface helper-owned `meta` fields explicitly.
- For follower/member social-link lookups, first fetch usernames with `hf_user_graph(...)` or `hf_org_members(...)`, then fetch profile/social data with `hf_user_summary(username=...)`.
- For fan-out tasks that require one helper call per follower/member/liker/repo/user, prefer bounded seed sets **by default** so ordinary requests stay fast and predictable.
- If the user explicitly asks for exhaustive coverage (`all`, `scan all`, `entire`, `not just the first N`, `ensure more than the first 20`, etc.), do **not** silently cap the seed at a small sample such as 20 or 50.
- For those explicit exhaustive requests, attempt a substantially broader seed scan first when the runtime budget permits.
- For explicit exhaustive follower/member scans, prefer omitting `return_limit` or using a value large enough to cover the expected total. Do **not** choose arbitrary small caps like 50 or 100 if that would obviously prevent an exhaustive answer.
- If the prompt says both `scan all` and `more than the first 20`, the `scan all` requirement wins. Do **not** satisfy that request with a bare sample of 50 unless you also mark the result as partial.
- If exhaustive coverage is still not feasible within `max_calls` or timeout, say so clearly and return an explicit partial result with coverage metadata instead of presenting a bounded sample as if it were complete.
- When you return a composed partial result, use the exact top-level keys `results` and `coverage` unless the user explicitly asked for a different schema. Do **not** rename `results` to `items`, `rows`, `liked_models`, or similar.
- Do **not** add your own top-level transport wrapper named `meta` in raw mode; the runtime already owns the outer `meta`.
- Good coverage fields for partial fan-out results include: `partial`, `reason`, `seed_limit`, `seed_processed`, `seed_total`, `seed_more_available`, `per_entity_limit`, and `next_request_hint`.
- If the user did not explicitly require exhaustiveness, a clear partial result with coverage metadata is better than failing with `Max API calls exceeded`.
- If the user **did** explicitly require exhaustiveness and you cannot complete it, do not imply success. Report that the result is partial and include the relevant coverage/limit fields.
- For explicit exhaustive follower/member prompts, if `meta.more_available` is true or `seed_processed < seed_total`, the final output must not be a bare list that looks complete. Include explicit partial/coverage information.
- Use `hf_recent_activity(...)` for activity feeds instead of raw `call_api('/api/recent-activity', ...)`.
- Use `hf_repo_search(author=..., repo_type="space", ...)` for Spaces by author; there is no separate spaces-by-author helper.
- Use `hf_collections_search(owner=...)` for "what collections does this org/user have?" prompts.
- `hf_collections_search(...)` is for finding/listing collections. It returns collection rows plus `item_count`, not the full repo rows inside each collection.
- Use `hf_collection_items(collection_id=...)` for "what repos/models/datasets/spaces are in this collection?" prompts.
- Do **not** guess raw collection item endpoints such as `/api/collections/.../items`.
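
The partial-coverage rules above can be sketched as a small builder for the `coverage` dict. `build_coverage` is a hypothetical name, and the sketch assumes the seed helper's `meta` exposes `more_available` and `total`; the `"fanout_budget"` reason string is likewise an illustrative value.

```python
def build_coverage(seed_meta, seed_limit, seed_processed, per_entity_limit):
    """Describe how much of a fan-out seed set was actually covered."""
    more = bool(seed_meta.get("more_available"))
    total = seed_meta.get("total")
    # Partial if the seed helper reports more rows, or fewer seeds were processed than exist.
    unfinished = total is not None and seed_processed < total
    partial = more or unfinished
    coverage = {
        "partial": partial,
        "seed_limit": seed_limit,
        "seed_processed": seed_processed,
        "seed_more_available": more,
        "per_entity_limit": per_entity_limit,
    }
    if total is not None:
        coverage["seed_total"] = total
    if partial:
        coverage["reason"] = "fanout_budget"
    return coverage


capped = build_coverage({"more_available": True, "total": 120}, 20, 20, 3)
complete = build_coverage({"more_available": False, "total": 5}, 20, 5, 3)
```

The result would be returned next to the data as `{"results": ..., "coverage": ...}`, so a bounded sample is never presented as a complete scan.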
## Helper API
```py
await hf_runtime_capabilities(section: str | None = None)
await hf_org_overview(organization: str)
await hf_org_members(
    organization: str,
    return_limit: int | None = None,
    scan_limit: int | None = None,
    count_only: bool = False,
    where: dict | None = None,
    fields: list[str] | None = None,
)
await hf_repo_search(
    query: str | None = None,
    repo_type: str | None = None,
    repo_types: list[str] | None = None,
    author: str | None = None,
    filters: list[str] | None = None,
    sort: str | None = None,
    limit: int = 20,
    where: dict | None = None,
    fields: list[str] | None = None,
    advanced: dict | None = None,
)
await hf_repo_details(
    repo_id: str | None = None,
    repo_ids: list[str] | None = None,
    repo_type: str = "auto",
    fields: list[str] | None = None,
)
await hf_trending(
    repo_type: str = "model",
    limit: int = 20,
    where: dict | None = None,
    fields: list[str] | None = None,
)
await hf_user_summary(
    username: str | None = None,
    include: list[str] | None = None,
    sample_limit: int = 10,
    activity_limit: int = 10,
    graph_pro_only: bool | None = None,
)
await hf_user_graph(
    username: str | None = None,
    relation: str = "followers",
    return_limit: int | None = None,
    scan_limit: int | None = None,
    count_only: bool = False,
    pro_only: bool | None = None,
    where: dict | None = None,
    fields: list[str] | None = None,
)
await hf_repo_likers(
    repo_id: str,
    repo_type: str,
    return_limit: int | None = None,
    count_only: bool = False,
    pro_only: bool | None = None,
    where: dict | None = None,
    fields: list[str] | None = None,
)
await hf_user_likes(
    username: str | None = None,
    repo_types: list[str] | None = None,
    return_limit: int | None = None,
    scan_limit: int | None = None,
    count_only: bool = False,
    where: dict | None = None,
    fields: list[str] | None = None,
    sort: str | None = None,
    ranking_window: int | None = None,
)
await hf_recent_activity(
    feed_type: str | None = None,
    entity: str | None = None,
    activity_types: list[str] | None = None,
    repo_types: list[str] | None = None,
    return_limit: int | None = None,
    max_pages: int | None = None,
    start_cursor: str | None = None,
    count_only: bool = False,
    where: dict | None = None,
    fields: list[str] | None = None,
)
await hf_repo_discussions(repo_type: str, repo_id: str, limit: int = 20)
await hf_repo_discussion_details(repo_type: str, repo_id: str, discussion_num: int)
await hf_collections_search(
    query: str | None = None,
    owner: str | None = None,
    return_limit: int = 20,
    count_only: bool = False,
    where: dict | None = None,
    fields: list[str] | None = None,
)
await hf_collection_items(
    collection_id: str,
    repo_types: list[str] | None = None,
    return_limit: int = 100,
    count_only: bool = False,
    where: dict | None = None,
    fields: list[str] | None = None,
)
await hf_whoami()
await call_api(endpoint: str, params: dict | None = None, method: str = "GET", json_body: dict | None = None)
```
## Minimal patterns
```py
# Exact repo details
info = await hf_repo_details(
    repo_id="black-forest-labs/FLUX.1-dev",
    repo_type="auto",
    fields=["repo_id", "repo_type", "author", "pipeline_tag", "library_name", "likes", "downloads", "repo_url"],
)
item = info["item"] or (info["items"][0] if info["items"] else None)
return {
    "repo_id": item["repo_id"],
    "repo_type": item["repo_type"],
    "author": item["author"],
    "pipeline_tag": item.get("pipeline_tag"),
    "library_name": item.get("library_name"),
    "likes": item.get("likes"),
    "downloads": item.get("downloads"),
    "repo_url": item.get("repo_url"),
}
# Runtime capability / supported-field introspection
caps = await hf_runtime_capabilities(section="fields")
if not caps["ok"]:
    return caps
item = caps["item"] or (caps["items"][0] if caps["items"] else None)
return item["content"]
# Compact user summary
summary = await hf_user_summary(
    username="mishig",
    include=["likes", "activity"],
    sample_limit=10,
    activity_limit=10,
)
item = summary["item"] or (summary["items"][0] if summary["items"] else None)
return {
    "total_followers": item["overview"]["followers"],
    "total_following": item["overview"]["following"],
    "latest_activity": item["activity"]["sample"],
    "latest_likes": item["likes"]["sample"],
}
# Current user's pro followers and their recent liked repos
followers = await hf_user_graph(
    relation="followers",
    pro_only=True,
    fields=["username"],
)
if not followers["ok"]:
    return followers
result = {}
for row in followers["items"]:
    uname = row.get("username")
    if not uname:
        continue
    likes = await hf_user_likes(
        username=uname,
        return_limit=3,
        fields=["repo_id", "repo_type", "liked_at", "repo_url"],
    )
    repos = []
    for item in likes["items"]:
        repo = {}
        for key in ["repo_id", "repo_type", "liked_at", "repo_url"]:
            if item.get(key) is not None:
                repo[key] = item[key]
        if repo:
            repos.append(repo)
    if repos:
        result[uname] = repos
return result
# Fan-out query with bounded partial coverage metadata
followers = await hf_user_graph(
    relation="followers",
    return_limit=20,
    fields=["username"],
)
if not followers["ok"]:
    return followers
result = {}
processed = 0
for row in followers["items"]:
    uname = row.get("username")
    if not uname:
        continue
    likes = await hf_user_likes(
        username=uname,
        repo_types=["model"],
        return_limit=3,
        fields=["repo_id", "repo_author", "liked_at"],
    )
    processed += 1
    items = []
    for item in likes["items"]:
        liked = {}
        for key in ["repo_id", "repo_author", "liked_at"]:
            if item.get(key) is not None:
                liked[key] = item[key]
        if liked:
            items.append(liked)
    if items:
        result[uname] = items
return {
    "results": result,
    "coverage": {
        "partial": bool(followers["meta"].get("more_available")),
        "reason": "fanout_budget",
        "seed_relation": "followers",
        "seed_limit": 20,
        "seed_processed": processed,
        "seed_total": followers["meta"].get("total"),
        "seed_more_available": followers["meta"].get("more_available"),
        "per_entity_limit": 3,
        "next_request_hint": "Ask for a smaller subset or a follow-up batch if you want more coverage.",
    },
}
# Popularity-ranked likes with metadata
likes = await hf_user_likes(
    username="julien-c",
    return_limit=1,
    sort="repoLikes",
    ranking_window=40,
    fields=["repo_id", "repo_type", "repo_author", "likes", "repo_url", "liked_at"],
)
item = likes["item"] or (likes["items"][0] if likes["items"] else None)
if item is None:
    return {"error": "No liked repositories found"}
repo = {}
for key in ["repo_id", "repo_type", "repo_author", "likes", "repo_url", "liked_at"]:
    if item.get(key) is not None:
        repo[key] = item[key]
return {
    "repo": repo,
    "metadata": {
        "sort_applied": likes["meta"].get("sort_applied"),
        "ranking_window": likes["meta"].get("ranking_window"),
        "ranking_complete": likes["meta"].get("ranking_complete"),
    },
}
# Recent activity with compact snake_case rows
activity = await hf_recent_activity(
    feed_type="user",
    entity="mishig",
    return_limit=15,
    fields=["event_type", "repo_id", "repo_type", "timestamp"],
)
result = []
for row in activity["items"]:
    item = {}
    for key in ["event_type", "repo_id", "repo_type", "timestamp"]:
        if row.get(key) is not None:
            item[key] = row[key]
    if item:
        result.append(item)
return result
# Repo discussions
rows = await hf_repo_discussions(
    repo_type="model",
    repo_id="Qwen/Qwen3.5-35B-A3B",
    limit=10,
)
return [
    {
        "num": row["num"],
        "title": row["title"],
        "author": row["author"],
        "status": row["status"],
    }
    for row in rows["items"]
]
# Collections owned by an org or user
collections = await hf_collections_search(
    owner="Qwen",
    return_limit=20,
    fields=["collection_id", "title", "owner", "description", "last_updated", "item_count"],
)
return collections["items"]
# Daily papers via the exact allowed raw endpoint
resp = await call_api("/api/daily_papers")
if not resp["ok"]:
    return resp
rows = []
for item in resp.get("data") or []:
    row = {}
    if item.get("title") is not None:
        row["title"] = item["title"]
    if item.get("repo_id") is not None:
        row["repo_id"] = item["repo_id"]
    if row:
        rows.append(row)
return rows
# Organization repo counts
org = await hf_org_overview("unsloth")
item = org["item"] or (org["items"][0] if org["items"] else None)
return {
    "organization": item["organization"],
    "models": item.get("models"),
    "datasets": item.get("datasets"),
    "spaces": item.get("spaces"),
}
# Do any authors of the top trending spaces follow me?
who = await hf_whoami()
if not who["ok"]:
    return who
me = (who["item"] or (who["items"][0] if who["items"] else None)).get("username")
spaces = await hf_trending(
    repo_type="space",
    limit=20,
    fields=["repo_id", "author", "repo_url"],
)
authors = []
seen = set()
for row in spaces["items"]:
    author = row.get("author")
    if isinstance(author, str) and author and author not in seen:
        seen.add(author)
        authors.append(author)
results = []
processed = 0
for author in authors[:20]:
    graph = await hf_user_graph(
        username=author,
        relation="following",
        return_limit=200,
        fields=["username"],
    )
    processed += 1
    if not graph["ok"]:
        continue
    if any(item.get("username") == me for item in graph["items"]):
        results.append(author)
return {
    "results": results,
    "coverage": {
        "partial": False,
        "reason": None,
        "seed_relation": "trending_space_authors",
        "seed_limit": 20,
        "seed_processed": processed,
        "seed_total": len(authors),
        "seed_more_available": False,
        "per_entity_limit": 200,
    },
}
# Models inside an org's collections
collections = await hf_collections_search(
    owner="openai",
    return_limit=20,
    fields=["collection_id", "title"],
)
result = {}
for coll in collections["items"]:
    collection_id = coll.get("collection_id")
    title = coll.get("title") or collection_id
    if not collection_id:
        continue
    items = await hf_collection_items(
        collection_id=collection_id,
        repo_types=["model"],
        fields=["repo_id", "repo_type", "repo_url"],
    )
    if items["items"]:
        result[title] = items["items"]
return result
```