Spaces: evalstate/hf-hub-query

evalstate (HF Staff) committed 3 days ago

Commit bcee6cb (verified) · 1 parent: 1932834

Deploy hf-hub-query after Monty prompt/runtime updates

Files changed (3)

_monty_codegen_shared.md +41 -2
hf-hub-query.md +1 -1
monty_api_tool_v2.py +3 -3

_monty_codegen_shared.md CHANGED

@@ -3,10 +3,28 @@
 - Helper functions are already in scope.
 - All helper/API calls are async: always use `await`.
 - `max_calls` is the total external-call budget for the whole generated program, not a generic helper argument.
 - Use helper functions first. Use raw `call_api('/api/...')` only if no helper fits.
 - `call_api` must receive a raw path starting with `/api/...`; never call helper names through `call_api`.
 - `call_api(...)` returns `{ok, status, url, data, error}`. Always check `resp["ok"]` before reading `resp["data"]`. Do not read `resp["items"]` or `resp["meta"]` directly from `call_api(...)`.
 - Use `call_api(...)` only for endpoint families that do not already have a helper, such as `/api/daily_papers` or tag metadata endpoints.
 - Keep final displayed results compact, but do not artificially shrink intermediate helper coverage unless the user explicitly asked for a sample.
 - Prefer canonical snake_case keys in generated code and in JSON output.
 - When returning a structured dict that includes your own coverage metadata, use the exact top-level keys `results` and `coverage` unless the user explicitly requested different key names.
@@ -134,12 +152,18 @@ Common aliases tolerated in `fields=[...]`:
 - Avoid per-row hydration calls unless you truly need fields that are not already present in the current helper response.
 - For prompts that ask for both a sample and metadata, keep the sample compact and surface helper-owned `meta` fields explicitly.
 - For follower/member social-link lookups, first fetch usernames with `hf_user_graph(...)` or `hf_org_members(...)`, then fetch profile/social data with `hf_user_summary(username=...)`.
-- For fan-out tasks that require one helper call per follower/member/liker/repo/user, do **not** try to exhaustively process a large seed set in one run.
-- For those fan-out tasks, prefer a bounded seed set plus explicit partial coverage metadata over exhausting `max_calls`.
 - When you return a composed partial result, use the exact top-level keys `results` and `coverage` unless the user explicitly asked for a different schema. Do **not** rename `results` to `items`, `rows`, `liked_models`, or similar.
 - Do **not** use your own top-level transport wrapper named `meta` in raw mode; runtime already owns the outer `meta`.
 - Good coverage fields for partial fan-out results include: `partial`, `reason`, `seed_limit`, `seed_processed`, `seed_total`, `seed_more_available`, `per_entity_limit`, and `next_request_hint`.
 - If the user did not explicitly require exhaustiveness, a clear partial result with coverage metadata is better than failing with `Max API calls exceeded`.
 - Use `hf_recent_activity(...)` for activity feeds instead of raw `call_api('/api/recent-activity', ...)`.
 - Use `hf_repo_search(author=..., repo_type="space", ...)` for Spaces by author; there is no separate spaces-by-author helper.
 - Use `hf_collections_search(owner=...)` for "what collections does this org/user have?" prompts.
@@ -442,6 +466,21 @@ collections = await hf_collections_search(
 )
 return collections["items"]
 # Organization repo counts
 org = await hf_org_overview("unsloth")
 item = org["item"] or (org["items"][0] if org["items"] else None)

 - Helper functions are already in scope.
 - All helper/API calls are async: always use `await`.
 - `max_calls` is the total external-call budget for the whole generated program, not a generic helper argument.
+- The outer wrapper is an exact contract. Use this exact skeleton and only change the body:
+```py
+async def solve(query, max_calls):
+    ...
+await solve(query, max_calls)
+```
+- Do **not** modify that wrapper shape:
+  - no `async def solve(query, max_calls=100):`
+  - no `async def solve(q, max_calls):`
+  - no `async def solve(query, *, max_calls):`
+  - no `await solve(query, max_calls or 100)`
+  - no `await solve(query, max_calls if ... else ...)`
+  - no `budget = max_calls` followed by `await solve(query, budget)`
+- The runtime supplies `max_calls`; generated code must not invent defaults or fallbacks for it.
 - Use helper functions first. Use raw `call_api('/api/...')` only if no helper fits.
 - `call_api` must receive a raw path starting with `/api/...`; never call helper names through `call_api`.
+- Raw `call_api(...)` endpoints must match the runtime allowlist exactly. Do **not** invent hyphen/underscore variants or guessed path shapes.
 - `call_api(...)` returns `{ok, status, url, data, error}`. Always check `resp["ok"]` before reading `resp["data"]`. Do not read `resp["items"]` or `resp["meta"]` directly from `call_api(...)`.
+- `call_api(...)` only accepts `endpoint`, `params`, `method`, and `json_body`. Do not guess extra kwargs.
 - Use `call_api(...)` only for endpoint families that do not already have a helper, such as `/api/daily_papers` or tag metadata endpoints.
+- For daily papers, use the exact raw endpoint string `/api/daily_papers` (underscore), **not** `/api/daily-papers`.
 - Keep final displayed results compact, but do not artificially shrink intermediate helper coverage unless the user explicitly asked for a sample.
 - Prefer canonical snake_case keys in generated code and in JSON output.
 - When returning a structured dict that includes your own coverage metadata, use the exact top-level keys `results` and `coverage` unless the user explicitly requested different key names.
 - Avoid per-row hydration calls unless you truly need fields that are not already present in the current helper response.
 - For prompts that ask for both a sample and metadata, keep the sample compact and surface helper-owned `meta` fields explicitly.
 - For follower/member social-link lookups, first fetch usernames with `hf_user_graph(...)` or `hf_org_members(...)`, then fetch profile/social data with `hf_user_summary(username=...)`.
+- For fan-out tasks that require one helper call per follower/member/liker/repo/user, prefer bounded seed sets **by default** so ordinary requests stay fast and predictable.
+- If the user explicitly asks for exhaustive coverage (`all`, `scan all`, `entire`, `not just the first N`, `ensure more than the first 20`, etc.), do **not** silently cap the seed at a small sample such as 20 or 50.
+- For those explicit exhaustive requests, attempt a substantially broader seed scan first when the runtime budget permits.
+- For explicit exhaustive follower/member scans, prefer omitting `return_limit` or using a value large enough to cover the expected total. Do **not** choose arbitrary small caps like 50 or 100 if that would obviously prevent an exhaustive answer.
+- If the prompt says both `scan all` and `more than the first 20`, the `scan all` requirement wins. Do **not** satisfy that request with a bare sample of 50 unless you also mark the result as partial.
+- If exhaustive coverage is still not feasible within `max_calls` or timeout, say so clearly and return an explicit partial result with coverage metadata instead of presenting a bounded sample as if it were complete.
 - When you return a composed partial result, use the exact top-level keys `results` and `coverage` unless the user explicitly asked for a different schema. Do **not** rename `results` to `items`, `rows`, `liked_models`, or similar.
 - Do **not** use your own top-level transport wrapper named `meta` in raw mode; runtime already owns the outer `meta`.
 - Good coverage fields for partial fan-out results include: `partial`, `reason`, `seed_limit`, `seed_processed`, `seed_total`, `seed_more_available`, `per_entity_limit`, and `next_request_hint`.
 - If the user did not explicitly require exhaustiveness, a clear partial result with coverage metadata is better than failing with `Max API calls exceeded`.
+- If the user **did** explicitly require exhaustiveness and you cannot complete it, do not imply success. Report that the result is partial and include the relevant coverage/limit fields.
+- For explicit exhaustive follower/member prompts, if `meta.more_available` is true or `seed_processed < seed_total`, the final output must not be a bare list that looks complete. Include explicit partial/coverage information.
 - Use `hf_recent_activity(...)` for activity feeds instead of raw `call_api('/api/recent-activity', ...)`.
 - Use `hf_repo_search(author=..., repo_type="space", ...)` for Spaces by author; there is no separate spaces-by-author helper.
 - Use `hf_collections_search(owner=...)` for "what collections does this org/user have?" prompts.
 )
 return collections["items"]
+# Daily papers via the exact allowed raw endpoint
+resp = await call_api("/api/daily_papers")
+if not resp["ok"]:
+    return resp
+rows = []
+for item in resp.get("data") or []:
+    row = {}
+    if item.get("title") is not None:
+        row["title"] = item["title"]
+    if item.get("repo_id") is not None:
+        row["repo_id"] = item["repo_id"]
+    if row:
+        rows.append(row)
+return rows
 # Organization repo counts
 org = await hf_org_overview("unsloth")
 item = org["item"] or (org["items"][0] if org["items"] else None)
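
The fan-out rules added in this file (bounded seeds, exact `results` and `coverage` keys, no bare list when coverage is partial) can be illustrated with a minimal sketch. It runs outside the Monty runtime with plain `asyncio`, so it does not use the `solve(query, max_calls)` wrapper; `fake_user_summary` and `fan_out` are hypothetical names standing in for a real per-user helper, and the one-call-per-seed accounting is an assumption.

```python
import asyncio


async def fake_user_summary(username):
    """Hypothetical stand-in for a per-user helper; NOT the real hf_user_summary."""
    await asyncio.sleep(0)  # simulate an awaited external call
    return {"username": username}


async def fan_out(seeds, max_calls, calls_used=0):
    """Process as many seeds as the remaining budget allows (assumes 1 call per seed)."""
    budget = max(max_calls - calls_used, 0)
    processed = seeds[:budget]
    results = [await fake_user_summary(name) for name in processed]

    # Mark the result as partial whenever some seeds were left unprocessed.
    partial = len(processed) < len(seeds)
    coverage = {
        "partial": partial,
        "reason": "max_calls budget reached" if partial else None,
        "seed_limit": budget,
        "seed_processed": len(processed),
        "seed_total": len(seeds),
        "seed_more_available": partial,
    }
    # Exact top-level keys required by the prompt: `results` and `coverage`.
    return {"results": results, "coverage": coverage}


demo = asyncio.run(fan_out(["alice", "bob", "carol"], max_calls=2))
complete = asyncio.run(fan_out(["alice"], max_calls=5))
```

With a budget of 2 and three seeds, `demo["coverage"]` reports `partial: True` and `seed_processed: 2`, so a consumer cannot mistake the two returned rows for a complete scan.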

hf-hub-query.md CHANGED

@@ -4,7 +4,7 @@ name: hf_hub_query
 model: hf.openai/gpt-oss-120b:cerebras
 use_history: false
 default: true
-description: "Active natural-language Hugging Face Hub navigator. Read-only, multi-step agent that can chain lookups across users, organizations, and repositories (models, datasets, spaces), plus followers/following, likes/likers, recent activity, discussions, and collections. Aware of current User identity. Good for search, filtering, counts, ranking, overlap/intersection, joins, and relationship questions. Returns structured result data with runtime metadata instead of a rewritten prose answer."
 shell: false
 skills: []
 function_tools:

 model: hf.openai/gpt-oss-120b:cerebras
 use_history: false
 default: true
+description: "Active natural-language Hugging Face Hub navigator. Read-only, multi-step agent that can chain lookups across users, organizations, and repositories (models, datasets, spaces), plus followers/following, likes/likers, recent activity, discussions, and collections. Aware of current user identity. Good for search, filtering, counts, ranking, overlap/intersection, joins, relationship questions, and broader fan-out scans when explicitly requested. Returns structured result data with runtime metadata and preserves partial-coverage caveats instead of silently treating bounded scans as complete."
 shell: false
 skills: []
 function_tools:

monty_api_tool_v2.py CHANGED

@@ -32,9 +32,9 @@ from huggingface_hub.hf_api import DatasetSort_T, ModelSort_T, SpaceSort_T
 # - max_calls: hard cap on the total number of external helper/API calls a single
 #   generated program may make in one run.
 # - timeout_sec: wall-clock timeout for the full Monty execution.
-DEFAULT_TIMEOUT_SEC = 45  # Default end-to-end timeout for one Monty run.
-DEFAULT_MAX_CALLS = 100  # Default external-call budget exposed to callers.
-MAX_CALLS_LIMIT = 100  # Absolute max external-call budget accepted by the runtime.
 INTERNAL_STRICT_MODE = False
 # Result-size vocabulary used throughout helper metadata:

 # - max_calls: hard cap on the total number of external helper/API calls a single
 #   generated program may make in one run.
 # - timeout_sec: wall-clock timeout for the full Monty execution.
+DEFAULT_TIMEOUT_SEC = 90  # Default end-to-end timeout for one Monty run.
+DEFAULT_MAX_CALLS = 400  # Default external-call budget exposed to callers.
+MAX_CALLS_LIMIT = 400  # Absolute max external-call budget accepted by the runtime.
 INTERNAL_STRICT_MODE = False
 # Result-size vocabulary used throughout helper metadata:
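
The diff only shows the new constant values, not how the runtime applies them. As a rough sketch of one plausible policy, the function below falls back to the default when no budget is given and clamps larger requests to the limit. `resolve_max_calls` is a hypothetical name, and the actual runtime may reject out-of-range values instead of clamping them.

```python
DEFAULT_MAX_CALLS = 400  # value from this commit
MAX_CALLS_LIMIT = 400  # value from this commit


def resolve_max_calls(requested=None):
    """Hypothetical helper: pick the effective external-call budget for one run."""
    if requested is None:
        return DEFAULT_MAX_CALLS
    if requested < 1:
        raise ValueError("max_calls must be a positive integer")
    # Assumption: oversized requests are clamped rather than rejected.
    return min(requested, MAX_CALLS_LIMIT)
```

Under this assumption a caller asking for 1000 calls would get 400, while a caller asking for 50 would keep 50. This resolution happens on the runtime side; generated code still receives `max_calls` as-is and must not invent its own defaults.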