Space: evalstate/hf-hub-query

evalstate (HF Staff) committed on Mar 9

Commit 9404142 (verified) · 1 parent: 4512510

Deploy hf-hub-query bundle for f8275ae

Files changed (2)

_monty_codegen_shared.md +23 -0
monty_api_tool_v2.py +47 -5

_monty_codegen_shared.md CHANGED

@@ -180,6 +180,7 @@ await hf_repo_discussion_details(
 await hf_collections_search(
   query: str | None = None,
   return_limit: int = 20,
   count_only: bool = False,
   where: dict | None = None,
@@ -254,6 +255,26 @@ Common aliases tolerated in `fields=[...]` include:
 When returning compact repo objects, omit unavailable optional fields instead of
 emitting `null` placeholders unless the user explicitly asked for a fixed schema.
 ## Common user overview fields
 `hf_user_summary(... )["item"]["overview"]` commonly includes:
 - `username`
@@ -302,6 +323,7 @@ Choose the helper based on the **subject of the question** and the **smallest he
 - Organization details / counts → `hf_org_overview(...)`
 - Organization members → `hf_org_members(...)`
 - Organization repos → `hf_repo_search(author="<org>", repo_types=["model", "dataset", "space"])`
 ### Relationship direction matters
 - `hf_user_likes(...)` = **user → repos**
@@ -335,6 +357,7 @@ Pick the helper that already matches the direction of the question instead of tr
 - For "my/me" prompts, prefer current-user forms first: `hf_user_summary(username=None)`, `hf_user_graph(username=None, ...)`, and `hf_user_likes(username=None, ...)`. Use `hf_whoami()` when you need the resolved username explicitly.
 - Use `hf_org_overview(...)` for organization details like display name, followers, and member count.
 - Use `hf_org_members(...)` for organization member lists and counts. Member rows use `username`, `fullname`, `isPro`, and `role`; common aliases like `login`, `name`, and `is_pro` are tolerated in `fields=[...]`.
 - Use `hf_user_graph(...)` for follower/following lists, counts, and filtered graph samples. Prefer `relation=` over trying undocumented helper names.
 - Use `hf_repo_likers(...)` for "who liked this repo?" prompts. It returns liker rows for a specific model, dataset, or space; pass `repo_type` explicitly.
 - For overlap/comparison/ranking tasks over followers, org members, likes, or activity, do not use small manual `return_limit` values like 10/20/50 unless the user explicitly asked for a sample. Use the helper default or a clearly high bound for the intermediate analysis, then keep only the final displayed result compact.

 await hf_collections_search(
   query: str | None = None,
+  owner: str | None = None,
   return_limit: int = 20,
   count_only: bool = False,
   where: dict | None = None,
 When returning compact repo objects, omit unavailable optional fields instead of
 emitting `null` placeholders unless the user explicitly asked for a fixed schema.
+## Common collection fields
+Collection search rows commonly include:
+- `collection_id`
+- `slug`
+- `title`
+- `owner`
+- `owner_type`
+- `description`
+- `gating`
+- `last_updated`
+- `item_count`
+For collection helpers, prefer the canonical names above in generated code and in `fields=[...]`.
+Common aliases tolerated in `fields=[...]` include:
+- `collectionId` → `collection_id`
+- `lastUpdated` → `last_updated`
+- `ownerType` → `owner_type`
+- `itemCount` → `item_count`
+- `author` → `owner`
 ## Common user overview fields
 `hf_user_summary(... )["item"]["overview"]` commonly includes:
 - `username`
 - Organization details / counts → `hf_org_overview(...)`
 - Organization members → `hf_org_members(...)`
 - Organization repos → `hf_repo_search(author="<org>", repo_types=["model", "dataset", "space"])`
+- Organization/user collections → `hf_collections_search(owner="<org-or-user>", ...)`
 ### Relationship direction matters
 - `hf_user_likes(...)` = **user → repos**
 - For "my/me" prompts, prefer current-user forms first: `hf_user_summary(username=None)`, `hf_user_graph(username=None, ...)`, and `hf_user_likes(username=None, ...)`. Use `hf_whoami()` when you need the resolved username explicitly.
 - Use `hf_org_overview(...)` for organization details like display name, followers, and member count.
 - Use `hf_org_members(...)` for organization member lists and counts. Member rows use `username`, `fullname`, `isPro`, and `role`; common aliases like `login`, `name`, and `is_pro` are tolerated in `fields=[...]`.
+- Use `hf_collections_search(...)` for collection search/listing questions. For "what collections does this org/user have?" prompts, pass `owner="<org-or-user>"` so the helper seeds query search and then applies an exact owner filter locally. Prefer fields like `collection_id`, `title`, `owner`, `description`, `last_updated`, and `item_count`.
 - Use `hf_user_graph(...)` for follower/following lists, counts, and filtered graph samples. Prefer `relation=` over trying undocumented helper names.
 - Use `hf_repo_likers(...)` for "who liked this repo?" prompts. It returns liker rows for a specific model, dataset, or space; pass `repo_type` explicitly.
 - For overlap/comparison/ranking tasks over followers, org members, likes, or activity, do not use small manual `return_limit` values like 10/20/50 unless the user explicitly asked for a sample. Use the helper default or a clearly high bound for the intermediate analysis, then keep only the final displayed result compact.
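
Reviewer note: the new "Common collection fields" section says aliases such as `collectionId` are tolerated in `fields=[...]`, and that unavailable optional fields should be omitted rather than emitted as `null`. The body of `_project_items` is not part of this diff, so the following is only a minimal sketch of how such alias-aware projection might behave. The names `canonical_field` and `project_rows` are hypothetical, and the case-insensitive lookup is an assumption inferred from the lowercase alias keys in `monty_api_tool_v2.py`.

```python
from __future__ import annotations

from typing import Any

# Same keys and values as the alias table added in this commit.
COLLECTION_FIELD_ALIASES: dict[str, str] = {
    "collectionid": "collection_id",
    "lastupdated": "last_updated",
    "ownertype": "owner_type",
    "itemcount": "item_count",
    "author": "owner",
}


def canonical_field(name: str, aliases: dict[str, str]) -> str:
    """Map a requested field to its canonical name (hypothetical helper).

    Assumption: lookup is case-insensitive; unknown names pass through unchanged.
    """
    cleaned = name.strip()
    return aliases.get(cleaned.lower(), cleaned)


def project_rows(
    rows: list[dict[str, Any]],
    fields: list[str] | None,
    aliases: dict[str, str],
) -> list[dict[str, Any]]:
    """Keep only the requested fields, dropping keys whose value is missing or None."""
    if not fields:
        return [dict(row) for row in rows]
    wanted: list[str] = []
    for field in fields:
        canonical = canonical_field(field, aliases)
        if canonical not in wanted:  # de-duplicate alias + canonical requests
            wanted.append(canonical)
    return [
        {key: row[key] for key in wanted if row.get(key) is not None}
        for row in rows
    ]


if __name__ == "__main__":
    sample = [{"collection_id": "my-org/demo-1", "title": "Demo", "owner": "my-org",
               "description": None, "item_count": 3}]
    print(project_rows(sample, ["collectionId", "author", "itemCount", "description"],
                       COLLECTION_FIELD_ALIASES))
```

In this sketch a request for `collectionId`, `author`, and `itemCount` yields rows keyed by `collection_id`, `owner`, and `item_count`, and a `description` of `None` is left out of the row entirely.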

monty_api_tool_v2.py CHANGED

@@ -143,6 +143,14 @@ _REPO_FIELD_ALIASES: dict[str, str] = {
     "paperswithcodeid": "paperswithcode_id",
 }
 # Extra hf_repo_search kwargs intentionally supported as pass-through to
 # huggingface_hub.HfApi.list_models/list_datasets/list_spaces.
 # (Generic args like `query/search/sort/author/limit` are handled directly in
@@ -1311,6 +1319,9 @@ async def _run_with_monty(
     def _project_repo_items(items: list[dict[str, Any]], fields: list[str] | None) -> list[dict[str, Any]]:
         return _project_items(items, fields, aliases=_REPO_FIELD_ALIASES)
     def _project_user_items(items: list[dict[str, Any]], fields: list[str] | None) -> list[dict[str, Any]]:
         return _project_items(items, fields, aliases=_USER_FIELD_ALIASES)
@@ -3146,6 +3157,7 @@ async def _run_with_monty(
     async def hf_collections_search(
         query: str | None = None,
         return_limit: int = 20,
         count_only: bool = False,
         where: dict[str, Any] | None = None,
@@ -3159,13 +3171,24 @@ async def _run_with_monty(
             return_limit = 0
         lim = _clamp_int(return_limit, default=default_return, minimum=0, maximum=max_return)
-        fetch_lim = max_return if lim == 0 else lim
         term = str(query or "").strip()
         if not term:
-            return _helper_error(start_calls=start_calls, source="/api/collections", error="query is required")
-        resp = _host_raw_call("/api/collections", params={"q": term, "limit": fetch_lim})
         if not resp.get("ok"):
             return _helper_error(
                 start_calls=start_calls,
@@ -3181,12 +3204,30 @@ async def _run_with_monty(
             owner = _author_from_any(row.get("owner")) or _author_from_any(row.get("ownerData"))
             if not owner and isinstance(row.get("slug"), str) and "/" in str(row.get("slug")):
                 owner = str(row.get("slug")).split("/", 1)[0]
-            items.append({"slug": row.get("slug"), "title": row.get("title"), "owner": owner})
         items = _apply_where(items, where)
         total_matched = len(items)
         items = items[:lim]
-        items = _project_items(items, fields)
         truncated = (lim > 0 and total_matched > lim) or (lim == 0 and len(payload) >= fetch_lim)
         return _helper_success(
@@ -3202,6 +3243,7 @@ async def _run_with_monty(
             truncated=truncated,
             complete=not truncated,
             query=term,
         )
     m = pydantic_monty.Monty(

     "paperswithcodeid": "paperswithcode_id",
 }
+_COLLECTION_FIELD_ALIASES: dict[str, str] = {
+    "collectionid": "collection_id",
+    "lastupdated": "last_updated",
+    "ownertype": "owner_type",
+    "itemcount": "item_count",
+    "author": "owner",
+}
 # Extra hf_repo_search kwargs intentionally supported as pass-through to
 # huggingface_hub.HfApi.list_models/list_datasets/list_spaces.
 # (Generic args like `query/search/sort/author/limit` are handled directly in
     def _project_repo_items(items: list[dict[str, Any]], fields: list[str] | None) -> list[dict[str, Any]]:
         return _project_items(items, fields, aliases=_REPO_FIELD_ALIASES)
+    def _project_collection_items(items: list[dict[str, Any]], fields: list[str] | None) -> list[dict[str, Any]]:
+        return _project_items(items, fields, aliases=_COLLECTION_FIELD_ALIASES)
     def _project_user_items(items: list[dict[str, Any]], fields: list[str] | None) -> list[dict[str, Any]]:
         return _project_items(items, fields, aliases=_USER_FIELD_ALIASES)
     async def hf_collections_search(
         query: str | None = None,
+        owner: str | None = None,
         return_limit: int = 20,
         count_only: bool = False,
         where: dict[str, Any] | None = None,
             return_limit = 0
         lim = _clamp_int(return_limit, default=default_return, minimum=0, maximum=max_return)
+        owner_clean = str(owner or "").strip() or None
+        fetch_lim = max_return if lim == 0 or owner_clean else lim
+        if owner_clean:
+            fetch_lim = min(fetch_lim, 100)
         term = str(query or "").strip()
+        if not term and owner_clean:
+            term = owner_clean
         if not term:
+            return _helper_error(start_calls=start_calls, source="/api/collections", error="query or owner is required")
+        params: dict[str, Any] = {"limit": fetch_lim}
+        if term:
+            params["q"] = term
+        if owner_clean:
+            params["owner"] = owner_clean
+        resp = _host_raw_call("/api/collections", params=params)
         if not resp.get("ok"):
             return _helper_error(
                 start_calls=start_calls,
             owner = _author_from_any(row.get("owner")) or _author_from_any(row.get("ownerData"))
             if not owner and isinstance(row.get("slug"), str) and "/" in str(row.get("slug")):
                 owner = str(row.get("slug")).split("/", 1)[0]
+            if owner_clean is not None and owner != owner_clean:
+                continue
+            owner_payload = row.get("owner") if isinstance(row.get("owner"), dict) else {}
+            collection_items = row.get("items") if isinstance(row.get("items"), list) else []
+            slug = row.get("slug")
+            items.append(
+                {
+                    "collection_id": slug,
+                    "slug": slug,
+                    "title": row.get("title"),
+                    "owner": owner,
+                    "owner_type": owner_payload.get("type") if isinstance(owner_payload.get("type"), str) else None,
+                    "description": row.get("description"),
+                    "gating": row.get("gating"),
+                    "last_updated": row.get("lastUpdated"),
+                    "item_count": len(collection_items),
+                }
+            )
         items = _apply_where(items, where)
         total_matched = len(items)
         items = items[:lim]
+        items = _project_collection_items(items, fields)
         truncated = (lim > 0 and total_matched > lim) or (lim == 0 and len(payload) >= fetch_lim)
         return _helper_success(
             truncated=truncated,
             complete=not truncated,
             query=term,
+            owner=owner_clean,
         )
     m = pydantic_monty.Monty(
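
Reviewer note: the `owner` handling added to `hf_collections_search` does three things: it falls back to the owner name as the search term when no query is given, over-fetches up to a cap of 100 rows when an owner is set, and then keeps only rows whose resolved owner matches exactly. The standalone sketch below restates that flow so it can be exercised without the Monty runtime. `plan_request`, `owner_of`, and `normalize_rows` are hypothetical names; `owner_of` is a simplified stand-in for `_author_from_any`, whose body is not shown in this diff; and the `{"name": ..., "type": ...}` shape of the owner payload is an assumption.

```python
from __future__ import annotations

from typing import Any

OWNER_FETCH_CAP = 100  # mirrors `min(fetch_lim, 100)` in the diff


def plan_request(query: str | None, owner: str | None, lim: int, max_return: int) -> dict[str, Any] | None:
    """Build request params as the new code does; None stands for the 'query or owner is required' error."""
    owner_clean = str(owner or "").strip() or None
    # With an owner filter, fetch more than `lim` so the local filter has rows to work with.
    fetch_lim = max_return if lim == 0 or owner_clean else lim
    if owner_clean:
        fetch_lim = min(fetch_lim, OWNER_FETCH_CAP)
    term = str(query or "").strip() or (owner_clean or "")
    if not term:
        return None
    params: dict[str, Any] = {"limit": fetch_lim, "q": term}
    if owner_clean:
        params["owner"] = owner_clean
    return params


def owner_of(row: dict[str, Any]) -> str | None:
    """Resolve a row's owner: owner payload first, then the slug prefix (assumed shapes)."""
    raw = row.get("owner")
    if isinstance(raw, dict) and isinstance(raw.get("name"), str):
        return raw["name"]
    if isinstance(raw, str) and raw:
        return raw
    slug = row.get("slug")
    if isinstance(slug, str) and "/" in slug:
        return slug.split("/", 1)[0]
    return None


def normalize_rows(payload: list[dict[str, Any]], owner_clean: str | None) -> list[dict[str, Any]]:
    """Apply the exact owner filter and emit the expanded row shape from the diff."""
    items: list[dict[str, Any]] = []
    for row in payload:
        owner = owner_of(row)
        if owner_clean is not None and owner != owner_clean:
            continue  # a fuzzy search hit that belongs to a different owner
        owner_payload = row.get("owner") if isinstance(row.get("owner"), dict) else {}
        collection_items = row.get("items") if isinstance(row.get("items"), list) else []
        owner_type = owner_payload.get("type")
        slug = row.get("slug")
        items.append({
            "collection_id": slug,
            "slug": slug,
            "title": row.get("title"),
            "owner": owner,
            "owner_type": owner_type if isinstance(owner_type, str) else None,
            "description": row.get("description"),
            "gating": row.get("gating"),
            "last_updated": row.get("lastUpdated"),
            "item_count": len(collection_items),
        })
    return items
```

One consequence worth checking in review: because the owner path caps the fetch at 100 rows and filters afterwards, an owner with many collections may be under-reported when the upstream response is already full, so callers that need completeness should inspect the `truncated` and `complete` flags rather than assume the list is exhaustive.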