Spaces: evalstate/hf-hub-query


evalstate HF Staff committed on Mar 9

Commit c072a80 (verified) · 1 Parent(s): 536880e

Deploy latest python changes


Files changed (2)

_monty_codegen_shared.md +12 -0
monty_api_tool_v2.py +116 -6

_monty_codegen_shared.md CHANGED

@@ -171,6 +171,12 @@ await hf_repo_discussions(
   limit: int = 20,
 )
 await hf_whoami()
 await call_api(endpoint: str, params: dict | None = None, method: str = "GET", json_body: dict | None = None)
 ```
@@ -207,6 +213,8 @@ Search/detail/trending repo rows commonly include:
 - `created_at`
 - `last_modified`
 - `pipeline_tag`
 - `private`
 - `repo_url`
 - `tags`
@@ -253,6 +261,7 @@ Choose the helper based on the **subject of the question** and the **smallest he
 - Search/discovery/list/top repos → `hf_repo_search(...)`
 - True trending requests → `hf_trending(...)`
 - Repo discussions → `hf_repo_discussions(...)`
 - Users who liked a specific repo / liker filtering / liker counts → `hf_repo_likers(...)`
 ### User-centric questions
@@ -291,6 +300,8 @@ Pick the helper that already matches the direction of the question instead of tr
 - `hf_repo_search(...)` defaults to `repo_type="model"` when no repo type is specified. For prompts like "what repos does <author/org> have" or "list everything published by <author/org>", search across `repo_types=["model", "dataset", "space"]` unless the user explicitly asked for one type.
 - Use `hf_repo_details(repo_type="auto", ...)` for `owner/name` detail lookups unless the type is explicit.
 - Use `hf_trending(...)` only for true trending requests.
 - `hf_trending(...)` does not accept extra filters like tag/author/task. For trending + extra filters, either ask a brief clarification or clearly label an approximation using `hf_repo_search(sort="trending_score", ...)`.
 - Use `hf_user_summary(...)` for common "tell me about user X" prompts. It returns a fixed structured object (no `fields=` projection) with overview data and optional sampled followers/following/likes/activity sections. Read profile and social-link fields such as `websiteUrl`, `twitter`, `github`, `linkedin`, and `bluesky` from `summary["item"]["overview"]`.
 - For "my/me" prompts, prefer current-user forms first: `hf_user_summary(username=None)`, `hf_user_graph(username=None, ...)`, and `hf_user_likes(username=None, ...)`. Use `hf_whoami()` when you need the resolved username explicitly.
@@ -314,6 +325,7 @@ Pick the helper that already matches the direction of the question instead of tr
 - For user Spaces, use `hf_repo_search(author=..., repo_type="space", ...)`. Do not look for a special spaces-by-author helper.
 - Organizations are valid `author=` values for `hf_repo_search(...)`. To inventory an organization's repos, use `author="<org>"` with `repo_types=["model", "dataset", "space"]` and then project to the requested fields.
 - Use `hf_repo_discussions(...)` for model/dataset/space discussion listings. Do not guess raw discussion endpoints through `call_api`.
 - For ambiguous discovery, either ask a brief clarification or search across `repo_types=["model", "dataset", "space"]`.
 - For Spaces, `filters` are broader Hub tag-style filters rather than a standardized task taxonomy like model `pipeline_tag`.
 - For semantic Space queries (for example image-generation, audio, chat), prefer a broad search with rich fields and then narrow locally.

   limit: int = 20,
 )
+await hf_repo_discussion_details(
+  repo_type: str,
+  repo_id: str,                       # owner/name
+  discussion_num: int,
+)
 await hf_whoami()
 await call_api(endpoint: str, params: dict | None = None, method: str = "GET", json_body: dict | None = None)
 ```
 - `created_at`
 - `last_modified`
 - `pipeline_tag`
+- `trending_score`
+- `trending_rank`
 - `private`
 - `repo_url`
 - `tags`
 - Search/discovery/list/top repos → `hf_repo_search(...)`
 - True trending requests → `hf_trending(...)`
 - Repo discussions → `hf_repo_discussions(...)`
+- Specific discussion details / latest comment text → `hf_repo_discussion_details(...)`
 - Users who liked a specific repo / liker filtering / liker counts → `hf_repo_likers(...)`
 ### User-centric questions
 - `hf_repo_search(...)` defaults to `repo_type="model"` when no repo type is specified. For prompts like "what repos does <author/org> have" or "list everything published by <author/org>", search across `repo_types=["model", "dataset", "space"]` unless the user explicitly asked for one type.
 - Use `hf_repo_details(repo_type="auto", ...)` for `owner/name` detail lookups unless the type is explicit.
 - Use `hf_trending(...)` only for true trending requests.
+- `hf_trending(...)` returns the Hub's ordered trending list. Numeric `trending_score` may be unavailable from the upstream API; when that field is missing, use `trending_rank` / ordering instead of inventing a score.
+- If the user explicitly asks for numeric trending scores and the returned rows have `trending_score=None`, say the numeric scores are unavailable from the upstream API and return the ordered repos with `trending_rank` instead of emitting null score values.
 - `hf_trending(...)` does not accept extra filters like tag/author/task. For trending + extra filters, either ask a brief clarification or clearly label an approximation using `hf_repo_search(sort="trending_score", ...)`.
 - Use `hf_user_summary(...)` for common "tell me about user X" prompts. It returns a fixed structured object (no `fields=` projection) with overview data and optional sampled followers/following/likes/activity sections. Read profile and social-link fields such as `websiteUrl`, `twitter`, `github`, `linkedin`, and `bluesky` from `summary["item"]["overview"]`.
 - For "my/me" prompts, prefer current-user forms first: `hf_user_summary(username=None)`, `hf_user_graph(username=None, ...)`, and `hf_user_likes(username=None, ...)`. Use `hf_whoami()` when you need the resolved username explicitly.
 - For user Spaces, use `hf_repo_search(author=..., repo_type="space", ...)`. Do not look for a special spaces-by-author helper.
 - Organizations are valid `author=` values for `hf_repo_search(...)`. To inventory an organization's repos, use `author="<org>"` with `repo_types=["model", "dataset", "space"]` and then project to the requested fields.
 - Use `hf_repo_discussions(...)` for model/dataset/space discussion listings. Do not guess raw discussion endpoints through `call_api`.
+- Use `hf_repo_discussion_details(...)` when the user asks for the latest comment text, discussion body, or details for a known discussion number.
 - For ambiguous discovery, either ask a brief clarification or search across `repo_types=["model", "dataset", "space"]`.
 - For Spaces, `filters` are broader Hub tag-style filters rather than a standardized task taxonomy like model `pipeline_tag`.
 - For semantic Space queries (for example image-generation, audio, chat), prefer a broad search with rich fields and then narrow locally.
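
The trending guidance added in this file (prefer `trending_rank` and ordering when `trending_score` is missing) could be applied to returned rows roughly as follows. This is a minimal sketch, not code from the commit: `describe_trending` and the `note` wording are hypothetical, and the rows are assumed to be plain dicts carrying the `repo_id`, `trending_rank`, and `trending_score` keys named in the diff.

```python
from __future__ import annotations

from typing import Any


def describe_trending(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical post-processing of hf_trending-style rows (illustration only)."""
    # Same check the commit uses for its `trending_score_available` flag.
    scores_available = any(row.get("trending_score") is not None for row in rows)

    # Preserve the Hub's ordering via trending_rank; rows without a rank sort last.
    ordered = sorted(
        rows,
        key=lambda row: (row.get("trending_rank") is None, row.get("trending_rank") or 0),
    )

    items: list[dict[str, Any]] = []
    for row in ordered:
        item: dict[str, Any] = {
            "repo_id": row.get("repo_id"),
            "trending_rank": row.get("trending_rank"),
        }
        # Never emit a null score: include the field only when a value exists.
        if row.get("trending_score") is not None:
            item["trending_score"] = row["trending_score"]
        items.append(item)

    note = None
    if not scores_available:
        note = "Numeric trending scores are unavailable from the upstream API; showing rank order."
    return {"items": items, "note": note}
```

Omitting the key, rather than returning `trending_score: None`, keeps the output aligned with the instruction not to emit null score values.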

monty_api_tool_v2.py CHANGED

@@ -256,6 +256,7 @@ HELPER_EXTERNALS = [
     "hf_user_likes",
     "hf_recent_activity",
     "hf_repo_discussions",
     "hf_repo_details",
     "hf_trending",
     "hf_collections_search",
@@ -316,7 +317,7 @@ class MontyExecutionError(RuntimeError):
         self.trace = trace
-def _load_token() -> str | None:
     try:
         from fast_agent.mcp.auth.context import request_bearer_token  # type: ignore
@@ -325,6 +326,13 @@ def _load_token() -> str | None:
             return token
     except Exception:
         pass
     return os.getenv("HF_TOKEN") or None
@@ -644,13 +652,13 @@ def _normalize_repo_detail_row(detail: Any, repo_type: str, repo_id: str) -> dic
     return row
-def _normalize_trending_row(repo: dict[str, Any], default_repo_type: str) -> dict[str, Any]:
     repo_id = repo.get("id")
     repo_type = _canonical_repo_type(repo.get("type") or default_repo_type, default=default_repo_type)
     author = repo.get("author")
     if not isinstance(author, str) and isinstance(repo_id, str) and "/" in repo_id:
         author = repo_id.split("/", 1)[0]
-    return {
         "repo_id": repo_id,
         "repo_type": repo_type,
         "author": author,
@@ -673,6 +681,9 @@ def _normalize_trending_row(repo: dict[str, Any], default_repo_type: str) -> dic
         "datasets": _optional_str_list(repo.get("datasets")),
         "subdomain": repo.get("subdomain"),
     }
 def _sort_repo_rows(rows: list[dict[str, Any]], sort_key: str | None) -> list[dict[str, Any]]:
@@ -1297,8 +1308,21 @@ async def _run_with_monty(
     async def hf_whoami() -> dict[str, Any]:
         start_calls = call_count["n"]
         endpoint = "/api/whoami-v2"
         try:
-            payload = _host_hf_call(endpoint, lambda: _get_hf_api_client().whoami(cache=True))
         except Exception as e:
             return _helper_error(start_calls=start_calls, source=endpoint, error=e)
@@ -2890,6 +2914,87 @@ async def _run_with_monty(
             total_count=None,
         )
     def _resolve_repo_detail_row(
         api: HfApi,
         repo_id: str,
@@ -3032,11 +3137,11 @@ async def _run_with_monty(
         items: list[dict[str, Any]] = []
         default_row_type = requested_type if requested_type != "all" else "model"
-        for row in rows[:lim]:
             if not isinstance(row, dict):
                 continue
             repo = row.get("repoData") if isinstance(row.get("repoData"), dict) else {}
-            items.append(_normalize_trending_row(repo, default_row_type))
         api = _get_hf_api_client()
         enriched_items: list[dict[str, Any]] = []
@@ -3064,6 +3169,8 @@ async def _run_with_monty(
             trending_score = item.get("trending_score")
             if trending_score is not None:
                 merged["trending_score"] = trending_score
             enriched_items.append(merged)
         items = enriched_items
@@ -3080,6 +3187,8 @@ async def _run_with_monty(
             scanned=len(rows),
             matched=matched,
             returned=len(items),
             failures=enrichment_failures or None,
         )
@@ -3188,6 +3297,7 @@ async def _run_with_monty(
                 "hf_user_likes": _collecting_wrapper("hf_user_likes", hf_user_likes),
                 "hf_recent_activity": _collecting_wrapper("hf_recent_activity", hf_recent_activity),
                 "hf_repo_discussions": _collecting_wrapper("hf_repo_discussions", hf_repo_discussions),
                 "hf_repo_details": _collecting_wrapper("hf_repo_details", hf_repo_details),
                 "hf_trending": _collecting_wrapper("hf_trending", hf_trending),
                 "hf_collections_search": _collecting_wrapper("hf_collections_search", hf_collections_search),
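
The token change in this file splits request-scoped lookup (`_load_request_token`) from the general lookup (`_load_token`), which still falls back to the `HF_TOKEN` environment variable. `hf_whoami` now uses only the request-scoped token, presumably so that a shared server token is never reported as the caller's identity. The sketch below illustrates that resolution order under stated assumptions: `resolve_token`, its `request_token_loader` argument, and the `allow_env_fallback` flag are hypothetical stand-ins, not names from the commit.

```python
from __future__ import annotations

import os
from typing import Callable


def resolve_token(
    request_token_loader: Callable[[], str | None],
    *,
    allow_env_fallback: bool,
) -> str | None:
    """Hypothetical illustration of request-first token resolution."""
    try:
        # Stand-in for reading the bearer token attached to the current request.
        token = request_token_loader()
    except Exception:
        # Mirror the diff: any failure in the request-scoped lookup yields no token.
        token = None
    if token:
        return token
    if not allow_env_fallback:
        # Identity-sensitive callers (like hf_whoami in the diff) stop here.
        return None
    # General-purpose callers may fall back to the process-wide token.
    return os.getenv("HF_TOKEN") or None
```

With this shape, an identity lookup would pass `allow_env_fallback=False` and report an error when no request token exists, while ordinary read calls would pass `True`.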

     "hf_user_likes",
     "hf_recent_activity",
     "hf_repo_discussions",
+    "hf_repo_discussion_details",
     "hf_repo_details",
     "hf_trending",
     "hf_collections_search",
         self.trace = trace
+def _load_request_token() -> str | None:
     try:
         from fast_agent.mcp.auth.context import request_bearer_token  # type: ignore
             return token
     except Exception:
         pass
+    return None
+def _load_token() -> str | None:
+    token = _load_request_token()
+    if token:
+        return token
     return os.getenv("HF_TOKEN") or None
     return row
+def _normalize_trending_row(repo: dict[str, Any], default_repo_type: str, rank: int | None = None) -> dict[str, Any]:
     repo_id = repo.get("id")
     repo_type = _canonical_repo_type(repo.get("type") or default_repo_type, default=default_repo_type)
     author = repo.get("author")
     if not isinstance(author, str) and isinstance(repo_id, str) and "/" in repo_id:
         author = repo_id.split("/", 1)[0]
+    row = {
         "repo_id": repo_id,
         "repo_type": repo_type,
         "author": author,
         "datasets": _optional_str_list(repo.get("datasets")),
         "subdomain": repo.get("subdomain"),
     }
+    if rank is not None:
+        row["trending_rank"] = rank
+    return row
 def _sort_repo_rows(rows: list[dict[str, Any]], sort_key: str | None) -> list[dict[str, Any]]:
     async def hf_whoami() -> dict[str, Any]:
         start_calls = call_count["n"]
         endpoint = "/api/whoami-v2"
+        request_token = _load_request_token()
+        if request_token is None:
+            return _helper_error(
+                start_calls=start_calls,
+                source=endpoint,
+                error=(
+                    "Current authenticated user is unavailable for this request. "
+                    "No request bearer token was found."
+                ),
+            )
         try:
+            payload = _host_hf_call(
+                endpoint,
+                lambda: _get_hf_api_client().whoami(token=request_token, cache=True),
+            )
         except Exception as e:
             return _helper_error(start_calls=start_calls, source=endpoint, error=e)
             total_count=None,
         )
+    async def hf_repo_discussion_details(repo_type: str, repo_id: str, discussion_num: int) -> dict[str, Any]:
+        start_calls = call_count["n"]
+        rt = _canonical_repo_type(repo_type)
+        rid = str(repo_id or "").strip()
+        if "/" not in rid:
+            return _helper_error(start_calls=start_calls, source="/api/.../discussions/<num>", error="repo_id must be owner/name")
+        num = _as_int(discussion_num)
+        if num is None:
+            return _helper_error(
+                start_calls=start_calls,
+                source=f"/api/{rt}s/{rid}/discussions/<num>",
+                error="discussion_num must be an integer",
+            )
+        endpoint = f"/api/{rt}s/{rid}/discussions/{num}"
+        try:
+            detail = _host_hf_call(
+                endpoint,
+                lambda: _get_hf_api_client().get_discussion_details(
+                    repo_id=rid,
+                    discussion_num=int(num),
+                    repo_type=rt,
+                ),
+            )
+        except Exception as e:
+            return _helper_error(start_calls=start_calls, source=endpoint, error=e)
+        comment_events: list[dict[str, Any]] = []
+        raw_events = getattr(detail, "events", None)
+        if isinstance(raw_events, list):
+            for event in raw_events:
+                if str(getattr(event, "type", "")).strip().lower() != "comment":
+                    continue
+                comment_events.append(
+                    {
+                        "author": getattr(event, "author", None),
+                        "createdAt": _dt_to_str(getattr(event, "created_at", None)),
+                        "text": getattr(event, "content", None),
+                        "rendered": getattr(event, "rendered", None),
+                    }
+                )
+        latest_comment: dict[str, Any] | None = None
+        if comment_events:
+            latest_comment = max(comment_events, key=lambda row: str(row.get("createdAt") or ""))
+        item: dict[str, Any] = {
+            "num": num,
+            "number": num,
+            "discussionNum": num,
+            "id": num,
+            "repo_id": rid,
+            "repo_type": rt,
+            "title": getattr(detail, "title", None),
+            "author": getattr(detail, "author", None),
+            "createdAt": _dt_to_str(getattr(detail, "created_at", None)),
+            "status": getattr(detail, "status", None),
+            "url": getattr(detail, "url", None),
+            "commentCount": len(comment_events),
+            "latestCommentAuthor": latest_comment.get("author") if latest_comment else None,
+            "latestCommentCreatedAt": latest_comment.get("createdAt") if latest_comment else None,
+            "latestCommentText": latest_comment.get("text") if latest_comment else None,
+            "latestCommentHtml": latest_comment.get("rendered") if latest_comment else None,
+            "latest_comment_author": latest_comment.get("author") if latest_comment else None,
+            "latest_comment_created_at": latest_comment.get("createdAt") if latest_comment else None,
+            "latest_comment_text": latest_comment.get("text") if latest_comment else None,
+            "latest_comment_html": latest_comment.get("rendered") if latest_comment else None,
+        }
+        return _helper_success(
+            start_calls=start_calls,
+            source=endpoint,
+            items=[item],
+            scanned=len(comment_events),
+            matched=1,
+            returned=1,
+            truncated=False,
+            total_comments=len(comment_events),
+        )
     def _resolve_repo_detail_row(
         api: HfApi,
         repo_id: str,
         items: list[dict[str, Any]] = []
         default_row_type = requested_type if requested_type != "all" else "model"
+        for idx, row in enumerate(rows[:lim], start=1):
             if not isinstance(row, dict):
                 continue
             repo = row.get("repoData") if isinstance(row.get("repoData"), dict) else {}
+            items.append(_normalize_trending_row(repo, default_row_type, rank=idx))
         api = _get_hf_api_client()
         enriched_items: list[dict[str, Any]] = []
             trending_score = item.get("trending_score")
             if trending_score is not None:
                 merged["trending_score"] = trending_score
+            if item.get("trending_rank") is not None:
+                merged["trending_rank"] = item.get("trending_rank")
             enriched_items.append(merged)
         items = enriched_items
             scanned=len(rows),
             matched=matched,
             returned=len(items),
+            trending_score_available=any(item.get("trending_score") is not None for item in items),
+            ordered_ranking=True,
             failures=enrichment_failures or None,
         )
                 "hf_user_likes": _collecting_wrapper("hf_user_likes", hf_user_likes),
                 "hf_recent_activity": _collecting_wrapper("hf_recent_activity", hf_recent_activity),
                 "hf_repo_discussions": _collecting_wrapper("hf_repo_discussions", hf_repo_discussions),
+                "hf_repo_discussion_details": _collecting_wrapper("hf_repo_discussion_details", hf_repo_discussion_details),
                 "hf_repo_details": _collecting_wrapper("hf_repo_details", hf_repo_details),
                 "hf_trending": _collecting_wrapper("hf_trending", hf_trending),
                 "hf_collections_search": _collecting_wrapper("hf_collections_search", hf_collections_search),
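
The latest-comment selection in `hf_repo_discussion_details` can be sketched in isolation as below. This is an illustration under assumptions, not the committed code: `latest_comment` is a hypothetical name, and the events are `SimpleNamespace` stand-ins exposing the `type`, `author`, `created_at`, and `content` attributes that the diff reads via `getattr`. One deliberate difference: the diff picks the maximum over the stringified `createdAt` value, whereas this sketch compares `datetime` objects directly, which does not depend on every timestamp string sharing one format.

```python
from __future__ import annotations

from datetime import datetime, timezone
from types import SimpleNamespace
from typing import Any


def latest_comment(events: list[Any]) -> dict[str, Any] | None:
    """Return the newest comment event as a plain dict, or None if there is none."""
    comments = [
        event
        for event in events
        # Keep only comment events, matching the type check in the diff.
        if str(getattr(event, "type", "")).strip().lower() == "comment"
        # Skip events without a usable timestamp so max() compares like with like.
        and isinstance(getattr(event, "created_at", None), datetime)
    ]
    if not comments:
        return None
    newest = max(comments, key=lambda event: event.created_at)
    return {
        "author": getattr(newest, "author", None),
        "createdAt": newest.created_at.isoformat(),
        "text": getattr(newest, "content", None),
    }


# Usage with made-up events: the status change is ignored even though it is newest.
events = [
    SimpleNamespace(type="comment", author="alice",
                    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc), content="first"),
    SimpleNamespace(type="status-change", author="bob",
                    created_at=datetime(2024, 3, 1, tzinfo=timezone.utc)),
    SimpleNamespace(type="comment", author="carol",
                    created_at=datetime(2024, 2, 1, tzinfo=timezone.utc), content="second"),
]
newest = latest_comment(events)
```

Here `newest` holds carol's comment, since only `comment` events are considered.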