evalstate HF Staff committed on
Commit
cb4d0c0
·
verified ·
1 Parent(s): 70eec24

Deploy latest local hf-hub-query version

Files changed (3)
  1. _monty_codegen_shared.md +204 -6
  2. hf-hub-query.md +3 -3
  3. monty_api_tool_v2.py +421 -289
_monty_codegen_shared.md CHANGED
@@ -2,6 +2,7 @@
2
  - No imports.
3
  - Helper functions are already in scope.
4
  - All helper calls are async: always use `await`.
 
5
  - Before sending the tool call, check that the wrapper both defines `solve(...)` and ends with `await solve(query, max_calls)`.
6
  - Use helper functions first. Use raw `call_api('/api/...')` only if no helper fits.
7
  - `call_api` must use a raw path starting with `/api/...`.
@@ -11,11 +12,13 @@
11
  - When the user asks for specific fields or "return only ...", return exactly that final shape from `solve(...)` instead of a larger helper envelope.
12
  - For bounded list/sample helpers in raw mode, prefer returning the helper envelope directly when coverage/limit metadata matters.
13
  - For detail lookups, prefer returning a compact dict of relevant fields rather than the full raw helper response.
14
- - Prefer omitting unavailable fields rather than emitting `null` placeholders, unless the user explicitly asked for a fixed schema with nulls.
15
- - For structured requests asking for counts/lists/fields, prefer returning a compact JSON object/array instead of prose or markdown tables, even if the user did not explicitly say "return only".
16
- - If the user names output fields explicitly (for example `id, title, likes` or `event_type + repo_id`), return those exact field names in JSON rather than paraphrasing them into prose labels.
17
- - For prompts that say "when present", include the field only when it has a real value; do not emit `null` placeholders.
18
- - For prompts asking for compact structured output, use stable key names from the examples below instead of inventing new labels.
 
 
19
 
20
  ## Helper result shape
21
  All helpers return:
@@ -35,6 +38,38 @@ Rules:
35
  - `meta` contains helper-owned execution and coverage metadata. For bounded list/sample helpers this can include requested/applied limits, whether a default limit was used, exactness/completeness, whether more rows may be available, truncation cause, and a next-request hint.
36
  - Helpers return rich default rows. Use `fields` to narrow output; use `advanced` only when you truly need backend-specific behavior beyond the default row.
37
  - Exhaustive helpers such as graph/members/likes/activity can return substantially more than 100 rows when you request a larger `return_limit`; use helper `meta` (and the outer raw `meta.limit_summary`) to tell when limits were still hit.
38
 
39
  ## Helper API
40
  ```py
@@ -95,6 +130,16 @@ await hf_user_graph(
95
  fields: list[str] | None = None,
96
  )
97
 
98
  await hf_user_likes(
99
  username: str | None = None, # None => current authenticated user
100
  repo_types: list[str] | None = None,
@@ -130,6 +175,28 @@ await hf_whoami()
130
  await call_api(endpoint: str, params: dict | None = None, method: str = "GET", json_body: dict | None = None)
131
  ```
132
 
133
  ## Common repo fields
134
  Search/detail/trending repo rows commonly include:
135
  - `repo_id`
@@ -151,21 +218,96 @@ Type-specific fields may also be present by default when available, such as:
151
  - dataset: `description`, `paperswithcode_id`
152
  - space: `sdk`, `models`, `datasets`, `subdomain`
153
 
154
  ## Usage guidance
155
  - Use `hf_repo_search(...)` for find/search/top requests. Prefer dedicated args like `author=` over using `where` when a first-class helper argument exists.
156
  - `hf_repo_search(...)` defaults to `repo_type="model"` when no repo type is specified. For prompts like "what repos does <author/org> have" or "list everything published by <author/org>", search across `repo_types=["model", "dataset", "space"]` unless the user explicitly asked for one type.
157
  - Use `hf_repo_details(repo_type="auto", ...)` for `owner/name` detail lookups unless the type is explicit.
158
  - Use `hf_trending(...)` only for true trending requests.
159
  - `hf_trending(...)` does not accept extra filters like tag/author/task. For trending + extra filters, either ask a brief clarification or clearly label an approximation using `hf_repo_search(sort="trending_score", ...)`.
160
- - Use `hf_user_summary(...)` for common "tell me about user X" prompts. It always includes overview data and can add sampled followers/following/likes/activity sections.
 
161
  - Use `hf_org_overview(...)` for organization details like display name, followers, and member count.
162
  - Use `hf_org_members(...)` for organization member lists and counts. Member rows use `username`, `fullname`, `isPro`, and `role`; common aliases like `login`, `name`, and `is_pro` are tolerated in `fields=[...]`.
163
  - Use `hf_user_graph(...)` for follower/following lists, counts, and filtered graph samples. Prefer `relation=` over trying undocumented helper names.
 
164
  - For overlap/comparison/ranking tasks over followers, org members, likes, or activity, do not use small manual `return_limit` values like 10/20/50 unless the user explicitly asked for a sample. Use the helper default or a clearly high bound for the intermediate analysis, then keep only the final displayed result compact.
 
165
  - Use `hf_user_likes(...)` for liked-repo prompts. Prefer helper-side filtering and ranking over model-side post-processing; for popularity requests use `sort="repoLikes"` or `sort="repoDownloads"` with a bounded `ranking_window`.
166
  - For prompts like "most popular repository a user liked recently", call `hf_user_likes(username=..., sort="repoLikes", ranking_window=40, return_limit=1)` directly. Do not fetch default recent likes and manually re-rank them.
167
  - `hf_user_likes(...)` rows include liked timestamp plus repo identifiers and popularity fields. Prefer fields like `repo_id`, `repo_type`, `repo_author`, `likes`, `downloads`, and `repo_url` when you want repo-shaped output.
168
  - `hf_user_graph(...)` rows use `username`, `fullname`, and `isPro`. Common aliases like `login`→`username`, `name`→`fullname`, and `is_pro`→`isPro` are tolerated when used in `fields=[...]`, but prefer the canonical names in generated code.
169
  - `hf_user_graph(...)` also accepts organization names for `relation="followers"`. For organizations, follower rows use the same canonical user fields (`username`, `fullname`, `isPro`). Organization `following` is not supported by the Hub API, so do not ask `hf_user_graph(..., relation="following")` for an organization.
170
  - Use `hf_recent_activity(...)` for activity-feed prompts. Prefer `feed_type` + `entity` rather than raw `call_api("/api/recent-activity", ...)`.
171
  - `hf_recent_activity(...)` rows can be projected with `event_type`, `repo_id`, `repo_type`, and `timestamp` aliases when you want snake_case output.
@@ -226,6 +368,27 @@ return {
226
  "latest_likes": item["likes"]["sample"],
227
  }
228
 
229
  # Popularity-ranked likes: helper-side shortlist enrichment + ranking
230
  likes = await hf_user_likes(
231
  username="julien-c",
@@ -250,6 +413,41 @@ return {
250
  },
251
  }
252
 
253
  # Recent activity with snake_case aliases
254
  activity = await hf_recent_activity(
255
  feed_type="user",
 
2
  - No imports.
3
  - Helper functions are already in scope.
4
  - All helper calls are async: always use `await`.
5
+ - `max_calls` is the overall external-call budget for the generated program, not a generic helper argument. You may use it inside `solve(...)` to bound loops or choose a cheaper fallback strategy, but do not pass it to helpers unless the helper signature explicitly includes `max_calls`.
6
  - Before sending the tool call, check that the wrapper both defines `solve(...)` and ends with `await solve(query, max_calls)`.
7
  - Use helper functions first. Use raw `call_api('/api/...')` only if no helper fits.
8
  - `call_api` must use a raw path starting with `/api/...`.
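
The `max_calls` rule above can be sketched as follows. This is a minimal illustration, not the tool's actual wrapper: the `hf_user_summary` stub, the hard-coded usernames, and the `calls_used` bookkeeping are assumptions made so the example runs standalone. In the real runtime the helpers are already in scope and the wrapper ends with `await solve(query, max_calls)`.

```python
import asyncio


# Hypothetical stub: in the real runtime this helper is injected, not defined here.
async def hf_user_summary(username=None):
    return {
        "ok": True,
        "item": {"username": username, "overview": {"github": f"gh-{username}"}},
        "items": [],
        "meta": {},
        "error": None,
    }


async def solve(query, max_calls):
    # Assume one earlier helper call already produced these usernames.
    usernames = ["alice", "bob", "carol", "dave"]
    calls_used = 1
    remaining = max(0, max_calls - calls_used)

    rows = []
    for uname in usernames[:remaining]:
        # max_calls bounds the loop; it is not forwarded to the helper.
        summary = await hf_user_summary(username=uname)
        overview = (summary["item"] or {}).get("overview", {})
        rows.append({"username": uname, "github": overview.get("github")})

    # Flag partial coverage instead of silently dropping users.
    return {"rows": rows, "partial": len(usernames) > remaining}


# asyncio.run is used only so the sketch runs outside the tool runtime.
result = asyncio.run(solve("followers' GitHub links", max_calls=3))
```

The budget is spent inside `solve(...)` and the result says when it ran out, which matches the guidance to indicate partial results rather than pass `max_calls` down to helpers.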
 
12
  - When the user asks for specific fields or "return only ...", return exactly that final shape from `solve(...)` instead of a larger helper envelope.
13
  - For bounded list/sample helpers in raw mode, prefer returning the helper envelope directly when coverage/limit metadata matters.
14
  - For detail lookups, prefer returning a compact dict of relevant fields rather than the full raw helper response.
15
+ - For structured requests, prefer compact JSON objects/arrays over prose or markdown tables. Use the user's requested field names when they are explicit, use the stable key names shown below when they are not, and omit unavailable fields unless the user explicitly asked for a fixed schema with nulls.
16
+ - For prompts that ask for both a sample/list and metadata, keep the sample compact and surface helper-owned metadata explicitly. Do not dump a very large item list just because metadata was requested.
17
+ - If the user asks for coverage/exactness/truncation metadata, prefer helper `meta` fields such as `exact_count`, `sample_complete`, `truncated`, `count_source`, `returned`, `total`, `total_matched`, `more_available`, and applied/requested limits.
18
+ - When the user asks for a sample plus metadata and does not specify a large sample size, default to a small sample (typically 10-20 rows) rather than returning hundreds of rows.
19
+ - Do not ask the user for their username just because they said "my" or "me". Use current-user helper behavior first.
20
+ - For current-user prompts, prefer helpers that support `username=None` for the authenticated user. Call `hf_whoami()` first when you need the explicit username for joins, comparisons, or output labeling.
21
+ - Only ask a follow-up for identity if `hf_whoami()` or a current-user helper fails because authentication/current-user resolution is unavailable.
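
One way to follow the "sample plus metadata" rules above is a small post-processing step over the helper envelope. The function name `compact_sample_with_meta`, the `sample_size` key, and the sample envelope are hypothetical; only the envelope keys (`items`, `meta`) and the metadata field names come from the text.

```python
# Metadata keys named in the guidance above; other meta keys are dropped.
META_KEYS = (
    "exact_count",
    "sample_complete",
    "truncated",
    "count_source",
    "returned",
    "total",
    "total_matched",
    "more_available",
)


def compact_sample_with_meta(envelope, sample_size=10):
    """Return a small sample plus a compact subset of helper-owned meta."""
    items = envelope.get("items") or []
    meta = envelope.get("meta") or {}
    return {
        "sample": items[:sample_size],
        "sample_size": min(len(items), sample_size),
        "meta": {key: meta[key] for key in META_KEYS if key in meta},
    }


# Illustrative envelope; the values are made up for the example.
envelope = {
    "ok": True,
    "item": None,
    "items": [{"username": f"user{i}"} for i in range(50)],
    "meta": {"total": 50, "returned": 50, "truncated": False, "scanned": 50},
    "error": None,
}
out = compact_sample_with_meta(envelope)
```

Coverage is read from `meta` rather than inferred from the length of the trimmed sample.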
22
 
23
  ## Helper result shape
24
  All helpers return:
 
38
  - `meta` contains helper-owned execution and coverage metadata. For bounded list/sample helpers this can include requested/applied limits, whether a default limit was used, exactness/completeness, whether more rows may be available, truncation cause, and a next-request hint.
39
  - Helpers return rich default rows. Use `fields` to narrow output; use `advanced` only when you truly need backend-specific behavior beyond the default row.
40
  - Exhaustive helpers such as graph/members/likes/activity can return substantially more than 100 rows when you request a larger `return_limit`; use helper `meta` (and the outer raw `meta.limit_summary`) to tell when limits were still hit.
41
+ - For metadata-oriented prompts, read and return the helper `meta` object (or a compact subset of it) instead of inferring coverage from list length alone.
42
+
43
+ ## Typed graph shorthand
44
+ Use this graph mental model instead of reconstructing relations from raw endpoints.
45
+
46
+ ```text
47
+ Node types
48
+ - U = user row {username, fullname, isPro, ...}
49
+ - O = org row {organization, displayName, followers, members, ...}
50
+ - R = repo row {repo_id, repo_type, author, likes, downloads, repo_url, ...}
51
+ - A = activity row {event_type, repo_id, repo_type, timestamp, ...}
52
+ - S = user summary {username, overview, followers?, following?, likes?, activity?}
53
+
54
+ Direct edges / helpers
55
+ - U -followers-> U => hf_user_graph(relation="followers")
56
+ - U -following-> U => hf_user_graph(relation="following")
57
+ - U -likes-> R => hf_user_likes(username=...)
58
+ - R -liked_by-> U => hf_repo_likers(repo_id=..., repo_type=...)
59
+ - O -members-> U => hf_org_members(organization=...)
60
+ - O -repos-> R => hf_repo_search(author="<org>", repo_types=[...])
61
+ - U/O -activity-> A => hf_recent_activity(feed_type=..., entity=...)
62
+ - R -details-> R => hf_repo_details(...)
63
+ - U -summary-> S => hf_user_summary(...)
64
+ - O -overview-> O => hf_org_overview(...)
65
+ ```
66
+
67
+ Rules:
68
+ - Prefer the helper that already matches the requested edge direction.
69
+ - Do not reverse a relation indirectly if a direct helper exists.
70
+ - If you already know an author/org and need repos, go straight to `hf_repo_search(author=...)`.
71
+ - Read profile/social/link fields from `hf_user_summary(...)["item"]["overview"]`, not from graph rows.
72
+ - Use canonical row fields in generated code: user rows use `username`, repo rows use `repo_id`/`repo_type`, activity rows use `event_type`/`repo_id`/`repo_type`/`timestamp`.
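
The shorthand above can be read as a lookup table from (node type, edge) to helper name. The sketch below is only a reading aid: `EDGE_HELPERS` and `helper_for_edge` are hypothetical names, and the node-type labels `"user"`, `"org"`, and `"repo"` are chosen for this example. The organization-followers entry reflects the note elsewhere in this file that `hf_user_graph(...)` accepts organization names for `relation="followers"`.

```python
# (node type, edge) -> helper name, transcribed from the shorthand above.
EDGE_HELPERS = {
    ("user", "followers"): "hf_user_graph",
    ("user", "following"): "hf_user_graph",
    ("org", "followers"): "hf_user_graph",
    ("user", "likes"): "hf_user_likes",
    ("repo", "liked_by"): "hf_repo_likers",
    ("org", "members"): "hf_org_members",
    ("org", "repos"): "hf_repo_search",
    ("user", "activity"): "hf_recent_activity",
    ("org", "activity"): "hf_recent_activity",
    ("repo", "details"): "hf_repo_details",
    ("user", "summary"): "hf_user_summary",
    ("org", "overview"): "hf_org_overview",
}


def helper_for_edge(node_type, edge):
    """Return the direct helper for an edge, or raise if none is listed."""
    try:
        return EDGE_HELPERS[(node_type, edge)]
    except KeyError:
        raise ValueError(f"no direct helper for {node_type} -{edge}->") from None


likers_helper = helper_for_edge("repo", "liked_by")
```

A missing entry, such as organization `following`, raises instead of falling back to an indirect reconstruction.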
73
 
74
  ## Helper API
75
  ```py
 
130
  fields: list[str] | None = None,
131
  )
132
 
133
+ await hf_repo_likers(
134
+ repo_id: str,
135
+ repo_type: str, # model|dataset|space
136
+ return_limit: int | None = None,
137
+ count_only: bool = False,
138
+ pro_only: bool | None = None,
139
+ where: dict | None = None,
140
+ fields: list[str] | None = None,
141
+ )
142
+
143
  await hf_user_likes(
144
  username: str | None = None, # None => current authenticated user
145
  repo_types: list[str] | None = None,
 
175
  await call_api(endpoint: str, params: dict | None = None, method: str = "GET", json_body: dict | None = None)
176
  ```
177
 
178
+ ## Important nested result contracts
179
+ ### `hf_user_summary(...)`
180
+ `hf_user_summary(...)` returns the normal helper envelope. The main payload is in `summary["item"]`.
181
+
182
+ ```py
183
+ summary["item"] == {
184
+ "username": str,
185
+ "overview": dict, # profile + socials + counts
186
+ "followers": {"count": int | None, "sample": list[dict]} | None,
187
+ "following": {"count": int | None, "sample": list[dict]} | None,
188
+ "likes": {"count": int | None, "sample": list[dict]} | None,
189
+ "activity": {"count": int | None, "sample": list[dict]} | None,
190
+ }
191
+ ```
192
+
193
+ Read profile/social fields from `summary["item"]["overview"]`, commonly:
194
+ - `websiteUrl`
195
+ - `twitter`, `github`, `linkedin`, `bluesky`
196
+ - `twitterHandle`, `githubHandle`, `linkedinHandle`, `blueskyHandle`
197
+ - `followers`, `following`, `likes`
198
+ - `isPro`
199
+
200
  ## Common repo fields
201
  Search/detail/trending repo rows commonly include:
202
  - `repo_id`
 
218
  - dataset: `description`, `paperswithcode_id`
219
  - space: `sdk`, `models`, `datasets`, `subdomain`
220
 
221
+ ## Common user overview fields
222
+ `hf_user_summary(...)["item"]["overview"]` commonly includes:
223
+ - `username`
224
+ - `fullname`
225
+ - `bio`
226
+ - `avatarUrl`
227
+ - `websiteUrl`
228
+ - `twitter`
229
+ - `github`
230
+ - `linkedin`
231
+ - `bluesky`
232
+ - `twitterHandle`
233
+ - `githubHandle`
234
+ - `linkedinHandle`
235
+ - `blueskyHandle`
236
+ - `followers`
237
+ - `following`
238
+ - `likes`
239
+ - `models`
240
+ - `datasets`
241
+ - `spaces`
242
+ - `discussions`
243
+ - `papers`
244
+ - `upvotes`
245
+ - `orgs`
246
+ - `isPro`
247
+
248
+ ## Primary navigation paths
249
+ Choose the helper based on the **subject of the question** and the **smallest helper that already contains the needed fields**.
250
+
251
+ ### Repo-centric questions
252
+ - Exact repo by id (`owner/name`) → `hf_repo_details(...)`
253
+ - Search/discovery/list/top repos → `hf_repo_search(...)`
254
+ - True trending requests → `hf_trending(...)`
255
+ - Repo discussions → `hf_repo_discussions(...)`
256
+ - Users who liked a specific repo / liker filtering / liker counts → `hf_repo_likers(...)`
257
+
258
+ ### User-centric questions
259
+ - Profile / overview / "tell me about user X" → `hf_user_summary(...)`
260
+ - Followers / following / graph sampling → `hf_user_graph(...)`
261
+ - Repos a user liked → `hf_user_likes(...)`
262
+ - Recent actions / feed questions → `hf_recent_activity(...)`
263
+
264
+ ### Organization-centric questions
265
+ - Organization details / counts → `hf_org_overview(...)`
266
+ - Organization members → `hf_org_members(...)`
267
+ - Organization repos → `hf_repo_search(author="<org>", repo_types=["model", "dataset", "space"])`
268
+
269
+ ### Relationship direction matters
270
+ - `hf_user_likes(...)` = **user → repos**
271
+ - `hf_repo_likers(...)` = **repo → users**
272
+ - `hf_user_graph(...)` = **user/org → followers/following**
273
+
274
+ Pick the helper that already matches the direction of the question instead of trying to reconstruct the relation indirectly.
275
+
276
+ ## Efficiency rules
277
+ - Prefer **one helper call with local aggregation/filtering** over **many follow-up detail calls**.
278
+ - If the current helper already exposes the needed field, use it directly instead of hydrating each row with another helper.
279
+ - Use `hf_user_summary(username=...)` per user only when you truly need profile/social fields that are not already present in the current row set.
280
+ - For overlap/comparison/ranking tasks, fetch a broad enough working set first, then compute locally in generated code.
281
+ - Keep final displayed results compact, but do not artificially shrink intermediate helper coverage unless the user explicitly asked for a sample.
282
+
283
+ ### Important anti-patterns
284
+ - For repo liker questions, do **not** call `hf_user_summary(...)` for each liker just to determine `isPro` or `type`. `hf_repo_likers(...)` already returns `username`, `fullname`, `type`, and `isPro`.
285
+ - For follower/following questions, do **not** use `hf_user_summary(...)` to reconstruct the graph. Use `hf_user_graph(...)`.
286
+ - For "repos by author/org" questions, do **not** search semantically first if the author is already known. Start with `hf_repo_search(author=..., ...)`.
287
+ - For "most popular repo a user liked" questions, do **not** fetch recent likes and manually re-rank them. Use `hf_user_likes(..., sort="repoLikes" | "repoDownloads")`.
288
+
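
The first anti-pattern above suggests aggregating liker rows locally instead of hydrating each user. A minimal sketch, assuming rows shaped like the documented `hf_repo_likers(...)` output (`username`, `type`, `isPro`): the function name `liker_breakdown` and the sample rows are hypothetical, and `"user"` as a `type` value is an assumption (only `"organization"` appears in the examples).

```python
from collections import Counter


def liker_breakdown(rows):
    """Summarize liker rows without any follow-up helper calls."""
    by_type = Counter(row.get("type") or "unknown" for row in rows)
    pro = sum(1 for row in rows if row.get("isPro") is True)
    return {
        "total": len(rows),
        "pro": pro,
        # Rows with a missing or false isPro are counted as non-pro.
        "non_pro": len(rows) - pro,
        "by_type": dict(by_type),
    }


# Illustrative rows standing in for likers["items"] from one broad call.
rows = [
    {"username": "a", "type": "user", "isPro": True},
    {"username": "b", "type": "user", "isPro": False},
    {"username": "c", "type": "organization", "isPro": False},
]
breakdown = liker_breakdown(rows)
```

This costs one helper call regardless of how many likers are returned.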
289
  ## Usage guidance
290
  - Use `hf_repo_search(...)` for find/search/top requests. Prefer dedicated args like `author=` over using `where` when a first-class helper argument exists.
291
  - `hf_repo_search(...)` defaults to `repo_type="model"` when no repo type is specified. For prompts like "what repos does <author/org> have" or "list everything published by <author/org>", search across `repo_types=["model", "dataset", "space"]` unless the user explicitly asked for one type.
292
  - Use `hf_repo_details(repo_type="auto", ...)` for `owner/name` detail lookups unless the type is explicit.
293
  - Use `hf_trending(...)` only for true trending requests.
294
  - `hf_trending(...)` does not accept extra filters like tag/author/task. For trending + extra filters, either ask a brief clarification or clearly label an approximation using `hf_repo_search(sort="trending_score", ...)`.
295
+ - Use `hf_user_summary(...)` for common "tell me about user X" prompts. It returns a fixed structured object (no `fields=` projection) with overview data and optional sampled followers/following/likes/activity sections. Read profile and social-link fields such as `websiteUrl`, `twitter`, `github`, `linkedin`, and `bluesky` from `summary["item"]["overview"]`.
296
+ - For "my/me" prompts, prefer current-user forms first: `hf_user_summary(username=None)`, `hf_user_graph(username=None, ...)`, and `hf_user_likes(username=None, ...)`. Use `hf_whoami()` when you need the resolved username explicitly.
297
  - Use `hf_org_overview(...)` for organization details like display name, followers, and member count.
298
  - Use `hf_org_members(...)` for organization member lists and counts. Member rows use `username`, `fullname`, `isPro`, and `role`; common aliases like `login`, `name`, and `is_pro` are tolerated in `fields=[...]`.
299
  - Use `hf_user_graph(...)` for follower/following lists, counts, and filtered graph samples. Prefer `relation=` over trying undocumented helper names.
300
+ - Use `hf_repo_likers(...)` for "who liked this repo?" prompts. It returns liker rows for a specific model, dataset, or space; pass `repo_type` explicitly.
301
  - For overlap/comparison/ranking tasks over followers, org members, likes, or activity, do not use small manual `return_limit` values like 10/20/50 unless the user explicitly asked for a sample. Use the helper default or a clearly high bound for the intermediate analysis, then keep only the final displayed result compact.
302
+ - For follower/member social-link lookups, first fetch usernames with `hf_user_graph(...)` or `hf_org_members(...)`, then fetch each user's profile/social data with `hf_user_summary(username=...)`. If the follower/member set is large and the user did not specify a cap, ask for a limit or clearly indicate that the result may be partial because each user requires additional calls.
303
  - Use `hf_user_likes(...)` for liked-repo prompts. Prefer helper-side filtering and ranking over model-side post-processing; for popularity requests use `sort="repoLikes"` or `sort="repoDownloads"` with a bounded `ranking_window`.
304
  - For prompts like "most popular repository a user liked recently", call `hf_user_likes(username=..., sort="repoLikes", ranking_window=40, return_limit=1)` directly. Do not fetch default recent likes and manually re-rank them.
305
  - `hf_user_likes(...)` rows include liked timestamp plus repo identifiers and popularity fields. Prefer fields like `repo_id`, `repo_type`, `repo_author`, `likes`, `downloads`, and `repo_url` when you want repo-shaped output.
306
  - `hf_user_graph(...)` rows use `username`, `fullname`, and `isPro`. Common aliases like `login`→`username`, `name`→`fullname`, and `is_pro`→`isPro` are tolerated when used in `fields=[...]`, but prefer the canonical names in generated code.
307
+ - `hf_repo_likers(...)` rows use `username`, `fullname`, `type`, and `isPro`. Common aliases like `login`→`username`, `name`→`fullname`, `is_pro`→`isPro`, and `entity_type`→`type` are tolerated when used in `fields=[...]`, but prefer the canonical names in generated code.
308
+ - `hf_repo_likers(...)` is a one-shot liker list helper, not a cursor feed. Use it for liker list/count/filter questions, not for recency/activity questions.
309
+ - For liker count/breakdown questions, prefer `hf_repo_likers(..., count_only=True, where=...)` or a single broad `hf_repo_likers(...)` call with local aggregation. Do not hydrate each liker with `hf_user_summary(...)` just to recover `isPro` or `type`.
310
+ - `hf_repo_likers(...)` does not use the generic exhaustive hard cap for explicit larger `return_limit` values because the Hub already returns the full liker rows in one response. The default output is still compact unless you ask for more.
311
  - `hf_user_graph(...)` also accepts organization names for `relation="followers"`. For organizations, follower rows use the same canonical user fields (`username`, `fullname`, `isPro`). Organization `following` is not supported by the Hub API, so do not ask `hf_user_graph(..., relation="following")` for an organization.
312
  - Use `hf_recent_activity(...)` for activity-feed prompts. Prefer `feed_type` + `entity` rather than raw `call_api("/api/recent-activity", ...)`.
313
  - `hf_recent_activity(...)` rows can be projected with `event_type`, `repo_id`, `repo_type`, and `timestamp` aliases when you want snake_case output.
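
The guidance above prefers canonical field names over tolerated aliases in generated code. A small normalization step, sketched here under the assumption that it runs before building `fields=[...]`, keeps the output keys stable. `FIELD_ALIASES` and `canonical_fields` are hypothetical names; the alias pairs are the ones listed for user-shaped rows.

```python
# Alias -> canonical name, as listed for graph, member, and liker rows.
FIELD_ALIASES = {
    "login": "username",
    "name": "fullname",
    "is_pro": "isPro",
    "entity_type": "type",
}


def canonical_fields(fields):
    """Map aliases to canonical names, dropping duplicates but keeping order."""
    result = []
    for field in fields:
        canonical = FIELD_ALIASES.get(field, field)
        if canonical not in result:
            result.append(canonical)
    return result


fields = canonical_fields(["login", "name", "is_pro", "username"])
```

The helpers tolerate the aliases either way; normalizing up front simply avoids mixed key names in the final JSON.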
 
368
  "latest_likes": item["likes"]["sample"],
369
  }
370
 
371
+ # Followers' GitHub links: fetch usernames first, then read overview socials
372
+ followers = await hf_user_graph(
373
+ relation="followers",
374
+ return_limit=20,
375
+ fields=["username"],
376
+ )
377
+ result = []
378
+ for row in followers["items"]:
379
+ uname = row.get("username")
380
+ if not uname:
381
+ continue
382
+ summary = await hf_user_summary(username=uname)
383
+ item = summary["item"] or (summary["items"][0] if summary["items"] else None)
384
+ if item is None:
385
+ continue
386
+ overview = item.get("overview", {})
387
+ github = overview.get("github")
388
+ if github is not None:
389
+ result.append({"username": uname, "github": github})
390
+ return result
391
+
392
  # Popularity-ranked likes: helper-side shortlist enrichment + ranking
393
  likes = await hf_user_likes(
394
  username="julien-c",
 
413
  },
414
  }
415
 
416
+ # Repo likers: user-shaped liker rows for a specific repo
417
+ likers = await hf_repo_likers(
418
+ repo_id="mteb/leaderboard",
419
+ repo_type="space",
420
+ where={"type": "organization"},
421
+ fields=["username", "type", "isPro"],
422
+ )
423
+ return {
424
+ "repo_id": "mteb/leaderboard",
425
+ "repo_type": "space",
426
+ "organization_likers": likers["items"],
427
+ "meta": likers["meta"],
428
+ }
429
+
430
+ # Exact liker breakdown using helper-side counting
431
+ pro = await hf_repo_likers(
432
+ repo_id="openai/gpt-oss-120b",
433
+ repo_type="model",
434
+ where={"isPro": True},
435
+ count_only=True,
436
+ )
437
+ normal = await hf_repo_likers(
438
+ repo_id="openai/gpt-oss-120b",
439
+ repo_type="model",
440
+ where={"isPro": False},
441
+ count_only=True,
442
+ )
443
+ return {
444
+ "repo_id": "openai/gpt-oss-120b",
445
+ "repo_type": "model",
446
+ "pro_likers": pro["meta"]["total"],
447
+ "normal_likers": normal["meta"]["total"],
448
+ "exact_count": bool(pro["meta"].get("exact_count") and normal["meta"].get("exact_count")),
449
+ }
450
+
451
  # Recent activity with snake_case aliases
452
  activity = await hf_recent_activity(
453
  feed_type="user",
hf-hub-query.md CHANGED
@@ -4,7 +4,7 @@ name: hf_hub_query
4
  model: hf.openai/gpt-oss-120b:cerebras
5
  use_history: false
6
  default: true
7
- description: "Raw-mode version of hf_hub_query. Use natural-language queries to explore the Hugging Face Hub: find and relate users, organizations, repositories, activity, followers, likes, discussions, and collections. Best for read-only Hub discovery, lookup, ranking, joins, overlap, and other relationship questions. Returns a structured result payload with runtime meta information."
8
  shell: false
9
  skills: []
10
  function_tools:
@@ -15,7 +15,7 @@ request_params:
15
 
16
  reasoning: high
17
 
18
- You are a **tool-using, read-only** Hugging Face Hub search/navigation agent in **raw passthrough mode**.
19
  The user must never see your generated Python unless they explicitly ask for debugging.
20
 
21
  ## Mandatory first action
@@ -25,7 +25,7 @@ The user must never see your generated Python unless they explicitly ask for deb
25
  - Never paste `async def solve(...)` into normal assistant text.
26
  - Only skip the tool call if a brief clarification question is strictly required.
27
 
28
- ## Raw passthrough contract
29
  1. Read the user request.
30
  2. Build an inner program in exactly this shape:
31
  ```py
 
4
  model: hf.openai/gpt-oss-120b:cerebras
5
  use_history: false
6
  default: true
7
+ description: "Active natural-language Hugging Face Hub navigator, raw structured-output variant. Read-only, multi-step agent that can chain lookups across users, organizations, and repositories (models, datasets, spaces), plus followers/following, likes/likers, recent activity, discussions, and collections. Good for search, filtering, counts, ranking, overlap/intersection, joins, and relationship questions. Returns structured result data with runtime metadata instead of a rewritten prose answer."
8
  shell: false
9
  skills: []
10
  function_tools:
 
15
 
16
  reasoning: high
17
 
18
+ You are a **tool-using, read-only** Hugging Face Hub search/navigation agent.
19
  The user must never see your generated Python unless they explicitly ask for debugging.
20
 
21
  ## Mandatory first action
 
25
  - Never paste `async def solve(...)` into normal assistant text.
26
  - Only skip the tool call if a brief clarification question is strictly required.
27
 
28
+ ## Tool-call protocol
29
  1. Read the user request.
30
  2. Build an inner program in exactly this shape:
31
  ```py
monty_api_tool_v2.py CHANGED
@@ -93,6 +93,25 @@ _SORT_KEY_ALIASES: dict[str, str] = {
93
  "trending": "trending_score",
94
  }
95
 
96
  # Extra hf_repo_search kwargs intentionally supported as pass-through to
97
  # huggingface_hub.HfApi.list_models/list_datasets/list_spaces.
98
  # (Generic args like `query/search/sort/author/limit` are handled directly in
@@ -191,6 +210,7 @@ PAGINATION_POLICY: dict[str, dict[str, Any]] = {
191
  "hf_user_followers": {"mode": "exhaustive", "scan_max": FOLLOWERS_SCAN_MAX, "default_return": 1_000},
192
  "hf_user_following": {"mode": "exhaustive", "scan_max": FOLLOWERS_SCAN_MAX, "default_return": 1_000},
193
  "hf_org_members": {"mode": "exhaustive", "scan_max": FOLLOWERS_SCAN_MAX, "default_return": 1_000},
 
194
  "hf_user_likes": {
195
  "mode": "exhaustive",
196
  "scan_max": LIKES_SCAN_MAX,
@@ -217,6 +237,7 @@ HELPER_EXTERNALS = [
217
  "hf_repo_search",
218
  "hf_user_summary",
219
  "hf_user_graph",
 
220
  "hf_user_likes",
221
  "hf_recent_activity",
222
  "hf_repo_discussions",
@@ -243,6 +264,7 @@ ALLOWLIST_PATTERNS = [
243
  r"^/api/users/[^/]+/followers$",
244
  r"^/api/users/[^/]+/following$",
245
  r"^/api/users/[^/]+/likes$",
 
246
  r"^/api/organizations/[^/]+/overview$",
247
  r"^/api/organizations/[^/]+/members$",
248
  r"^/api/organizations/[^/]+/followers$",
@@ -260,6 +282,7 @@ STRICT_ALLOWLIST_PATTERNS = [
260
  r"^/api/whoami-v2$",
261
  r"^/api/trending$",
262
  r"^/api/daily_papers$",
 
263
  r"^/api/collections$",
264
  r"^/api/collections/[^/]+$",
265
  r"^/api/collections/[^/]+/[^/]+$",
@@ -956,40 +979,107 @@ async def _run_with_monty(
956
  return "More results may exist; narrow filters or raise scan/page bounds for better coverage"
957
  return "Ask for a larger limit to see more rows"
958
 
959
- def _helper_success(
960
  *,
961
- start_calls: int,
962
- source: str,
963
- items: list[dict[str, Any]],
964
- cursor: str | None = None,
965
- **meta: Any,
 
 
966
  ) -> dict[str, Any]:
967
- if cursor is not None:
968
- meta["cursor"] = cursor
969
-
970
- return {
971
- "ok": True,
972
- "item": items[0] if len(items) == 1 else None,
973
- "items": items,
974
- "meta": _helper_meta(start_calls, source=source, **meta),
975
- "error": None,
 
 
976
  }
977
 
978
- def _helper_success_meta(
979
  *,
980
  start_calls: int,
981
  source: str,
982
  items: list[dict[str, Any]],
983
- meta: dict[str, Any],
984
  cursor: str | None = None,
 
 
985
  ) -> dict[str, Any]:
 
 
986
  if cursor is not None:
987
- meta["cursor"] = cursor
 
988
  return {
989
  "ok": True,
990
  "item": items[0] if len(items) == 1 else None,
991
  "items": items,
992
- "meta": _helper_meta(start_calls, source=source, **meta),
993
  "error": None,
994
  }
995
 
@@ -1341,19 +1431,18 @@ async def _run_with_monty(
1341
 
1342
  default_return = _policy_int("hf_org_members", "default_return", 100)
1343
  scan_cap = _policy_int("hf_org_members", "scan_max", FOLLOWERS_SCAN_MAX)
1344
- requested_return_limit = _resolve_requested_limit(return_limit, limit)
1345
- requested_scan_limit = scan_limit
1346
- effective_requested_return_limit = 0 if count_only else requested_return_limit
1347
-
1348
- ret_lim = _clamp_int(
1349
- effective_requested_return_limit,
1350
- default=default_return,
1351
- minimum=0,
1352
- maximum=MAX_EXHAUSTIVE_RETURN_ITEMS,
1353
  )
1354
- scan_lim = _clamp_int(requested_scan_limit, default=scan_cap, minimum=1, maximum=scan_cap)
1355
- default_limit_used = requested_return_limit is None and not count_only
1356
- hard_cap_applied = requested_return_limit is not None and ret_lim < requested_return_limit
1357
  has_where = isinstance(where, dict) and bool(where)
1358
 
1359
  overview_total: int | None = None
@@ -1369,37 +1458,25 @@ async def _run_with_monty(
1369
  sample_complete = overview_total == 0
1370
  more_available = False if sample_complete else True
1371
  truncated_by = _derive_truncated_by(return_limit_hit=overview_total > 0)
1372
- meta = {
1373
- "scanned": 1,
1374
- "matched": overview_total,
1375
- "returned": 0,
1376
- "total": overview_total,
1377
- "total_available": overview_total,
1378
- "total_matched": overview_total,
1379
- "truncated": not sample_complete,
1380
- "complete": sample_complete,
1381
- "exact_count": True,
1382
- "count_source": "overview",
1383
- "sample_complete": sample_complete,
1384
- "more_available": more_available,
1385
- "can_request_more": _derive_can_request_more(sample_complete=sample_complete, truncated_by=truncated_by),
1386
- "truncated_by": truncated_by,
1387
- "next_request_hint": _derive_next_request_hint(
1388
- truncated_by=truncated_by,
1389
- more_available=more_available,
1390
- applied_return_limit=ret_lim,
1391
- applied_scan_limit=scan_lim,
1392
- ),
1393
- "organization": org,
1394
- }
1395
- meta.update(_derive_limit_metadata(
1396
- requested_return_limit=requested_return_limit,
1397
- applied_return_limit=ret_lim,
1398
- default_limit_used=default_limit_used,
1399
- requested_scan_limit=requested_scan_limit,
1400
- applied_scan_limit=scan_lim,
1401
- ))
1402
- return _helper_success_meta(start_calls=start_calls, source=overview_source, items=[], meta=meta)
1403
 
1404
  endpoint = f"/api/organizations/{org}/members"
1405
  try:
@@ -1457,50 +1534,28 @@ async def _run_with_monty(
1457
  items = _project_items(
1458
  items,
1459
  fields,
1460
- aliases={
1461
- "login": "username",
1462
- "user": "username",
1463
- "handle": "username",
1464
- "name": "fullname",
1465
- "full_name": "fullname",
1466
- "full-name": "fullname",
1467
- "is_pro": "isPro",
1468
- "ispro": "isPro",
1469
- "pro": "isPro",
1470
  },
1471
  )
1472
- meta = {
1473
- "scanned": observed_total,
1474
- "matched": len(normalized),
1475
- "returned": len(items),
1476
- "total": total,
1477
- "total_available": total_available,
1478
- "total_matched": total_matched,
1479
- "truncated": truncated,
1480
- "complete": sample_complete,
1481
- "exact_count": exact_count,
1482
- "count_source": count_source,
1483
- "sample_complete": sample_complete,
1484
- "lower_bound": bool(has_where and not exact_count),
1485
- "more_available": more_available,
1486
- "can_request_more": _derive_can_request_more(sample_complete=sample_complete, truncated_by=truncated_by),
1487
- "truncated_by": truncated_by,
1488
- "next_request_hint": _derive_next_request_hint(
1489
- truncated_by=truncated_by,
1490
- more_available=more_available,
1491
- applied_return_limit=ret_lim,
1492
- applied_scan_limit=scan_lim,
1493
- ),
1494
- "organization": org,
1495
- }
1496
- meta.update(_derive_limit_metadata(
1497
- requested_return_limit=requested_return_limit,
1498
- applied_return_limit=ret_lim,
1499
- default_limit_used=default_limit_used,
1500
- requested_scan_limit=requested_scan_limit,
1501
- applied_scan_limit=scan_lim,
1502
- ))
1503
- return _helper_success_meta(start_calls=start_calls, source=endpoint, items=items, meta=meta)
1504
 
1505
  async def hf_repo_search(
1506
  query: str | None = None,
@@ -1692,19 +1747,18 @@ async def _run_with_monty(
1692
  if not u:
1693
  return _helper_error(start_calls=start_calls, source=f"/api/users/<u>/{kind}", error="username is required")
1694
 
1695
- requested_return_limit = _resolve_requested_limit(return_limit, limit)
1696
- requested_scan_limit = scan_limit
1697
- effective_requested_return_limit = 0 if count_only else requested_return_limit
1698
-
1699
- ret_lim = _clamp_int(
1700
- effective_requested_return_limit,
1701
- default=default_return,
1702
- minimum=0,
1703
- maximum=MAX_EXHAUSTIVE_RETURN_ITEMS,
1704
  )
1705
- scan_lim = _clamp_int(requested_scan_limit, default=scan_cap, minimum=1, maximum=scan_cap)
1706
- default_limit_used = requested_return_limit is None and not count_only
1707
- hard_cap_applied = requested_return_limit is not None and ret_lim < requested_return_limit
1708
  has_where = isinstance(where, dict) and bool(where)
1709
  filtered = (pro_only is not None) or has_where
1710
 
@@ -1740,44 +1794,32 @@ async def _run_with_monty(
1740
  sample_complete = overview_total == 0
1741
  more_available = False if sample_complete else True
1742
  truncated_by = _derive_truncated_by(return_limit_hit=overview_total > 0)
1743
- meta = {
1744
- "scanned": 1,
1745
- "matched": overview_total,
1746
- "returned": 0,
1747
- "total": overview_total,
1748
- "total_available": overview_total,
1749
- "total_matched": overview_total,
1750
- "truncated": not sample_complete,
1751
- "complete": sample_complete,
1752
- "exact_count": True,
1753
- "count_source": "overview",
1754
- "sample_complete": sample_complete,
1755
- "more_available": more_available,
1756
- "can_request_more": _derive_can_request_more(sample_complete=sample_complete, truncated_by=truncated_by),
1757
- "truncated_by": truncated_by,
1758
- "next_request_hint": _derive_next_request_hint(
1759
- truncated_by=truncated_by,
1760
- more_available=more_available,
1761
- applied_return_limit=ret_lim,
1762
- applied_scan_limit=scan_lim,
1763
- ),
1764
- "relation": kind,
1765
- "pro_only": pro_only,
1766
- "where_applied": has_where,
1767
- "entity": u,
1768
- "entity_type": entity_type,
1769
- "username": u,
1770
- }
1771
  if entity_type == "organization":
1772
  meta["organization"] = u
1773
- meta.update(_derive_limit_metadata(
1774
- requested_return_limit=requested_return_limit,
1775
- applied_return_limit=ret_lim,
1776
- default_limit_used=default_limit_used,
1777
- requested_scan_limit=requested_scan_limit,
1778
- applied_scan_limit=scan_lim,
1779
- ))
1780
- return _helper_success_meta(
1781
  start_calls=start_calls,
1782
  source=overview_source,
1783
  items=[],
@@ -1858,57 +1900,35 @@ async def _run_with_monty(
1858
  items = _project_items(
1859
  items,
1860
  fields,
1861
- aliases={
1862
- "login": "username",
1863
- "user": "username",
1864
- "handle": "username",
1865
- "name": "fullname",
1866
- "full_name": "fullname",
1867
- "full-name": "fullname",
1868
- "is_pro": "isPro",
1869
- "ispro": "isPro",
1870
- "pro": "isPro",
1871
  },
1872
  )
1873
- meta = {
1874
- "scanned": observed_total,
1875
- "matched": len(normalized),
1876
- "returned": len(items),
1877
- "total": total,
1878
- "total_available": total_available,
1879
- "total_matched": total_matched,
1880
- "truncated": truncated,
1881
- "complete": sample_complete,
1882
- "exact_count": exact_count,
1883
- "count_source": count_source,
1884
- "sample_complete": sample_complete,
1885
- "lower_bound": bool(filtered and not exact_count),
1886
- "more_available": more_available,
1887
- "can_request_more": _derive_can_request_more(sample_complete=sample_complete, truncated_by=truncated_by),
1888
- "truncated_by": truncated_by,
1889
- "next_request_hint": _derive_next_request_hint(
1890
- truncated_by=truncated_by,
1891
- more_available=more_available,
1892
- applied_return_limit=ret_lim,
1893
- applied_scan_limit=scan_lim,
1894
- ),
1895
- "relation": kind,
1896
- "pro_only": pro_only,
1897
- "where_applied": has_where,
1898
- "entity": u,
1899
- "entity_type": entity_type,
1900
- "username": u,
1901
- }
1902
  if entity_type == "organization":
1903
  meta["organization"] = u
1904
- meta.update(_derive_limit_metadata(
1905
- requested_return_limit=requested_return_limit,
1906
- applied_return_limit=ret_lim,
1907
- default_limit_used=default_limit_used,
1908
- requested_scan_limit=requested_scan_limit,
1909
- applied_scan_limit=scan_lim,
1910
- ))
1911
- return _helper_success_meta(
1912
  start_calls=start_calls,
1913
  source=endpoint,
1914
  items=items,
@@ -2164,19 +2184,18 @@ async def _run_with_monty(
2164
  error="sort must be one of likedAt, repoLikes, repoDownloads",
2165
  )
2166
 
2167
- requested_return_limit = _resolve_requested_limit(return_limit, limit)
2168
- requested_scan_limit = scan_limit
2169
- effective_requested_return_limit = 0 if count_only else requested_return_limit
2170
-
2171
- ret_lim = _clamp_int(
2172
- effective_requested_return_limit,
2173
- default=default_return,
2174
- minimum=0,
2175
- maximum=MAX_EXHAUSTIVE_RETURN_ITEMS,
2176
  )
2177
- scan_lim = _clamp_int(requested_scan_limit, default=scan_cap, minimum=1, maximum=scan_cap)
2178
- default_limit_used = requested_return_limit is None and not count_only
2179
- hard_cap_applied = requested_return_limit is not None and ret_lim < requested_return_limit
2180
 
2181
  allowed_repo_types: set[str] | None = None
2182
  try:
@@ -2344,43 +2363,166 @@ async def _run_with_monty(
2344
  if scan_limit_hit:
2345
  more_available = "unknown" if (allowed_repo_types is not None or where) else True
2346
 
2347
- meta = {
2348
- "scanned": len(scanned_rows),
2349
- "matched": matched,
2350
- "returned": len(items),
2351
- "total": total,
2352
- "total_available": len(payload),
2353
- "total_matched": total_matched,
2354
- "truncated": truncated,
2355
- "complete": sample_complete,
2356
- "exact_count": exact_count,
2357
- "count_source": "scan",
2358
- "sample_complete": sample_complete,
2359
- "lower_bound": not exact_count,
2360
- "more_available": more_available,
2361
- "can_request_more": _derive_can_request_more(sample_complete=sample_complete, truncated_by=truncated_by),
2362
- "truncated_by": truncated_by,
2363
- "next_request_hint": _derive_next_request_hint(
2364
- truncated_by=truncated_by,
2365
- more_available=more_available,
2366
- applied_return_limit=ret_lim,
2367
- applied_scan_limit=scan_lim,
2368
- ),
2369
- "enriched": enriched,
2370
- "popularity_present": popularity_present,
2371
- "sort_applied": sort_key,
2372
- "ranking_window": effective_ranking_window,
2373
- "ranking_complete": ranking_complete,
2374
- "username": resolved_username,
2375
  }
2376
- meta.update(_derive_limit_metadata(
2377
- requested_return_limit=requested_return_limit,
2378
- applied_return_limit=ret_lim,
2379
- default_limit_used=default_limit_used,
2380
- requested_scan_limit=requested_scan_limit,
2381
- applied_scan_limit=scan_lim,
2382
- ))
2383
- return _helper_success_meta(
2384
  start_calls=start_calls,
2385
  source=endpoint,
2386
  items=items,
@@ -2421,9 +2563,7 @@ async def _run_with_monty(
2421
  if start_cursor is None:
2422
  start_cursor = startCursor or cursor
2423
 
2424
- requested_return_limit = _resolve_requested_limit(return_limit, limit)
2425
  requested_max_pages = max_pages
2426
- effective_requested_return_limit = 0 if count_only else requested_return_limit
2427
 
2428
  if isinstance(username, str) and username.strip():
2429
  entity = username.strip()
@@ -2448,16 +2588,17 @@ async def _run_with_monty(
2448
  if not ent:
2449
  return _helper_error(start_calls=start_calls, source="/api/recent-activity", error="entity is required")
2450
 
2451
- ret_lim = _clamp_int(
2452
- effective_requested_return_limit,
2453
- default=default_return,
2454
- minimum=0,
2455
- maximum=MAX_EXHAUSTIVE_RETURN_ITEMS,
2456
  )
2457
  page_lim = page_cap
2458
  pages_lim = _clamp_int(requested_max_pages, default=pages_cap, minimum=1, maximum=pages_cap)
2459
- default_limit_used = requested_return_limit is None and not count_only
2460
- hard_cap_applied = requested_return_limit is not None and ret_lim < requested_return_limit
2461
 
2462
  type_filter = {str(t).strip().lower() for t in (activity_types or []) if str(t).strip()}
2463
  repo_filter = {_canonical_repo_type(t, default="") for t in (repo_types or []) if str(t).strip()}
@@ -2562,41 +2703,31 @@ async def _run_with_monty(
2562
  elif stopped_for_budget and not exact_count:
2563
  more_available = "unknown"
2564
 
2565
- meta = {
2566
- "scanned": scanned,
2567
- "matched": matched,
2568
- "returned": len(items),
2569
- "total": matched,
2570
- "total_matched": matched,
2571
- "pages": pages,
2572
- "truncated": truncated,
2573
- "complete": sample_complete,
2574
- "exact_count": exact_count,
2575
- "count_source": "scan" if exact_count else "none",
2576
- "sample_complete": sample_complete,
2577
- "lower_bound": not exact_count,
2578
- "more_available": more_available,
2579
- "can_request_more": _derive_can_request_more(sample_complete=sample_complete, truncated_by=truncated_by),
2580
- "truncated_by": truncated_by,
2581
- "next_request_hint": _derive_next_request_hint(
2582
- truncated_by=truncated_by,
2583
- more_available=more_available,
2584
- applied_return_limit=ret_lim,
2585
- applied_max_pages=pages_lim,
2586
- ),
2587
- "page_limit": page_lim,
2588
- "stopped_for_budget": stopped_for_budget,
2589
- "feed_type": ft,
2590
- "entity": ent,
2591
- }
2592
- meta.update(_derive_limit_metadata(
2593
- requested_return_limit=requested_return_limit,
2594
- applied_return_limit=ret_lim,
2595
- default_limit_used=default_limit_used,
2596
  requested_max_pages=requested_max_pages,
2597
  applied_max_pages=pages_lim,
2598
- ))
2599
- return _helper_success_meta(
2600
  start_calls=start_calls,
2601
  source="/api/recent-activity",
2602
  items=items,
@@ -3009,6 +3140,7 @@ async def _run_with_monty(
3009
  "hf_repo_search": _collecting_wrapper("hf_repo_search", hf_repo_search),
3010
  "hf_user_summary": _collecting_wrapper("hf_user_summary", hf_user_summary),
3011
  "hf_user_graph": _collecting_wrapper("hf_user_graph", hf_user_graph),
3012
  "hf_user_likes": _collecting_wrapper("hf_user_likes", hf_user_likes),
3013
  "hf_recent_activity": _collecting_wrapper("hf_recent_activity", hf_recent_activity),
3014
  "hf_repo_discussions": _collecting_wrapper("hf_repo_discussions", hf_repo_discussions),
 
93
  "trending": "trending_score",
94
  }
95
 
+_USER_FIELD_ALIASES: dict[str, str] = {
+    "login": "username",
+    "user": "username",
+    "handle": "username",
+    "name": "fullname",
+    "full_name": "fullname",
+    "full-name": "fullname",
+    "is_pro": "isPro",
+    "ispro": "isPro",
+    "pro": "isPro",
+}
+
+_ACTOR_FIELD_ALIASES: dict[str, str] = {
+    **_USER_FIELD_ALIASES,
+    "entity_type": "type",
+    "user_type": "type",
+    "actor_type": "type",
+}
+
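A minimal sketch of how alias tables like these could be applied when a caller requests specific fields. Only the two dictionaries come from this commit; `project_fields` is a hypothetical stand-in, since the real `_project_items` is not shown in this diff and may behave differently.

```python
from typing import Any

# Copied from the commit: user-facing spellings mapped to canonical keys.
USER_FIELD_ALIASES: dict[str, str] = {
    "login": "username",
    "user": "username",
    "handle": "username",
    "name": "fullname",
    "full_name": "fullname",
    "full-name": "fullname",
    "is_pro": "isPro",
    "ispro": "isPro",
    "pro": "isPro",
}

# Actor aliases extend the user aliases with spellings for the "type" key.
ACTOR_FIELD_ALIASES: dict[str, str] = {
    **USER_FIELD_ALIASES,
    "entity_type": "type",
    "user_type": "type",
    "actor_type": "type",
}


def project_fields(item: dict[str, Any], fields: list[str], aliases: dict[str, str]) -> dict[str, Any]:
    """Hypothetical projection: resolve each requested name to its canonical key."""
    out: dict[str, Any] = {}
    for name in fields:
        key = aliases.get(name, name)  # unknown names pass through unchanged
        if key in item:
            out[key] = item[key]
    return out


liker = {"username": "alice", "fullname": "Alice A.", "type": "user", "isPro": True}
projected = project_fields(liker, ["login", "actor_type", "pro"], ACTOR_FIELD_ALIASES)
print(projected)
```

Building the actor table with `**_USER_FIELD_ALIASES` keeps a single source of truth for the user spellings, which is what lets the member and follower helpers drop their inline copies.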
115
  # Extra hf_repo_search kwargs intentionally supported as pass-through to
116
  # huggingface_hub.HfApi.list_models/list_datasets/list_spaces.
117
  # (Generic args like `query/search/sort/author/limit` are handled directly in
 
210
  "hf_user_followers": {"mode": "exhaustive", "scan_max": FOLLOWERS_SCAN_MAX, "default_return": 1_000},
211
  "hf_user_following": {"mode": "exhaustive", "scan_max": FOLLOWERS_SCAN_MAX, "default_return": 1_000},
212
  "hf_org_members": {"mode": "exhaustive", "scan_max": FOLLOWERS_SCAN_MAX, "default_return": 1_000},
213
+ "hf_repo_likers": {"mode": "exhaustive", "default_return": 1_000},
214
  "hf_user_likes": {
215
  "mode": "exhaustive",
216
  "scan_max": LIKES_SCAN_MAX,
 
237
  "hf_repo_search",
238
  "hf_user_summary",
239
  "hf_user_graph",
240
+ "hf_repo_likers",
241
  "hf_user_likes",
242
  "hf_recent_activity",
243
  "hf_repo_discussions",
 
264
  r"^/api/users/[^/]+/followers$",
265
  r"^/api/users/[^/]+/following$",
266
  r"^/api/users/[^/]+/likes$",
267
+ r"^/api/(models|datasets|spaces)/(?:[^/]+|[^/]+/[^/]+)/likers$",
268
  r"^/api/organizations/[^/]+/overview$",
269
  r"^/api/organizations/[^/]+/members$",
270
  r"^/api/organizations/[^/]+/followers$",
 
282
  r"^/api/whoami-v2$",
283
  r"^/api/trending$",
284
  r"^/api/daily_papers$",
285
+ r"^/api/(models|datasets|spaces)/(?:[^/]+|[^/]+/[^/]+)/likers$",
286
  r"^/api/collections$",
287
  r"^/api/collections/[^/]+$",
288
  r"^/api/collections/[^/]+/[^/]+$",
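The `/likers` pattern added to the allowlists above can be checked in isolation. This sketch copies the regular expression from the diff; the `is_allowed_likers_path` wrapper and the sample paths are illustrative assumptions, not code from the commit.

```python
import re

# Pattern copied from the diff: a repo id is either "name" or "owner/name".
LIKERS_PATTERN = re.compile(
    r"^/api/(models|datasets|spaces)/(?:[^/]+|[^/]+/[^/]+)/likers$"
)


def is_allowed_likers_path(path: str) -> bool:
    """Hypothetical helper: True when the path matches the likers allowlist entry."""
    return LIKERS_PATTERN.match(path) is not None


samples = [
    "/api/models/gpt2/likers",
    "/api/spaces/some-org/demo/likers",
    "/api/models/a/b/c/likers",
    "/api/users/alice/likers",
]
for sample in samples:
    print(sample, is_allowed_likers_path(sample))
```

The alternation limits the repo id to at most two path segments, so deeper paths and other resource kinds are rejected.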
 
979
  return "More results may exist; narrow filters or raise scan/page bounds for better coverage"
980
  return "Ask for a larger limit to see more rows"
981
 
+def _resolve_exhaustive_limits(
     *,
+    return_limit: int | None,
+    limit: int | None,
+    count_only: bool,
+    default_return: int,
+    max_return: int,
+    scan_limit: int | None = None,
+    scan_cap: int | None = None,
 ) -> dict[str, Any]:
+    requested_return_limit = _resolve_requested_limit(return_limit, limit)
+    effective_requested_return_limit = 0 if count_only else requested_return_limit
+    out: dict[str, Any] = {
+        "requested_return_limit": requested_return_limit,
+        "applied_return_limit": _clamp_int(
+            effective_requested_return_limit,
+            default=default_return,
+            minimum=0,
+            maximum=max_return,
+        ),
+        "default_limit_used": requested_return_limit is None and not count_only,
     }
+    out["hard_cap_applied"] = (
+        requested_return_limit is not None and out["applied_return_limit"] < requested_return_limit
+    )
+    if scan_cap is not None:
+        out["requested_scan_limit"] = scan_limit
+        out["applied_scan_limit"] = _clamp_int(
+            scan_limit,
+            default=scan_cap,
+            minimum=1,
+            maximum=scan_cap,
+        )
+    return out
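The return-limit half of this helper can be sketched on its own. `clamp_int` below is a hypothetical stand-in for `_clamp_int`, assumed to fall back to `default` on `None` and then clamp to the given bounds. The precedence of `return_limit` over `limit` is likewise an assumption about `_resolve_requested_limit`, which this diff does not show.

```python
from typing import Any, Optional


def clamp_int(value: Optional[int], *, default: int, minimum: int, maximum: int) -> int:
    """Hypothetical stand-in for `_clamp_int`: default on None, then clamp."""
    if value is None:
        value = default
    return max(minimum, min(maximum, int(value)))


def resolve_limits(
    return_limit: Optional[int],
    limit: Optional[int],
    *,
    count_only: bool,
    default_return: int,
    max_return: int,
) -> dict[str, Any]:
    # Assumption: `return_limit` takes precedence over the `limit` alias.
    requested = return_limit if return_limit is not None else limit
    # count_only forces a zero-row window regardless of what was requested.
    effective = 0 if count_only else requested
    applied = clamp_int(effective, default=default_return, minimum=0, maximum=max_return)
    return {
        "requested_return_limit": requested,
        "applied_return_limit": applied,
        "default_limit_used": requested is None and not count_only,
        "hard_cap_applied": requested is not None and applied < requested,
    }


defaulted = resolve_limits(None, None, count_only=False, default_return=100, max_return=1000)
capped = resolve_limits(5000, None, count_only=False, default_return=100, max_return=1000)
counting = resolve_limits(None, None, count_only=True, default_return=100, max_return=1000)
print(defaulted, capped, counting, sep="\n")
```

Returning one plan dictionary lets each helper read `applied_return_limit` and `hard_cap_applied` from the same place instead of recomputing them inline.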
1016
 
+def _build_exhaustive_meta(
+    *,
+    base_meta: dict[str, Any],
+    limit_plan: dict[str, Any],
+    sample_complete: bool,
+    exact_count: bool,
+    truncated_by: str,
+    more_available: bool | str,
+    requested_max_pages: int | None = None,
+    applied_max_pages: int | None = None,
+) -> dict[str, Any]:
+    meta = dict(base_meta)
+    applied_return_limit = int(limit_plan["applied_return_limit"])
+    applied_scan_limit = limit_plan.get("applied_scan_limit")
+    meta.update(
+        {
+            "complete": sample_complete,
+            "exact_count": exact_count,
+            "sample_complete": sample_complete,
+            "more_available": more_available,
+            "can_request_more": _derive_can_request_more(
+                sample_complete=sample_complete,
+                truncated_by=truncated_by,
+            ),
+            "truncated_by": truncated_by,
+            "next_request_hint": _derive_next_request_hint(
+                truncated_by=truncated_by,
+                more_available=more_available,
+                applied_return_limit=applied_return_limit,
+                applied_scan_limit=applied_scan_limit if isinstance(applied_scan_limit, int) else None,
+                applied_max_pages=applied_max_pages,
+            ),
+        }
+    )
+    meta.update(
+        _derive_limit_metadata(
+            requested_return_limit=limit_plan["requested_return_limit"],
+            applied_return_limit=applied_return_limit,
+            default_limit_used=bool(limit_plan["default_limit_used"]),
+            requested_scan_limit=limit_plan.get("requested_scan_limit"),
+            applied_scan_limit=applied_scan_limit if isinstance(applied_scan_limit, int) else None,
+            requested_max_pages=requested_max_pages,
+            applied_max_pages=applied_max_pages,
+        )
+    )
+    return meta
+
+def _helper_success(
     *,
     start_calls: int,
     source: str,
     items: list[dict[str, Any]],
     cursor: str | None = None,
+    meta: dict[str, Any] | None = None,
+    **extra_meta: Any,
 ) -> dict[str, Any]:
+    merged_meta = dict(meta or {})
+    merged_meta.update(extra_meta)
     if cursor is not None:
+        merged_meta["cursor"] = cursor
+
     return {
         "ok": True,
         "item": items[0] if len(items) == 1 else None,
         "items": items,
+        "meta": _helper_meta(start_calls, source=source, **merged_meta),
         "error": None,
     }
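The merged `_helper_success` accepts both a prepared `meta` dictionary and loose keyword extras. The sketch below shows that merge order under one assumption: `helper_meta` is a hypothetical stand-in for `_helper_meta`, whose real body is not part of this diff.

```python
from typing import Any, Optional


def helper_meta(start_calls: int, *, source: str, **meta: Any) -> dict[str, Any]:
    """Hypothetical stand-in for `_helper_meta`; the real one is not shown here."""
    return {"source": source, "start_calls": start_calls, **meta}


def helper_success(
    *,
    start_calls: int,
    source: str,
    items: list[dict[str, Any]],
    cursor: Optional[str] = None,
    meta: Optional[dict[str, Any]] = None,
    **extra_meta: Any,
) -> dict[str, Any]:
    merged_meta = dict(meta or {})   # copy, so the caller's dict is not mutated
    merged_meta.update(extra_meta)   # keyword extras override same-named keys
    if cursor is not None:
        merged_meta["cursor"] = cursor
    return {
        "ok": True,
        "item": items[0] if len(items) == 1 else None,  # set only for exactly one row
        "items": items,
        "meta": helper_meta(start_calls, source=source, **merged_meta),
        "error": None,
    }


base = {"total": 2}
single = helper_success(start_calls=0, source="/api/x", items=[{"id": 1}], meta=base, returned=1)
many = helper_success(start_calls=0, source="/api/x", items=[{"id": 1}, {"id": 2}], cursor="abc")
print(single["item"], many["item"], many["meta"]["cursor"])
```

Accepting both call styles appears to be what allows the old `_helper_success_meta` variant to be removed while existing keyword-style callers keep working.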
1085
 
 
1431
 
1432
  default_return = _policy_int("hf_org_members", "default_return", 100)
1433
  scan_cap = _policy_int("hf_org_members", "scan_max", FOLLOWERS_SCAN_MAX)
1434
+ limit_plan = _resolve_exhaustive_limits(
1435
+ return_limit=return_limit,
1436
+ limit=limit,
1437
+ count_only=count_only,
1438
+ default_return=default_return,
1439
+ max_return=MAX_EXHAUSTIVE_RETURN_ITEMS,
1440
+ scan_limit=scan_limit,
1441
+ scan_cap=scan_cap,
1442
  )
1443
+ ret_lim = int(limit_plan["applied_return_limit"])
1444
+ scan_lim = int(limit_plan["applied_scan_limit"])
1445
+ hard_cap_applied = bool(limit_plan["hard_cap_applied"])
1446
  has_where = isinstance(where, dict) and bool(where)
1447
 
1448
  overview_total: int | None = None
 
1458
  sample_complete = overview_total == 0
1459
  more_available = False if sample_complete else True
1460
  truncated_by = _derive_truncated_by(return_limit_hit=overview_total > 0)
1461
+ meta = _build_exhaustive_meta(
1462
+ base_meta={
1463
+ "scanned": 1,
1464
+ "matched": overview_total,
1465
+ "returned": 0,
1466
+ "total": overview_total,
1467
+ "total_available": overview_total,
1468
+ "total_matched": overview_total,
1469
+ "truncated": not sample_complete,
1470
+ "count_source": "overview",
1471
+ "organization": org,
1472
+ },
1473
+ limit_plan=limit_plan,
1474
+ sample_complete=sample_complete,
1475
+ exact_count=True,
1476
+ truncated_by=truncated_by,
1477
+ more_available=more_available,
1478
+ )
1479
+ return _helper_success(start_calls=start_calls, source=overview_source, items=[], meta=meta)
1480
 
1481
  endpoint = f"/api/organizations/{org}/members"
1482
  try:
 
1534
  items = _project_items(
1535
  items,
1536
  fields,
1537
+ aliases=_USER_FIELD_ALIASES,
1538
+ )
1539
+ meta = _build_exhaustive_meta(
1540
+ base_meta={
1541
+ "scanned": observed_total,
1542
+ "matched": len(normalized),
1543
+ "returned": len(items),
1544
+ "total": total,
1545
+ "total_available": total_available,
1546
+ "total_matched": total_matched,
1547
+ "truncated": truncated,
1548
+ "count_source": count_source,
1549
+ "lower_bound": bool(has_where and not exact_count),
1550
+ "organization": org,
1551
  },
1552
+ limit_plan=limit_plan,
1553
+ sample_complete=sample_complete,
1554
+ exact_count=exact_count,
1555
+ truncated_by=truncated_by,
1556
+ more_available=more_available,
1557
  )
1558
+ return _helper_success(start_calls=start_calls, source=endpoint, items=items, meta=meta)
1559
 
1560
  async def hf_repo_search(
1561
  query: str | None = None,
 
1747
  if not u:
1748
  return _helper_error(start_calls=start_calls, source=f"/api/users/<u>/{kind}", error="username is required")
1749
 
1750
+ limit_plan = _resolve_exhaustive_limits(
1751
+ return_limit=return_limit,
1752
+ limit=limit,
1753
+ count_only=count_only,
1754
+ default_return=default_return,
1755
+ max_return=MAX_EXHAUSTIVE_RETURN_ITEMS,
1756
+ scan_limit=scan_limit,
1757
+ scan_cap=scan_cap,
1758
  )
1759
+ ret_lim = int(limit_plan["applied_return_limit"])
1760
+ scan_lim = int(limit_plan["applied_scan_limit"])
1761
+ hard_cap_applied = bool(limit_plan["hard_cap_applied"])
1762
  has_where = isinstance(where, dict) and bool(where)
1763
  filtered = (pro_only is not None) or has_where
1764
 
 
1794
  sample_complete = overview_total == 0
1795
  more_available = False if sample_complete else True
1796
  truncated_by = _derive_truncated_by(return_limit_hit=overview_total > 0)
1797
+ meta = _build_exhaustive_meta(
1798
+ base_meta={
1799
+ "scanned": 1,
1800
+ "matched": overview_total,
1801
+ "returned": 0,
1802
+ "total": overview_total,
1803
+ "total_available": overview_total,
1804
+ "total_matched": overview_total,
1805
+ "truncated": not sample_complete,
1806
+ "count_source": "overview",
1807
+ "relation": kind,
1808
+ "pro_only": pro_only,
1809
+ "where_applied": has_where,
1810
+ "entity": u,
1811
+ "entity_type": entity_type,
1812
+ "username": u,
1813
+ },
1814
+ limit_plan=limit_plan,
1815
+ sample_complete=sample_complete,
1816
+ exact_count=True,
1817
+ truncated_by=truncated_by,
1818
+ more_available=more_available,
1819
+ )
1820
  if entity_type == "organization":
1821
  meta["organization"] = u
1822
+ return _helper_success(
1823
  start_calls=start_calls,
1824
  source=overview_source,
1825
  items=[],
 
1900
  items = _project_items(
1901
  items,
1902
  fields,
1903
+ aliases=_USER_FIELD_ALIASES,
1904
+ )
1905
+ meta = _build_exhaustive_meta(
1906
+ base_meta={
1907
+ "scanned": observed_total,
1908
+ "matched": len(normalized),
1909
+ "returned": len(items),
1910
+ "total": total,
1911
+ "total_available": total_available,
1912
+ "total_matched": total_matched,
1913
+ "truncated": truncated,
1914
+ "count_source": count_source,
1915
+ "lower_bound": bool(filtered and not exact_count),
1916
+ "relation": kind,
1917
+ "pro_only": pro_only,
1918
+ "where_applied": has_where,
1919
+ "entity": u,
1920
+ "entity_type": entity_type,
1921
+ "username": u,
1922
  },
1923
+ limit_plan=limit_plan,
1924
+ sample_complete=sample_complete,
1925
+ exact_count=exact_count,
1926
+ truncated_by=truncated_by,
1927
+ more_available=more_available,
1928
  )
1929
  if entity_type == "organization":
1930
  meta["organization"] = u
1931
+ return _helper_success(
1932
  start_calls=start_calls,
1933
  source=endpoint,
1934
  items=items,
 
2184
  error="sort must be one of likedAt, repoLikes, repoDownloads",
2185
  )
2186
 
2187
+ limit_plan = _resolve_exhaustive_limits(
2188
+ return_limit=return_limit,
2189
+ limit=limit,
2190
+ count_only=count_only,
2191
+ default_return=default_return,
2192
+ max_return=MAX_EXHAUSTIVE_RETURN_ITEMS,
2193
+ scan_limit=scan_limit,
2194
+ scan_cap=scan_cap,
2195
  )
2196
+ ret_lim = int(limit_plan["applied_return_limit"])
2197
+ scan_lim = int(limit_plan["applied_scan_limit"])
2198
+ hard_cap_applied = bool(limit_plan["hard_cap_applied"])
2199
 
2200
  allowed_repo_types: set[str] | None = None
2201
  try:
 
2363
  if scan_limit_hit:
2364
  more_available = "unknown" if (allowed_repo_types is not None or where) else True
2365
 
2366
+ meta = _build_exhaustive_meta(
2367
+ base_meta={
2368
+ "scanned": len(scanned_rows),
2369
+ "matched": matched,
2370
+ "returned": len(items),
2371
+ "total": total,
2372
+ "total_available": len(payload),
2373
+ "total_matched": total_matched,
2374
+ "truncated": truncated,
2375
+ "count_source": "scan",
2376
+ "lower_bound": not exact_count,
2377
+ "enriched": enriched,
2378
+ "popularity_present": popularity_present,
2379
+ "sort_applied": sort_key,
2380
+ "ranking_window": effective_ranking_window,
2381
+ "ranking_complete": ranking_complete,
2382
+ "username": resolved_username,
2383
+ },
2384
+ limit_plan=limit_plan,
2385
+ sample_complete=sample_complete,
2386
+ exact_count=exact_count,
2387
+ truncated_by=truncated_by,
2388
+ more_available=more_available,
2389
+ )
2390
+ return _helper_success(
2391
+ start_calls=start_calls,
2392
+ source=endpoint,
2393
+ items=items,
2394
+ meta=meta,
2395
+ )
2396
+
+async def hf_repo_likers(
+    repo_id: str,
+    repo_type: str,
+    return_limit: int | None = None,
+    limit: int | None = None,
+    count_only: bool = False,
+    pro_only: bool | None = None,
+    where: dict[str, Any] | None = None,
+    fields: list[str] | None = None,
+) -> dict[str, Any]:
+    start_calls = call_count["n"]
+    rid = str(repo_id or "").strip()
+    if not rid:
+        return _helper_error(start_calls=start_calls, source="/api/repos/<repo>/likers", error="repo_id is required")
+
+    rt = _canonical_repo_type(repo_type, default="")
+    if rt not in {"model", "dataset", "space"}:
+        return _helper_error(
+            start_calls=start_calls,
+            source=f"/api/repos/{rid}/likers",
+            error=f"Unsupported repo_type '{repo_type}'",
+            repo_id=rid,
+        )
+
+    default_return = _policy_int("hf_repo_likers", "default_return", 1_000)
+    requested_return_limit = _resolve_requested_limit(return_limit, limit)
+    default_limit_used = requested_return_limit is None and not count_only
+    has_where = isinstance(where, dict) and bool(where)
+
+    endpoint = f"/api/{rt}s/{rid}/likers"
+    resp = _host_raw_call(endpoint)
+    if not resp.get("ok"):
+        return _helper_error(
+            start_calls=start_calls,
+            source=endpoint,
+            error=resp.get("error") or "repo likers fetch failed",
+            repo_id=rid,
+            repo_type=rt,
+        )
+
+    payload = resp.get("data") if isinstance(resp.get("data"), list) else []
+    normalized: list[dict[str, Any]] = []
+    for row in payload:
+        if not isinstance(row, dict):
+            continue
+        username = row.get("user") or row.get("username")
+        if not isinstance(username, str) or not username:
+            continue
+        item = {
+            "username": username,
+            "fullname": row.get("fullname"),
+            "type": row.get("type") if isinstance(row.get("type"), str) and row.get("type") else "user",
+            "isPro": row.get("isPro"),
+        }
+        if pro_only is True and item.get("isPro") is not True:
+            continue
+        if pro_only is False and item.get("isPro") is True:
+            continue
+        if not _item_matches_where(item, where):
+            continue
+        normalized.append(item)
+
+    # /likers is a one-shot full-list endpoint: the Hub returns the liker rows in a
+    # single response with no cursor/scan continuation. Keep the default output compact,
+    # but do not apply the generic exhaustive hard cap here because it does not improve
+    # upstream coverage or cost; the full liker set has already been fetched.
+    if count_only:
+        ret_lim = 0
+    elif requested_return_limit is None:
+        ret_lim = default_return
+    else:
+        try:
+            ret_lim = max(0, int(requested_return_limit))
+        except Exception:
+            ret_lim = default_return
+    limit_plan = {
+        "requested_return_limit": requested_return_limit,
+        "applied_return_limit": ret_lim,
+        "default_limit_used": default_limit_used,
+        "hard_cap_applied": False,
     }
+
+    matched = len(normalized)
+    items = [] if count_only else normalized[:ret_lim]
+    return_limit_hit = ret_lim > 0 and matched > ret_lim
+    truncated_by = _derive_truncated_by(
+        hard_cap=False,
+        return_limit_hit=return_limit_hit,
+    )
+    sample_complete = matched <= ret_lim and (not count_only or matched == 0)
+    truncated = truncated_by != "none"
+    more_available = _derive_more_available(
+        sample_complete=sample_complete,
+        exact_count=True,
+        returned=len(items),
+        total=matched,
+    )
+
+    items = _project_items(
+        items,
+        fields,
+        aliases=_ACTOR_FIELD_ALIASES,
+    )
+
+    meta = _build_exhaustive_meta(
+        base_meta={
+            "scanned": len(payload),
+            "matched": matched,
+            "returned": len(items),
+            "total": matched,
+            "total_available": len(payload),
+            "total_matched": matched,
+            "truncated": truncated,
+            "count_source": "likers_list",
+            "lower_bound": False,
+            "repo_id": rid,
+            "repo_type": rt,
+            "pro_only": pro_only,
+            "where_applied": has_where,
+            "upstream_pagination": "none",
+        },
+        limit_plan=limit_plan,
+        sample_complete=sample_complete,
+        exact_count=True,
+        truncated_by=truncated_by,
+        more_available=more_available,
+    )
+    meta["hard_cap_applied"] = False
+    return _helper_success(
     start_calls=start_calls,
     source=endpoint,
     items=items,
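Because `/likers` returns the whole list in one response, the only truncation in `hf_repo_likers` is the local return window. The sketch below isolates that arithmetic from the diff. The function name `likers_window` and its return shape are illustrative assumptions, and the `try`/`except` fallback around `int(...)` in the original is left out for brevity.

```python
from typing import Any, Optional


def likers_window(
    matched: int,
    *,
    requested: Optional[int],
    count_only: bool,
    default_return: int = 1_000,
) -> dict[str, Any]:
    """Hypothetical helper mirroring the window logic in hf_repo_likers."""
    if count_only:
        ret_lim = 0
    elif requested is None:
        ret_lim = default_return
    else:
        ret_lim = max(0, int(requested))  # no upper hard cap is applied here
    returned = 0 if count_only else min(matched, ret_lim)
    return {
        "applied_return_limit": ret_lim,
        "returned": returned,
        "return_limit_hit": ret_lim > 0 and matched > ret_lim,
        # A count-only call is "complete" only when there was nothing to return.
        "sample_complete": matched <= ret_lim and (not count_only or matched == 0),
    }


trimmed = likers_window(3, requested=2, count_only=False)
full = likers_window(3, requested=None, count_only=False)
counted = likers_window(3, requested=None, count_only=True)
empty_count = likers_window(0, requested=None, count_only=True)
print(trimmed, full, counted, empty_count, sep="\n")
```

Note that a count-only call never sets `return_limit_hit`, since its applied limit is zero, yet it still reports an incomplete sample whenever any likers matched.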
 
2563
  if start_cursor is None:
2564
  start_cursor = startCursor or cursor
2565
 
2566
  requested_max_pages = max_pages
2567
 
2568
  if isinstance(username, str) and username.strip():
2569
  entity = username.strip()
 
     if not ent:
         return _helper_error(start_calls=start_calls, source="/api/recent-activity", error="entity is required")

+    limit_plan = _resolve_exhaustive_limits(
+        return_limit=return_limit,
+        limit=limit,
+        count_only=count_only,
+        default_return=default_return,
+        max_return=MAX_EXHAUSTIVE_RETURN_ITEMS,
     )
+    ret_lim = int(limit_plan["applied_return_limit"])
     page_lim = page_cap
     pages_lim = _clamp_int(requested_max_pages, default=pages_cap, minimum=1, maximum=pages_cap)
+    hard_cap_applied = bool(limit_plan["hard_cap_applied"])

     type_filter = {str(t).strip().lower() for t in (activity_types or []) if str(t).strip()}
     repo_filter = {_canonical_repo_type(t, default="") for t in (repo_types or []) if str(t).strip()}
 
     elif stopped_for_budget and not exact_count:
         more_available = "unknown"

+    meta = _build_exhaustive_meta(
+        base_meta={
+            "scanned": scanned,
+            "matched": matched,
+            "returned": len(items),
+            "total": matched,
+            "total_matched": matched,
+            "pages": pages,
+            "truncated": truncated,
+            "count_source": "scan" if exact_count else "none",
+            "lower_bound": not exact_count,
+            "page_limit": page_lim,
+            "stopped_for_budget": stopped_for_budget,
+            "feed_type": ft,
+            "entity": ent,
+        },
+        limit_plan=limit_plan,
+        sample_complete=sample_complete,
+        exact_count=exact_count,
+        truncated_by=truncated_by,
+        more_available=more_available,
         requested_max_pages=requested_max_pages,
         applied_max_pages=pages_lim,
+    )
+    return _helper_success(
         start_calls=start_calls,
         source="/api/recent-activity",
         items=items,
 
     "hf_repo_search": _collecting_wrapper("hf_repo_search", hf_repo_search),
     "hf_user_summary": _collecting_wrapper("hf_user_summary", hf_user_summary),
     "hf_user_graph": _collecting_wrapper("hf_user_graph", hf_user_graph),
+    "hf_repo_likers": _collecting_wrapper("hf_repo_likers", hf_repo_likers),
     "hf_user_likes": _collecting_wrapper("hf_user_likes", hf_user_likes),
     "hf_recent_activity": _collecting_wrapper("hf_recent_activity", hf_recent_activity),
     "hf_repo_discussions": _collecting_wrapper("hf_repo_discussions", hf_repo_discussions),