Spaces: evalstate/hf-hub-query

evalstate (HF Staff) committed on Mar 20

Commit 6df88a8 · verified · 1 Parent(s): 6c59653

Publish collection owner lookup fix

Files changed (29)

Dockerfile +1 -1
_monty_codegen_shared.md +167 -436
hf-hub-query.md +1 -1
monty_api/__pycache__/aliases.cpython-313.pyc +0 -0
monty_api/__pycache__/constants.cpython-313.pyc +0 -0
monty_api/__pycache__/http_runtime.cpython-313.pyc +0 -0
monty_api/__pycache__/registry.cpython-313.pyc +0 -0
monty_api/__pycache__/runtime_context.cpython-313.pyc +0 -0
monty_api/__pycache__/runtime_envelopes.cpython-313.pyc +0 -0
monty_api/__pycache__/runtime_filtering.cpython-313.pyc +0 -0
monty_api/__pycache__/validation.cpython-313.pyc +0 -0
monty_api/aliases.py +0 -90
monty_api/constants.py +40 -3
monty_api/helpers/__pycache__/activity.cpython-313.pyc +0 -0
monty_api/helpers/__pycache__/collections.cpython-313.pyc +0 -0
monty_api/helpers/__pycache__/introspection.cpython-313.pyc +0 -0
monty_api/helpers/__pycache__/profiles.cpython-313.pyc +0 -0
monty_api/helpers/__pycache__/repos.cpython-313.pyc +0 -0
monty_api/helpers/activity.py +28 -12
monty_api/helpers/collections.py +112 -29
monty_api/helpers/introspection.py +131 -59
monty_api/helpers/profiles.py +92 -51
monty_api/helpers/repos.py +513 -252
monty_api/http_runtime.py +9 -5
monty_api/registry.py +74 -69
monty_api/runtime_context.py +4 -0
monty_api/runtime_envelopes.py +35 -36
monty_api/runtime_filtering.py +65 -32
monty_api/validation.py +2 -2

Dockerfile CHANGED

@@ -15,7 +15,7 @@ COPY wheels /tmp/wheels
 RUN uv pip install --system --no-cache \
     "fast-agent-mcp>=0.6.1" \
     huggingface_hub \
-    /tmp/wheels/pydantic_monty-0.0.8-cp313-cp313-manylinux_2_35_x86_64.whl
+    "pydantic-monty==0.0.8"
 COPY --link ./ /app
 RUN chown -R 1000:1000 /app

_monty_codegen_shared.md CHANGED

@@ -1,49 +1,146 @@
-## Runtime rules for generated code
-- You **MUST NOT** use any imports.
-- All helper functions are already in scope.
-- All helper/API calls are async: always use `await`.
-- `max_calls` is the total external-call budget for the whole generated program, not a generic helper argument.
-- The outer wrapper is an exact contract. You **MUST** use this exact skeleton and only change the body:
 ```py
 async def solve(query, max_calls):
     ...
-    # body goes here
 await solve(query, max_calls)
 ```
-- Use only the documented `hf_*` helpers below.
-- For questions about supported helpers, fields, defaults, limits, or runtime capabilities, use `hf_runtime_capabilities(...)` instead of hand-authoring a static answer from memory.
-- Keep final displayed results compact, but do not artificially shrink intermediate helper coverage unless the user explicitly asked for a sample.
-- Prefer canonical snake_case keys in generated code and in JSON output.
-- For row/field selection prompts, prefer returning a compact list/dict with the requested fields instead of prose formatting inside `solve(...)`.
-- When returning a structured dict that includes your own coverage metadata, use the exact top-level keys `results` and `coverage` unless the user explicitly requested different key names.
-- Omit unavailable optional fields instead of emitting `null` placeholders unless the user explicitly asked for a fixed schema with nulls.
-- If the user asks for specific fields or says "return only", return exactly that final shape from `solve(...)`.
-- For current-user prompts (`my`, `me`), use helpers with `username=None` / `handle=None` first. Only ask for identity if that fails.
-- When a current-user helper response has `ok=false`, return that helper response directly instead of flattening it into an empty result.
-## Common helper signature traps
-These are high-priority rules. Do not guess helper arguments.
-- `hf_repo_search(...)` uses `limit`, **not** `return_limit`, and does **not** accept `count_only`.
-- In `hf_repo_search(...)`, `filters` means upstream Hugging Face repo/tag filters, while `where` means local predicates over returned normalized row fields.
-- `hf_trending(...)` uses `limit`, **not** `return_limit`.
-- `hf_daily_papers(...)` uses `limit`, **not** `return_limit`.
-- `hf_repo_discussions(...)` uses `limit`, **not** `return_limit`, and does **not** accept `fields`.
-- `hf_user_graph(...)`, `hf_user_likes(...)`, `hf_org_members(...)`, `hf_recent_activity(...)`, and `hf_collection_items(...)` use `return_limit`.
 - `hf_profile_summary(include=...)` supports only `"likes"` and `"activity"`.
-- Do **not** guess `hf_profile_summary(include=[...])` values such as `"followers"`, `"following"`, `"models"`, `"datasets"`, or `"spaces"`.
-- `followers_count`, `following_count`, `models_count`, `datasets_count`, `spaces_count`, and similar aggregate counts already come from the base `hf_profile_summary(...)["item"]`.
-- `return_limit=None` does **not** mean exhaustive or "all rows". It means the helper uses its documented default.
-- When `count_only=True`, omit `return_limit`; count-only requests ignore row-return limits and return no items.
-- For "how many models/datasets/spaces does org/user X have?" prefer `hf_profile_summary(...)["item"]` instead of trying to count with `hf_repo_search(...)`.
-- Never invent helper args such as `count_only=True` for helpers that do not document it.
 ## Helper result shape
 All helpers return:
 ```py
 {
   "ok": bool,
@@ -56,427 +153,61 @@ All helpers return:
 Rules:
 - `items` is the canonical list field.
-- `item` is only a singleton convenience.
-- `meta` contains helper-owned execution, coverage, and limit information.
-- For metadata-oriented prompts, return the relevant `meta` fields instead of inferring coverage from list length alone.
-- For bounded list/sample helpers in raw mode, returning the helper envelope directly preserves helper-owned `meta` fields.
-## Helper API
 ```py
-await hf_runtime_capabilities(section: str | None = None)
-await hf_profile_summary(
-  handle: str | None = None,
-  include: list[str] | None = None,
-  likes_limit: int = 10,
-  activity_limit: int = 10,
-)
-# include supports only: ["likes"], ["activity"], or ["likes", "activity"]
-# aggregate counts like followers_count / following_count / models_count are already in item
-await hf_org_members(
-  organization: str,
-  return_limit: int | None = None,
-  scan_limit: int | None = None,
-  count_only: bool = False,
-  where: dict | None = None,
-  fields: list[str] | None = None,
-)
-await hf_repo_search(
-  query: str | None = None,
-  repo_type: str | None = None,
-  repo_types: list[str] | None = None,
-  author: str | None = None,
-  filters: list[str] | None = None,
-  sort: str | None = None,
-  limit: int = 20,
-  where: dict | None = None,
-  fields: list[str] | None = None,
-  advanced: dict | None = None,
-)
-# hf_repo_search contract:
-# - filters: upstream HF search/tag filters only (not arbitrary returned row fields)
-# - where: local predicate over returned normalized row fields
-# - fields: select which normalized row fields are returned
-# - for Space runtime status, use where={"runtime_stage": ...}, not filters=["state:..."]
-await hf_repo_details(
-  repo_id: str | None = None,
-  repo_ids: list[str] | None = None,
-  repo_type: str = "auto",
-  fields: list[str] | None = None,
-)
-await hf_trending(
-  repo_type: str = "model",
-  limit: int = 20,
-  where: dict | None = None,
-  fields: list[str] | None = None,
-)
-await hf_daily_papers(
-  limit: int = 20,
-  where: dict | None = None,
-  fields: list[str] | None = None,
-)
-await hf_user_graph(
-  username: str | None = None,
-  relation: str = "followers",
-  return_limit: int | None = None,
-  scan_limit: int | None = None,
-  count_only: bool = False,
-  pro_only: bool | None = None,
-  where: dict | None = None,
-  fields: list[str] | None = None,
-)
-await hf_repo_likers(
-  repo_id: str,
-  repo_type: str,
-  return_limit: int | None = None,
-  count_only: bool = False,
-  pro_only: bool | None = None,
-  where: dict | None = None,
-  fields: list[str] | None = None,
-)
-await hf_user_likes(
-  username: str | None = None,
-  repo_types: list[str] | None = None,
-  return_limit: int | None = None,
-  scan_limit: int | None = None,
-  count_only: bool = False,
-  where: dict | None = None,
-  fields: list[str] | None = None,
-  sort: str | None = None,
-  ranking_window: int | None = None,
-)
-await hf_recent_activity(
-  feed_type: str | None = None,
-  entity: str | None = None,
-  activity_types: list[str] | None = None,
-  repo_types: list[str] | None = None,
-  return_limit: int | None = None,
-  max_pages: int | None = None,
-  start_cursor: str | None = None,
-  count_only: bool = False,
-  where: dict | None = None,
-  fields: list[str] | None = None,
-)
-await hf_repo_discussions(repo_type: str, repo_id: str, limit: int = 20)
-await hf_repo_discussion_details(repo_type: str, repo_id: str, discussion_num: int)
-await hf_collections_search(
-  query: str | None = None,
-  owner: str | None = None,
-  return_limit: int = 20,
-  count_only: bool = False,
-  where: dict | None = None,
-  fields: list[str] | None = None,
-)
-await hf_collection_items(
-  collection_id: str,
-  repo_types: list[str] | None = None,
-  return_limit: int = 100,
-  count_only: bool = False,
-  where: dict | None = None,
-  fields: list[str] | None = None,
-)
-await hf_whoami()
-```
-## Canonical sort / expand contract
-- Use canonical snake_case sort keys in generated code. Do **not** use camelCase sort names.
-- `hf_repo_search(sort=...)`:
-  - model / dataset: `created_at`, `downloads`, `last_modified`, `likes`, `trending_score`
-  - space: `created_at`, `last_modified`, `likes`, `trending_score`
-- `hf_user_likes(sort=...)`: `liked_at`, `repo_likes`, `repo_downloads`
-- `hf_user_likes(...)` row keys: `liked_at`, `repo_id`, `repo_type`, `repo_author`, `repo_likes`, `repo_downloads`, `repo_url`
-- `hf_repo_search(advanced=...)` is allowed only when you pass exactly one `repo_type`.
-- `hf_repo_search(advanced=...)` allowed keys:
-  - model: `filter`, `apps`, `gated`, `inference`, `inference_provider`, `model_name`, `trained_dataset`, `pipeline_tag`, `emissions_thresholds`, `expand`, `full`, `cardData`, `fetch_config`
-  - dataset: `filter`, `benchmark`, `dataset_name`, `gated`, `language_creators`, `language`, `multilinguality`, `size_categories`, `task_categories`, `task_ids`, `expand`, `full`
-  - space: `filter`, `datasets`, `models`, `linked`, `expand`, `full`
-- `advanced["expand"]` values are exact strings. Do **not** convert them to snake_case. Use only these values:
-  - model: `author`, `baseModels`, `cardData`, `config`, `createdAt`, `disabled`, `downloads`, `downloadsAllTime`, `evalResults`, `gated`, `gguf`, `inference`, `inferenceProviderMapping`, `lastModified`, `library_name`, `likes`, `mask_token`, `model-index`, `pipeline_tag`, `private`, `resourceGroup`, `safetensors`, `sha`, `siblings`, `spaces`, `tags`, `transformersInfo`, `trendingScore`, `widgetData`, `xetEnabled`, `gitalyUid`
-  - dataset: `author`, `cardData`, `citation`, `createdAt`, `description`, `disabled`, `downloads`, `downloadsAllTime`, `gated`, `lastModified`, `likes`, `paperswithcode_id`, `private`, `resourceGroup`, `sha`, `siblings`, `tags`, `trendingScore`, `xetEnabled`, `gitalyUid`
-  - space: `author`, `cardData`, `createdAt`, `datasets`, `disabled`, `lastModified`, `likes`, `models`, `private`, `resourceGroup`, `runtime`, `sdk`, `sha`, `siblings`, `subdomain`, `tags`, `trendingScore`, `xetEnabled`, `gitalyUid`
-- If a specific expanded field matters to the answer, request it explicitly in `advanced["expand"]`. Do not rely on implicit defaults.
-- `filters` and `where` are **not** interchangeable:
-  - `filters` passes upstream HF filter/tag arguments into the Hub client
-  - `where` filters the normalized rows returned by this runtime
-  - for repo-search questions about upstream fields like `author` or model `pipeline_tag`, push the constraint upstream instead of using `where`
-  - if the user asks for a returned field such as Space runtime state/status, prefer `where` over `filters`
-- For model pipeline-tag questions, prefer:
-  - `hf_repo_search(repo_type="model", advanced={"pipeline_tag": "text-to-image"}, ...)`
-  - **not** `where={"pipeline_tag": "text-to-image"}`
-- For Space runtime state/status questions:
-  - the canonical returned field is `runtime_stage`
-  - friendly wording like "state" or "status" refers to `runtime_stage`
-  - values such as `BUILD_ERROR`, `RUNTIME_ERROR`, `RUNNING`, and `SLEEPING` are runtime stages
-  - plain `"ERROR"` is not a canonical stage value; if the user says "error state", treat that as `BUILD_ERROR` and `RUNTIME_ERROR`
-## Routing guide
-### Summary vs detail
-- Summary helpers are the default for list/search/trending questions: `hf_repo_search(...)`, `hf_trending(...)`, `hf_daily_papers(...)`, `hf_user_likes(...)`, `hf_recent_activity(...)`, `hf_collections_search(...)`, `hf_collection_items(...)`, `hf_org_members(...)`, `hf_user_graph(...)`.
-- Use `hf_repo_details(...)` when the user needs exact repo metadata rather than a cheap summary row.
-- Do **not** invent follow-up detail calls unless the user explicitly needs fields that are not already available in the current helper response.
-### Repo questions
-- Exact `owner/name` details → `hf_repo_details(repo_type="auto", ...)`
-- Search/discovery/list/top repos → `hf_repo_search(...)`
-- True trending requests → `hf_trending(...)`
-- Daily papers → `hf_daily_papers(...)`
-- Repo discussions → `hf_repo_discussions(...)`
-- Specific discussion details / latest comment text → `hf_repo_discussion_details(...)`
-- Users who liked a specific repo → `hf_repo_likers(...)`
-### User questions
-- Profile / overview / "tell me about user X" → `hf_profile_summary(...)`
-- Follower/following **counts** for a user → prefer `hf_profile_summary(...)`
-- Followers / following **lists**, graph samples, and social joins → `hf_user_graph(...)`
-- Repos a user liked → `hf_user_likes(...)`
-- Recent actions / activity feed → `hf_recent_activity(feed_type="user", entity=...)`
-### Organization questions
-- Organization details and counts → `hf_profile_summary(...)`
-- Organization members → `hf_org_members(...)`
-- Organization repos → `hf_repo_search(author="<org>", repo_types=[...])`
-- Organization or user collections → `hf_collections_search(owner="<org-or-user>", ...)`
-- Repos inside a known collection → `hf_collection_items(collection_id=...)`
-### Direction reminders
-- `hf_user_likes(...)` = **user → repos**
-- `hf_repo_likers(...)` = **repo → users**
-- `hf_user_graph(...)` = **user/org → followers/following**
-- `"who follows X"` → `hf_user_graph(username="X", relation="followers", ...)`
-- `"who does X follow"` → `hf_user_graph(username="X", relation="following", ...)`
-- If the author/org is already known, start with `hf_repo_search(author=...)` instead of semantic search.
-- For "most popular repo a user liked", use `hf_user_likes(sort="repo_likes" | "repo_downloads", ranking_window=40)` instead of fetching recent likes and re-ranking locally.
-### Join / intersection guidance
-- For set-intersection questions, prefer **one helper call per side + local set logic**.
-- Example: `"who in the huggingface org follows evalstate"` should use:
-  1. `hf_org_members(organization="huggingface", ...)`
-  2. `hf_user_graph(username="evalstate", relation="followers", ...)`
-  3. intersect `username` locally
-- Example: `"who in the huggingface org does evalstate follow"` should use:
-  1. `hf_org_members(organization="huggingface", ...)`
-  2. `hf_user_graph(username="evalstate", relation="following", ...)`
-  3. intersect `username` locally
-- Do **not** invert follower/following direction when restating the prompt.
-- Do **not** do one graph call per org member for these intersection questions unless you explicitly need a bounded fallback.
-## Canonical row keys
-Use canonical names in generated code and `fields=[...]`.
-- Repo rows: `repo_id`, `repo_type`, `author`, `likes`, `downloads`, `created_at`, `last_modified`, `pipeline_tag`, `num_params`, `library_name`, `description`, `paperswithcode_id`, `sdk`, `models`, `datasets`, `subdomain`, `runtime_stage`, `runtime`, `trending_rank`, `trending_score`, `repo_url`, `tags`
-- Daily paper rows: `paper_id`, `title`, `published_at`, `authors`, `organization`, `repo_id`, `rank`
-- User likes rows: `liked_at`, `repo_id`, `repo_type`, `repo_author`, `repo_likes`, `repo_downloads`, `repo_url`
-- User graph/member rows: `username`, `fullname`, `isPro`, `role`, `type`
-- Activity rows: `event_type`, `repo_id`, `repo_type`, `timestamp`
-- Collection rows: `collection_id`, `slug`, `title`, `owner`, `owner_type`, `description`, `last_updated`, `item_count`
-- `hf_profile_summary(...)["item"]`: `handle`, `entity_type`, `display_name`, `bio`, `description`, `avatar_url`, `website_url`, `twitter_url`, `github_url`, `linkedin_url`, `bluesky_url`, `followers_count`, `following_count`, `likes_count`, `members_count`, `models_count`, `datasets_count`, `spaces_count`, `is_pro`, `likes_sample`, `activity_sample`
-## High-signal usage notes
-- `hf_repo_search(...)` defaults to models. If the user asks for all repos by an author/org, search across `repo_types=["model", "dataset", "space"]`.
-- Summary helpers come first. Use `hf_repo_details(...)` only when the user explicitly needs exact repo metadata.
-- Use `repo_id` as the display label for repos.
-- `hf_repo_search(...)` model rows may already include `num_params`; use that before considering detail hydration.
-- `hf_trending(...)` returns ordered rows with `trending_rank`. Never fabricate `trending_score`.
-- `hf_daily_papers(...)` may omit `repo_id`. Omit unavailable optional fields instead of forcing nulls.
-- Use `hf_profile_summary(...)["item"]` for aggregate counts such as followers, following, models, datasets, and spaces.
-- Use `hf_whoami()` when you need the explicit current username for joins, comparisons, or labeling.
-- For joins, overlap, and ranking, fetch a large enough working set first and compute locally. It is fine for the internal working set to be larger than the final returned output.
-- Avoid per-row hydration unless exact metadata is required and missing from the current helper response.
-- For fan-out tasks, prefer bounded seed sets by default. If the user explicitly asks for exhaustive coverage, do **not** silently cap at a small sample.
-- If exhaustive coverage is not feasible within the call/time budget, return an explicit partial result with `results` and `coverage`. Never present a bounded sample as complete.
-- In raw mode, do **not** create your own top-level `meta`; runtime already owns the outer `meta`.
-- Use `hf_collections_search(...)` to find collections and `hf_collection_items(...)` to list the repos inside a collection.
-## Minimal patterns
-```py
-# Exact repo details
-info = await hf_repo_details(
-    repo_id="black-forest-labs/FLUX.1-dev",
-    repo_type="auto",
-    fields=["repo_id", "repo_type", "author", "pipeline_tag", "library_name", "num_params", "likes", "downloads", "repo_url"],
-)
-item = info["item"] or (info["items"][0] if info["items"] else None)
-return {
-    "repo_id": item["repo_id"],
-    "repo_type": item["repo_type"],
-    "author": item["author"],
-    "pipeline_tag": item.get("pipeline_tag"),
-    "library_name": item.get("library_name"),
-    "num_params": item.get("num_params"),
-    "likes": item.get("likes"),
-    "downloads": item.get("downloads"),
-    "repo_url": item.get("repo_url"),
-}
-# Runtime capability / supported-field introspection
-caps = await hf_runtime_capabilities(section="fields")
-if not caps["ok"]:
-    return caps
-item = caps["item"] or (caps["items"][0] if caps["items"] else None)
-return item["content"]
-# Top trending models with selected fields
-resp = await hf_trending(
-    repo_type="model",
-    limit=5,
-    fields=["repo_id", "likes", "downloads"],
-)
-if not resp["ok"]:
-    return resp
-result = []
-for item in resp["items"]:
-    row = {}
-    for key in ["repo_id", "likes", "downloads"]:
-        if item.get(key) is not None:
-            row[key] = item[key]
-    if row:
-        result.append(row)
-return result
-# Compact profile summary
-summary = await hf_profile_summary(
-    handle="mishig",
-    include=["likes", "activity"],
-    likes_limit=10,
-    activity_limit=10,
-)
-item = summary["item"] or (summary["items"][0] if summary["items"] else None)
-return {
-    "followers_count": item["followers_count"],
-    "following_count": item.get("following_count"),
-    "activity_sample": item.get("activity_sample", []),
-    "likes_sample": item.get("likes_sample", []),
-}
-# Fan-out query with bounded partial coverage metadata
-followers = await hf_user_graph(
-    relation="followers",
-    return_limit=20,
-    fields=["username"],
-)
-if not followers["ok"]:
-    return followers
-result = {}
-processed = 0
-for row in followers["items"]:
-    uname = row.get("username")
-    if not uname:
-        continue
-    likes = await hf_user_likes(
-        username=uname,
-        repo_types=["model"],
-        return_limit=3,
-        fields=["repo_id", "repo_author", "liked_at"],
-    )
-    processed += 1
-    rows = []
-    for item in likes["items"]:
-        liked = {}
-        for key in ["repo_id", "repo_author", "liked_at"]:
-            if item.get(key) is not None:
-                liked[key] = item[key]
-        if liked:
-            rows.append(liked)
-    if rows:
-        result[uname] = rows
-return {
-    "results": result,
-    "coverage": {
-        "partial": bool(followers["meta"].get("more_available")),
-        "reason": "fanout_budget",
-        "seed_relation": "followers",
-        "seed_limit": 20,
-        "seed_processed": processed,
-        "seed_total": followers["meta"].get("total"),
-        "seed_more_available": followers["meta"].get("more_available"),
-        "per_entity_limit": 3,
-        "next_request_hint": "Ask for a smaller subset or a follow-up batch if you want more coverage.",
-    },
-}
-# Popularity-ranked likes with metadata
-likes = await hf_user_likes(
-    username="julien-c",
-    return_limit=1,
-    sort="repo_likes",
-    ranking_window=40,
-    fields=["repo_id", "repo_type", "repo_author", "repo_likes", "repo_url", "liked_at"],
-)
-item = likes["item"] or (likes["items"][0] if likes["items"] else None)
-if item is None:
-    return {"error": "No liked repositories found"}
-repo = {}
-for key in ["repo_id", "repo_type", "repo_author", "repo_likes", "repo_url", "liked_at"]:
-    if item.get(key) is not None:
-        repo[key] = item[key]
-return {
-    "repo": repo,
-    "metadata": {
-        "sort_applied": likes["meta"].get("sort_applied"),
-        "ranking_window": likes["meta"].get("ranking_window"),
-        "ranking_complete": likes["meta"].get("ranking_complete"),
-    },
-}
-# Recent activity with compact snake_case rows
-activity = await hf_recent_activity(
-    feed_type="user",
-    entity="mishig",
-    return_limit=15,
-    fields=["event_type", "repo_id", "repo_type", "timestamp"],
-)
-result = []
-for row in activity["items"]:
-    item = {}
-    for key in ["event_type", "repo_id", "repo_type", "timestamp"]:
-        if row.get(key) is not None:
-            item[key] = row[key]
-    if item:
-        result.append(item)
-return result
-# Repo discussions
-rows = await hf_repo_discussions(
-    repo_type="model",
-    repo_id="Qwen/Qwen3.5-35B-A3B",
-    limit=10,
-)
-return [
-    {
-        "num": row["num"],
-        "title": row["title"],
-        "author": row["author"],
-        "status": row["status"],
-    }
-    for row in rows["items"]
-]
-# Collections owned by an org or user
-collections = await hf_collections_search(
-    owner="Qwen",
-    return_limit=20,
-    fields=["collection_id", "title", "owner", "description", "last_updated", "item_count"],
-)
-return collections["items"]
-# Daily papers via the helper
-papers = await hf_daily_papers(
-    limit=20,
-    fields=["title", "repo_id"],
-)
-return papers["items"]
 ```

+## Monty rules
+- You are writing Python for Monty.
+- Do **not** use imports.
+- All helper calls are async: always use `await`.
+- Use this exact outer shape:
 ```py
 async def solve(query, max_calls):
     ...
 await solve(query, max_calls)
 ```
+- `max_calls` is the total external-call budget for the whole program.
+- Use only documented `hf_*` helpers.
+- If you are unsure about helper names, fields, defaults, or limits, call `hf_runtime_capabilities(...)`.
+- Return plain Python data only: `dict`, `list`, `str`, `int`, `float`, `bool`, or `None`.
+- Do **not** hand-build JSON strings or markdown strings inside `solve(...)` unless the user explicitly asked for prose.
+- Do **not** build your own transport wrapper like `{result: ..., meta: ...}`.
+- If the user says "return only" some fields, return exactly that final shape.
+- If a helper already returns the requested row shape, return `resp["items"]` directly instead of rebuilding it.
+- For current-user prompts (`my`, `me`), try helpers with `username=None` / `handle=None` first.
+- If a current-user helper returns `ok=false`, return that helper response directly.
+## Search rules
+## Parameter notes
+- List helpers use `limit`.
 - `hf_profile_summary(include=...)` supports only `"likes"` and `"activity"`.
+- `hf_user_likes(sort=...)` supports `liked_at`, `repo_likes`, and `repo_downloads`.
+- When the user asks for helper-owned coverage metadata, use `helper_resp["meta"]`.
+- For pro-only follower/member/liker queries, prefer `pro_only=True` instead of filtering on a projected field.
+- `hf_profile_summary(...).item` aggregate counts use exact names like `followers_count` and `following_count`.
+- `hf_user_likes(...)` rows use `repo_likes` / `repo_downloads`, not plain `likes` / `downloads`.
+- `hf_user_graph(...)` and `hf_repo_likers(...)` rows use `is_pro`.
+- `hf_repo_discussions(...)` rows use `num`, `title`, `author`, `status`, `created_at`, and `url`.
+- `hf_user_likes(...)` already returns full normalized like rows by default; omit `fields` unless the user asked for a subset.
+- Unknown `fields` / `where` keys now fail fast. Use only canonical field names.
+- If the user is asking about models, use `hf_models_search(...)`.
+- If the user is asking about datasets, use `hf_datasets_search(...)`.
+- If the user is asking about spaces, use `hf_spaces_search(...)`.
+- Use `hf_repo_search(...)` only for intentionally cross-type search.
+- Ownership phrasing like "what collections does Qwen have", "collections by Qwen", or "collections owned by Qwen" means an owner lookup, so use `hf_collections_search(owner="Qwen")`, not a keyword-only `query="Qwen"` search.
+- Owner/user/org handles may arrive with different casing in the user message; when a handle spelling is uncertain, prefer owner-oriented logic and, if needed, add fallback inside `solve(...)` that broadens to `query=...` and filters owners case-insensitively.
+- Think like `huggingface_hub`: `search`, `filter`, `author`, repo-type-specific upstream params, then `fields`.
+- Push constraints upstream whenever a first-class helper argument exists.
+- `post_filter` is only for filtering normalized rows after fetch.
+- Keep `post_filter` simple:
+  - exact match or `in` for returned fields like `runtime_stage`
+  - `gte` / `lte` only for `downloads` and `likes`
+- Do **not** use `post_filter` for things that already have first-class upstream params like `author`, `pipeline_tag`, `dataset_name`, `language`, `models`, or `datasets`.
+Examples:
+```py
+await hf_models_search(pipeline_tag="text-to-image", limit=10)
+await hf_datasets_search(search="speech", sort="downloads", limit=10)
+await hf_spaces_search(post_filter={"runtime_stage": {"in": ["BUILD_ERROR", "RUNTIME_ERROR"]}})
+await hf_models_search(search="gguf", post_filter={"downloads": {"gte": 1000}})
+await hf_collections_search(owner="Qwen", limit=10)
+```
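
As a rough illustration of the `post_filter` rules above, the sketch below applies a filter dict to already-fetched rows. It is not the runtime's implementation, which this diff does not show: `apply_post_filter` is a hypothetical name, the sample rows are invented, and treating a bare value as an exact match is an assumption. Only the `in` and `gte` operator shapes are taken from the examples above.

```python
from typing import Any


# Hypothetical local re-implementation, for illustration only.
def apply_post_filter(rows: list[dict[str, Any]], post_filter: dict[str, Any]) -> list[dict[str, Any]]:
    """Keep rows whose fields satisfy every condition in post_filter."""

    def matches(value: Any, cond: Any) -> bool:
        if not isinstance(cond, dict):
            return value == cond  # assumption: a bare value means exact match
        for op, target in cond.items():
            if op == "in":
                if value not in target:
                    return False
            elif op == "gte":
                if value is None or value < target:
                    return False
            elif op == "lte":
                if value is None or value > target:
                    return False
            else:
                raise ValueError(f"unsupported operator: {op}")
        return True

    return [
        row
        for row in rows
        if all(matches(row.get(key), cond) for key, cond in post_filter.items())
    ]


# Invented sample rows using canonical field names from the prompt.
rows = [
    {"repo_id": "a/space-1", "runtime_stage": "RUNNING", "likes": 12},
    {"repo_id": "b/space-2", "runtime_stage": "BUILD_ERROR", "likes": 3},
    {"repo_id": "c/space-3", "runtime_stage": "RUNTIME_ERROR", "likes": 40},
]

errored = apply_post_filter(rows, {"runtime_stage": {"in": ["BUILD_ERROR", "RUNTIME_ERROR"]}})
popular = apply_post_filter(rows, {"likes": {"gte": 10}})

print([r["repo_id"] for r in errored])  # ['b/space-2', 'c/space-3']
print([r["repo_id"] for r in popular])  # ['a/space-1', 'c/space-3']
```

The filter runs only on rows that were already returned, which is why the rules reserve it for returned fields such as `runtime_stage` and push `author` or `pipeline_tag` upstream instead.
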
+Field-only pattern:
+```py
+resp = await hf_models_search(
+    pipeline_tag="text-to-image",
+    fields=["repo_id", "author", "likes", "downloads", "repo_url"],
+    limit=3,
+)
+return resp["items"]
+```
+Coverage pattern:
+```py
+resp = await hf_user_likes(
+    username="julien-c",
+    sort="repo_likes",
+    limit=20,
+    fields=["repo_id", "repo_likes", "repo_url"],
+)
+return {"results": resp["items"][:1], "coverage": resp["meta"]}
+```
+Profile-count pattern:
+```py
+profile = await hf_profile_summary(handle="mishig")
+item = profile["item"] or {}
+return {
+    "followers_count": item.get("followers_count"),
+    "following_count": item.get("following_count"),
+}
+```
+Pro-followers pattern:
+```py
+followers = await hf_user_graph(
+    relation="followers",
+    pro_only=True,
+    limit=20,
+    fields=["username"],
+)
+return followers["items"]
+```
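
The owner-lookup rule with a case-insensitive fallback could look roughly like the sketch below. The `hf_collections_search` defined here is a stand-in stub so the example runs outside Monty; its data and its case-sensitive owner matching are assumptions, not documented behaviour. Inside Monty the final line would be `await solve(query, max_calls)` rather than `asyncio.run(...)`.

```python
import asyncio
from typing import Any

# Invented sample data for the stub below.
_FAKE_COLLECTIONS = [
    {"collection_id": "Qwen/qwen3-demo", "title": "Qwen3", "owner": "Qwen"},
    {"collection_id": "someone/qwen-picks", "title": "Qwen picks", "owner": "someone"},
]


async def hf_collections_search(query=None, owner=None, limit=20, fields=None) -> dict[str, Any]:
    """Stub standing in for the runtime helper (assumed case-sensitive owner match)."""
    rows = _FAKE_COLLECTIONS
    if owner is not None:
        rows = [r for r in rows if r["owner"] == owner]
    if query is not None:
        needle = query.lower()
        rows = [r for r in rows if needle in (r["title"] + " " + r["collection_id"]).lower()]
    return {"ok": True, "items": rows[:limit], "item": None, "meta": {}}


async def solve(query, max_calls):
    handle = "qwen"  # handle as typed by the user; casing may be wrong
    # First attempt: a direct owner lookup.
    resp = await hf_collections_search(owner=handle, limit=20)
    if not resp["ok"]:
        return resp
    if resp["items"]:
        return resp["items"]
    # Fallback: broaden to a keyword search, then keep only rows whose
    # owner matches the handle case-insensitively.
    broad = await hf_collections_search(query=handle, limit=20)
    if not broad["ok"]:
        return broad
    return [r for r in broad["items"] if str(r.get("owner", "")).lower() == handle.lower()]


result = asyncio.run(solve("what collections does qwen have", max_calls=5))
print([r["collection_id"] for r in result])  # ['Qwen/qwen3-demo']
```

The fallback costs one extra call at most, so it stays well inside a small `max_calls` budget.
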
+## Navigation graph
+Use the helper that matches the question type.
+- exact repo details → `hf_repo_details(...)`
+- model search/list/discovery → `hf_models_search(...)`
+- dataset search/list/discovery → `hf_datasets_search(...)`
+- space search/list/discovery → `hf_spaces_search(...)`
+- cross-type repo search → `hf_repo_search(...)`
+- trending repos → `hf_trending(...)`
+- daily papers → `hf_daily_papers(...)`
+- repo discussions → `hf_repo_discussions(...)`
+- specific discussion details → `hf_repo_discussion_details(...)`
+- users who liked one repo → `hf_repo_likers(...)`
+- profile / overview / aggregate counts → `hf_profile_summary(...)`
+- followers / following lists → `hf_user_graph(...)`
+- repos a user liked → `hf_user_likes(...)`
+- recent activity feed → `hf_recent_activity(...)`
+- organization members → `hf_org_members(...)`
+- collections search → `hf_collections_search(...)`
+- items inside a known collection → `hf_collection_items(...)`
+- explicit current username → `hf_whoami()`
+Direction reminders:
+- `hf_user_likes(...)` = user → repos
+- `hf_repo_likers(...)` = repo → users
+- `hf_user_graph(...)` = user/org → followers/following
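
The direction reminders matter most for joins. A minimal sketch of "who in the huggingface org follows evalstate", using one helper call per side and a local intersection, might look like this. The two helpers are stubbed with invented data so the example runs outside Monty; only the argument names are taken from the prompt, and inside Monty the final line would be `await solve(query, max_calls)`.

```python
import asyncio
from typing import Any

# Invented data for the stand-in helpers below.
_MEMBERS = {"huggingface": ["alice", "bob", "carol"]}
_FOLLOWERS = {"evalstate": ["bob", "dave", "carol"]}
_FOLLOWING = {"evalstate": ["alice"]}


def _envelope(usernames: list[str]) -> dict[str, Any]:
    """Build a helper-style result envelope from a list of usernames."""
    return {"ok": True, "items": [{"username": u} for u in usernames], "item": None, "meta": {}}


async def hf_org_members(organization, limit=None, fields=None):
    return _envelope(_MEMBERS.get(organization, []))  # stub


async def hf_user_graph(username=None, relation="followers", limit=None, fields=None):
    table = _FOLLOWERS if relation == "followers" else _FOLLOWING
    return _envelope(table.get(username, []))  # stub


async def solve(query, max_calls):
    # Side 1: members of the organization.
    members = await hf_org_members(organization="huggingface", fields=["username"])
    if not members["ok"]:
        return members
    # Side 2: accounts that follow evalstate (relation="followers", not "following").
    followers = await hf_user_graph(username="evalstate", relation="followers", fields=["username"])
    if not followers["ok"]:
        return followers
    follower_names = {row["username"] for row in followers["items"]}
    # Intersect locally, preserving member order.
    return [row["username"] for row in members["items"] if row["username"] in follower_names]


result = asyncio.run(solve("who in the huggingface org follows evalstate", max_calls=2))
print(result)  # ['bob', 'carol']
```

Swapping `relation` to `"following"` answers the opposite question ("who in the org does evalstate follow") with the same two-call structure.
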
 ## Helper result shape
 All helpers return:
 ```py
 {
   "ok": bool,
 Rules:
 - `items` is the canonical list field.
+- `item` is just a singleton convenience.
+- `meta` contains helper-owned execution, limit, and coverage info.
+- When helper-owned coverage matters, prefer returning the helper envelope directly.
+## High-signal output rules
+- Prefer compact dict/list outputs over prose when the user asked for fields.
+- Prefer summary helpers before detail hydration.
+- Use canonical snake_case keys in generated code and structured output.
+- Use `repo_id` as the display label for repos.
+- Use `hf_profile_summary(...)['item']` for aggregate counts such as followers, following, models, datasets, and spaces.
+- For joins/intersections/rankings, fetch the needed working set first and compute locally.
+- If the result is partial, use top-level keys `results` and `coverage`.
+## Helper signatures (generated from Python)
+These signatures are exported from the live runtime with `inspect.signature(...)`.
+If prompt prose and signatures disagree, trust these signatures.
 ```py
+await hf_collection_items(collection_id: 'str', repo_types: 'list[str] | None' = None, limit: 'int' = 100, count_only: 'bool' = False, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
+await hf_collections_search(query: 'str | None' = None, owner: 'str | None' = None, limit: 'int' = 20, count_only: 'bool' = False, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
+await hf_daily_papers(limit: 'int' = 20, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
+await hf_datasets_search(search: 'str | None' = None, filter: 'str | list[str] | None' = None, author: 'str | None' = None, benchmark: 'str | bool | None' = None, dataset_name: 'str | None' = None, gated: 'bool | None' = None, language_creators: 'str | list[str] | None' = None, language: 'str | list[str] | None' = None, multilinguality: 'str | list[str] | None' = None, size_categories: 'str | list[str] | None' = None, task_categories: 'str | list[str] | None' = None, task_ids: 'str | list[str] | None' = None, sort: 'str | None' = None, limit: 'int' = 20, expand: 'list[str] | None' = None, full: 'bool | None' = None, fields: 'list[str] | None' = None, post_filter: 'dict[str, Any] | None' = None) -> 'dict[str, Any]'
+await hf_models_search(search: 'str | None' = None, filter: 'str | list[str] | None' = None, author: 'str | None' = None, apps: 'str | list[str] | None' = None, gated: 'bool | None' = None, inference: 'str | None' = None, inference_provider: 'str | list[str] | None' = None, model_name: 'str | None' = None, trained_dataset: 'str | list[str] | None' = None, pipeline_tag: 'str | None' = None, emissions_thresholds: 'tuple[float, float] | None' = None, sort: 'str | None' = None, limit: 'int' = 20, expand: 'list[str] | None' = None, full: 'bool | None' = None, card_data: 'bool' = False, fetch_config: 'bool' = False, fields: 'list[str] | None' = None, post_filter: 'dict[str, Any] | None' = None) -> 'dict[str, Any]'
+await hf_org_members(organization: 'str', limit: 'int | None' = None, scan_limit: 'int | None' = None, count_only: 'bool' = False, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
+await hf_profile_summary(handle: 'str | None' = None, include: 'list[str] | None' = None, likes_limit: 'int' = 10, activity_limit: 'int' = 10) -> 'dict[str, Any]'
+await hf_recent_activity(feed_type: 'str | None' = None, entity: 'str | None' = None, activity_types: 'list[str] | None' = None, repo_types: 'list[str] | None' = None, limit: 'int | None' = None, max_pages: 'int | None' = None, start_cursor: 'str | None' = None, count_only: 'bool' = False, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
+await hf_repo_details(repo_id: 'str | None' = None, repo_ids: 'list[str] | None' = None, repo_type: 'str' = 'auto', fields: 'list[str] | None' = None) -> 'dict[str, Any]'
+await hf_repo_discussion_details(repo_type: 'str', repo_id: 'str', discussion_num: 'int', fields: 'list[str] | None' = None) -> 'dict[str, Any]'
+await hf_repo_discussions(repo_type: 'str', repo_id: 'str', limit: 'int' = 20, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
+await hf_repo_likers(repo_id: 'str', repo_type: 'str', limit: 'int | None' = None, count_only: 'bool' = False, pro_only: 'bool | None' = None, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
+await hf_repo_search(search: 'str | None' = None, repo_type: 'str | None' = None, repo_types: 'list[str] | None' = None, filter: 'str | list[str] | None' = None, author: 'str | None' = None, sort: 'str | None' = None, limit: 'int' = 20, fields: 'list[str] | None' = None, post_filter: 'dict[str, Any] | None' = None) -> 'dict[str, Any]'
+await hf_runtime_capabilities(section: 'str | None' = None) -> 'dict[str, Any]'
+await hf_spaces_search(search: 'str | None' = None, filter: 'str | list[str] | None' = None, author: 'str | None' = None, datasets: 'str | list[str] | None' = None, models: 'str | list[str] | None' = None, linked: 'bool' = False, sort: 'str | None' = None, limit: 'int' = 20, expand: 'list[str] | None' = None, full: 'bool | None' = None, fields: 'list[str] | None' = None, post_filter: 'dict[str, Any] | None' = None) -> 'dict[str, Any]'
+await hf_trending(repo_type: 'str' = 'model', limit: 'int' = 20, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
+await hf_user_graph(username: 'str | None' = None, relation: 'str' = 'followers', limit: 'int | None' = None, scan_limit: 'int | None' = None, count_only: 'bool' = False, pro_only: 'bool | None' = None, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
+await hf_user_likes(username: 'str | None' = None, repo_types: 'list[str] | None' = None, limit: 'int | None' = None, scan_limit: 'int | None' = None, count_only: 'bool' = False, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None, sort: 'str | None' = None, ranking_window: 'int | None' = None) -> 'dict[str, Any]'
+await hf_whoami() -> 'dict[str, Any]'
 ```
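
As a rough illustration of how a listing like the one above can be produced, the sketch below renders a signature line with `inspect.signature(...)`. It assumes the helpers are defined under `from __future__ import annotations` (which is why annotations print as quoted strings) and are bound to a context object with `functools.partial`; `hf_example` and `render_signature` are hypothetical names, not part of the runtime.

```python
from __future__ import annotations

import inspect
from functools import partial
from typing import Any, Callable


async def hf_example(ctx: Any, repo_id: str, limit: int = 20) -> dict[str, Any]:
    """Hypothetical helper; the first parameter is the runtime context."""
    return {"ok": True, "repo_id": repo_id, "limit": limit}


def render_signature(name: str, fn: Callable[..., Any]) -> str:
    # inspect.signature understands functools.partial objects and drops
    # the parameters that are already bound (here: ctx).
    return f"await {name}{inspect.signature(fn)}"


bound = partial(hf_example, object())
line = render_signature("hf_example", bound)
print(line)
```

Because postponed evaluation stores annotations as strings, the rendered line shows `'str'` and `'int'` in quotes, matching the style of the exported signatures.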

hf-hub-query.md CHANGED Viewed

@@ -4,7 +4,7 @@ name: hf_hub_query
 model: hf.openai/gpt-oss-120b:sambanova
 use_history: false
 default: true
-description: "Read-only Hugging Face Hub navigator for discovery, lookup, filtering, ranking, counts, field-constrained extraction, and relationship questions across users, orgs, models, datasets, spaces, collections, discussions, daily papers, recent activity, followers/following, likes, and likers. Good for structured raw outputs and compact results. Generated helper calls can explicitly bound return_limit, scan_limit, and max_pages for brevity or broader coverage, and the tool can also be asked about its supported helpers, fields, aliases, defaults, and coverage behavior."
 shell: false
 skills: []
 function_tools:

 model: hf.openai/gpt-oss-120b:sambanova
 use_history: false
 default: true
+description: "Read-only Hugging Face Hub navigator for discovery, lookup, filtering, ranking, counts, field-constrained extraction, and relationship questions across users, orgs, models, datasets, spaces, collections, discussions, daily papers, recent activity, followers/following, likes, and likers. Good for structured raw outputs and compact results. Generated helper calls can explicitly bound limit, scan_limit, and max_pages for brevity or broader coverage, and the tool can also be asked about its supported helpers, canonical fields, defaults, and coverage behavior."
 shell: false
 skills: []
 function_tools:

monty_api/__pycache__/aliases.cpython-313.pyc CHANGED Viewed

Binary files a/monty_api/__pycache__/aliases.cpython-313.pyc and b/monty_api/__pycache__/aliases.cpython-313.pyc differ

monty_api/__pycache__/constants.cpython-313.pyc CHANGED Viewed

Binary files a/monty_api/__pycache__/constants.cpython-313.pyc and b/monty_api/__pycache__/constants.cpython-313.pyc differ

monty_api/__pycache__/http_runtime.cpython-313.pyc CHANGED Viewed

Binary files a/monty_api/__pycache__/http_runtime.cpython-313.pyc and b/monty_api/__pycache__/http_runtime.cpython-313.pyc differ

monty_api/__pycache__/registry.cpython-313.pyc CHANGED Viewed

Binary files a/monty_api/__pycache__/registry.cpython-313.pyc and b/monty_api/__pycache__/registry.cpython-313.pyc differ

monty_api/__pycache__/runtime_context.cpython-313.pyc CHANGED Viewed

Binary files a/monty_api/__pycache__/runtime_context.cpython-313.pyc and b/monty_api/__pycache__/runtime_context.cpython-313.pyc differ

monty_api/__pycache__/runtime_envelopes.cpython-313.pyc CHANGED Viewed

Binary files a/monty_api/__pycache__/runtime_envelopes.cpython-313.pyc and b/monty_api/__pycache__/runtime_envelopes.cpython-313.pyc differ

monty_api/__pycache__/runtime_filtering.cpython-313.pyc CHANGED Viewed

Binary files a/monty_api/__pycache__/runtime_filtering.cpython-313.pyc and b/monty_api/__pycache__/runtime_filtering.cpython-313.pyc differ

monty_api/__pycache__/validation.cpython-313.pyc CHANGED Viewed

Binary files a/monty_api/__pycache__/validation.cpython-313.pyc and b/monty_api/__pycache__/validation.cpython-313.pyc differ

monty_api/aliases.py CHANGED Viewed

@@ -29,93 +29,3 @@ REPO_SORT_KEYS: dict[str, set[str]] = {
         "trending_score",
     },
 }
-# Alias policy:
-# - canonical names stay canonical
-# - support a small compatibility set for observed prompt/output variants
-# - do not add speculative synonyms unless they appear in prompts, evals, or
-#   upstream payloads we already normalize
-SORT_KEY_ALIASES: dict[str, str] = {
-    "createdat": "created_at",
-    "created_at": "created_at",
-    "created-at": "created_at",
-    "downloads": "downloads",
-    "likes": "likes",
-    "lastmodified": "last_modified",
-    "last_modified": "last_modified",
-    "last-modified": "last_modified",
-    "trendingscore": "trending_score",
-    "trending_score": "trending_score",
-    "trending-score": "trending_score",
-    "trending": "trending_score",
-}
-USER_FIELD_ALIASES: dict[str, str] = {
-    "login": "username",
-    "user": "username",
-    "handle": "username",
-    "name": "fullname",
-    "full_name": "fullname",
-    "is_pro": "isPro",
-    "pro": "isPro",
-}
-ACTOR_FIELD_ALIASES: dict[str, str] = {
-    **USER_FIELD_ALIASES,
-    "entity_type": "type",
-    "user_type": "type",
-}
-REPO_FIELD_ALIASES: dict[str, str] = {
-    "repoid": "repo_id",
-    "repotype": "repo_type",
-    "repourl": "repo_url",
-    "createdat": "created_at",
-    "lastmodified": "last_modified",
-    "pipelinetag": "pipeline_tag",
-    "numparams": "num_params",
-    "trendingrank": "trending_rank",
-    "trendingscore": "trending_score",
-    "libraryname": "library_name",
-    "paperswithcodeid": "paperswithcode_id",
-    "runtimestage": "runtime_stage",
-    "runtimestatus": "runtime_stage",
-}
-COLLECTION_FIELD_ALIASES: dict[str, str] = {
-    "collectionid": "collection_id",
-    "lastupdated": "last_updated",
-    "ownertype": "owner_type",
-    "itemcount": "item_count",
-    "author": "owner",
-}
-DAILY_PAPER_FIELD_ALIASES: dict[str, str] = {
-    "paperid": "paper_id",
-    "publishedat": "published_at",
-    "submittedondailyat": "submitted_on_daily_at",
-    "submittedby": "submitted_by",
-    "discussionid": "discussion_id",
-    "githubrepo": "github_repo_url",
-    "githubstars": "github_stars",
-    "projectpage": "project_page_url",
-    "numcomments": "num_comments",
-    "isauthorparticipating": "is_author_participating",
-    "repoid": "repo_id",
-}
-USER_LIKES_FIELD_ALIASES: dict[str, str] = {
-    "likedat": "liked_at",
-    "repoid": "repo_id",
-    "repotype": "repo_type",
-    "repoauthor": "repo_author",
-    "repolikes": "repo_likes",
-    "repodownloads": "repo_downloads",
-}
-ACTIVITY_FIELD_ALIASES: dict[str, str] = {
-    "time": "timestamp",
-    "type": "event_type",
-    "repoid": "repo_id",
-    "repotype": "repo_type",
-}

         "trending_score",
     },
 }

monty_api/constants.py CHANGED Viewed

@@ -79,7 +79,7 @@ USER_CANONICAL_FIELDS: tuple[str, ...] = (
     "username",
     "fullname",
     "bio",
-    "websiteUrl",
     "twitter",
     "github",
     "linkedin",
@@ -87,7 +87,7 @@ USER_CANONICAL_FIELDS: tuple[str, ...] = (
     "followers",
     "following",
     "likes",
-    "isPro",
 )
 PROFILE_CANONICAL_FIELDS: tuple[str, ...] = (
@@ -121,11 +121,48 @@ PROFILE_CANONICAL_FIELDS: tuple[str, ...] = (
 ACTOR_CANONICAL_FIELDS: tuple[str, ...] = (
     "username",
     "fullname",
-    "isPro",
     "role",
     "type",
 )
 ACTIVITY_CANONICAL_FIELDS: tuple[str, ...] = (
     "event_type",
     "repo_id",

     "username",
     "fullname",
     "bio",
+    "website_url",
     "twitter",
     "github",
     "linkedin",
     "followers",
     "following",
     "likes",
+    "is_pro",
 )
 PROFILE_CANONICAL_FIELDS: tuple[str, ...] = (
 ACTOR_CANONICAL_FIELDS: tuple[str, ...] = (
     "username",
     "fullname",
+    "is_pro",
     "role",
     "type",
 )
+USER_LIKES_CANONICAL_FIELDS: tuple[str, ...] = (
+    "liked_at",
+    "repo_id",
+    "repo_type",
+    "repo_author",
+    "repo_likes",
+    "repo_downloads",
+    "repo_url",
+)
+DISCUSSION_CANONICAL_FIELDS: tuple[str, ...] = (
+    "num",
+    "repo_id",
+    "repo_type",
+    "title",
+    "author",
+    "created_at",
+    "status",
+    "url",
+)
+DISCUSSION_DETAIL_CANONICAL_FIELDS: tuple[str, ...] = (
+    "num",
+    "repo_id",
+    "repo_type",
+    "title",
+    "author",
+    "created_at",
+    "status",
+    "url",
+    "comment_count",
+    "latest_comment_author",
+    "latest_comment_created_at",
+    "latest_comment_text",
+    "latest_comment_html",
+)
 ACTIVITY_CANONICAL_FIELDS: tuple[str, ...] = (
     "event_type",
     "repo_id",

monty_api/helpers/__pycache__/activity.cpython-313.pyc CHANGED Viewed

Binary files a/monty_api/helpers/__pycache__/activity.cpython-313.pyc and b/monty_api/helpers/__pycache__/activity.cpython-313.pyc differ

monty_api/helpers/__pycache__/collections.cpython-313.pyc CHANGED Viewed

Binary files a/monty_api/helpers/__pycache__/collections.cpython-313.pyc and b/monty_api/helpers/__pycache__/collections.cpython-313.pyc differ

monty_api/helpers/__pycache__/introspection.cpython-313.pyc CHANGED Viewed

Binary files a/monty_api/helpers/__pycache__/introspection.cpython-313.pyc and b/monty_api/helpers/__pycache__/introspection.cpython-313.pyc differ

monty_api/helpers/__pycache__/profiles.cpython-313.pyc CHANGED Viewed

Binary files a/monty_api/helpers/__pycache__/profiles.cpython-313.pyc and b/monty_api/helpers/__pycache__/profiles.cpython-313.pyc differ

monty_api/helpers/__pycache__/repos.cpython-313.pyc CHANGED Viewed

Binary files a/monty_api/helpers/__pycache__/repos.cpython-313.pyc and b/monty_api/helpers/__pycache__/repos.cpython-313.pyc differ

monty_api/helpers/activity.py CHANGED Viewed

@@ -4,8 +4,8 @@ from __future__ import annotations
 from functools import partial
 from typing import Any, Callable
-from ..aliases import ACTIVITY_FIELD_ALIASES
 from ..constants import (
     EXHAUSTIVE_HELPER_RETURN_HARD_CAP,
     RECENT_ACTIVITY_PAGE_SIZE,
     RECENT_ACTIVITY_SCAN_MAX_PAGES,
@@ -19,7 +19,7 @@ async def hf_recent_activity(
     entity: str | None = None,
     activity_types: list[str] | None = None,
     repo_types: list[str] | None = None,
-    return_limit: int | None = None,
     max_pages: int | None = None,
     start_cursor: str | None = None,
     count_only: bool = False,
@@ -27,7 +27,7 @@ async def hf_recent_activity(
     fields: list[str] | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
-    default_return = ctx._policy_int("hf_recent_activity", "default_return", 100)
     page_cap = ctx._policy_int(
         "hf_recent_activity", "page_limit", RECENT_ACTIVITY_PAGE_SIZE
     )
@@ -56,12 +56,12 @@ async def hf_recent_activity(
             error="entity is required",
         )
     limit_plan = ctx._resolve_exhaustive_limits(
-        return_limit=return_limit,
         count_only=count_only,
-        default_return=default_return,
-        max_return=EXHAUSTIVE_HELPER_RETURN_HARD_CAP,
     )
-    ret_lim = int(limit_plan["applied_return_limit"])
     page_lim = page_cap
     pages_lim = ctx._clamp_int(
         requested_max_pages, default=pages_cap, minimum=1, maximum=pages_cap
@@ -85,8 +85,17 @@ async def hf_recent_activity(
     pages = 0
     exhausted_feed = False
     stopped_for_budget = False
-    normalized_where = ctx._normalize_where(where, aliases=ACTIVITY_FIELD_ALIASES)
-    while pages < pages_lim and (ret_lim == 0 or len(items) < ret_lim):
         if ctx._budget_remaining() <= 0:
             stopped_for_budget = True
             break
@@ -147,15 +156,22 @@ async def hf_recent_activity(
             if not ctx._item_matches_where(item, normalized_where):
                 continue
             matched += 1
-            if len(items) < ret_lim:
                 items.append(item)
         if not next_cursor:
             exhausted_feed = True
             break
-    items = ctx._project_activity_items(items, fields)
     exact_count = exhausted_feed and (not stopped_for_budget)
     sample_complete = (
-        exact_count and ret_lim >= matched and (not count_only or matched == 0)
     )
     page_limit_hit = (
         next_cursor is not None and pages >= pages_lim and (not exhausted_feed)

 from functools import partial
 from typing import Any, Callable
 from ..constants import (
+    ACTIVITY_CANONICAL_FIELDS,
     EXHAUSTIVE_HELPER_RETURN_HARD_CAP,
     RECENT_ACTIVITY_PAGE_SIZE,
     RECENT_ACTIVITY_SCAN_MAX_PAGES,
     entity: str | None = None,
     activity_types: list[str] | None = None,
     repo_types: list[str] | None = None,
+    limit: int | None = None,
     max_pages: int | None = None,
     start_cursor: str | None = None,
     count_only: bool = False,
     fields: list[str] | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
+    default_limit = ctx._policy_int("hf_recent_activity", "default_limit", 100)
     page_cap = ctx._policy_int(
         "hf_recent_activity", "page_limit", RECENT_ACTIVITY_PAGE_SIZE
     )
             error="entity is required",
         )
     limit_plan = ctx._resolve_exhaustive_limits(
+        limit=limit,
         count_only=count_only,
+        default_limit=default_limit,
+        max_limit=EXHAUSTIVE_HELPER_RETURN_HARD_CAP,
     )
+    applied_limit = int(limit_plan["applied_limit"])
     page_lim = page_cap
     pages_lim = ctx._clamp_int(
         requested_max_pages, default=pages_cap, minimum=1, maximum=pages_cap
     pages = 0
     exhausted_feed = False
     stopped_for_budget = False
+    try:
+        normalized_where = ctx._normalize_where(
+            where, allowed_fields=ACTIVITY_CANONICAL_FIELDS
+        )
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source="/api/recent-activity",
+            error=exc,
+        )
+    while pages < pages_lim and (applied_limit == 0 or len(items) < applied_limit):
         if ctx._budget_remaining() <= 0:
             stopped_for_budget = True
             break
             if not ctx._item_matches_where(item, normalized_where):
                 continue
             matched += 1
+            if len(items) < applied_limit:
                 items.append(item)
         if not next_cursor:
             exhausted_feed = True
             break
+    try:
+        items = ctx._project_activity_items(items, fields)
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source="/api/recent-activity",
+            error=exc,
+        )
     exact_count = exhausted_feed and (not stopped_for_budget)
     sample_complete = (
+        exact_count and applied_limit >= matched and (not count_only or matched == 0)
     )
     page_limit_hit = (
         next_cursor is not None and pages >= pages_lim and (not exhausted_feed)

monty_api/helpers/collections.py CHANGED Viewed

@@ -4,8 +4,11 @@ from __future__ import annotations
 from functools import partial
 from typing import Any, Callable
-from ..aliases import COLLECTION_FIELD_ALIASES, REPO_FIELD_ALIASES
-from ..constants import OUTPUT_ITEMS_TRUNCATION_LIMIT
 from ..context_types import HelperRuntimeContext
@@ -13,25 +16,29 @@ async def hf_collections_search(
     ctx: HelperRuntimeContext,
     query: str | None = None,
     owner: str | None = None,
-    return_limit: int = 20,
     count_only: bool = False,
     where: dict[str, Any] | None = None,
     fields: list[str] | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
-    default_return = ctx._policy_int("hf_collections_search", "default_return", 20)
-    max_return = ctx._policy_int(
-        "hf_collections_search", "max_return", OUTPUT_ITEMS_TRUNCATION_LIMIT
     )
     if count_only:
-        return_limit = 0
-    lim = ctx._clamp_int(
-        return_limit, default=default_return, minimum=0, maximum=max_return
     )
     owner_clean = str(owner or "").strip() or None
-    fetch_lim = max_return if lim == 0 or owner_clean else lim
     if owner_clean:
-        fetch_lim = min(fetch_lim, 100)
     term = str(query or "").strip()
     if not term and owner_clean:
         term = owner_clean
@@ -41,7 +48,7 @@ async def hf_collections_search(
             source="/api/collections",
             error="query or owner is required",
         )
-    params: dict[str, Any] = {"limit": fetch_lim}
     if term:
         params["q"] = term
     if owner_clean:
@@ -54,8 +61,43 @@ async def hf_collections_search(
             error=resp.get("error") or "collections fetch failed",
         )
     payload = resp.get("data") if isinstance(resp.get("data"), list) else []
     items: list[dict[str, Any]] = []
-    for row in payload[:fetch_lim]:
         if not isinstance(row, dict):
             continue
         row_owner = ctx._author_from_any(row.get("owner")) or ctx._author_from_any(
@@ -67,7 +109,9 @@ async def hf_collections_search(
             and "/" in str(row.get("slug"))
         ):
             row_owner = str(row.get("slug")).split("/", 1)[0]
-        if owner_clean is not None and row_owner != owner_clean:
             continue
         owner_payload = row.get("owner") if isinstance(row.get("owner"), dict) else {}
         collection_items = (
@@ -89,12 +133,29 @@ async def hf_collections_search(
                 "item_count": len(collection_items),
             }
         )
-    items = ctx._apply_where(items, where, aliases=COLLECTION_FIELD_ALIASES)
     total_matched = len(items)
-    items = items[:lim]
-    items = ctx._project_collection_items(items, fields)
     truncated = (
-        lim > 0 and total_matched > lim or (lim == 0 and len(payload) >= fetch_lim)
     )
     return ctx._helper_success(
         start_calls=start_calls,
@@ -110,6 +171,7 @@ async def hf_collections_search(
         complete=not truncated,
         query=term,
         owner=owner_clean,
     )
@@ -117,15 +179,15 @@ async def hf_collection_items(
     ctx: HelperRuntimeContext,
     collection_id: str,
     repo_types: list[str] | None = None,
-    return_limit: int = 100,
     count_only: bool = False,
     where: dict[str, Any] | None = None,
     fields: list[str] | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
-    default_return = ctx._policy_int("hf_collection_items", "default_return", 100)
-    max_return = ctx._policy_int(
-        "hf_collection_items", "max_return", OUTPUT_ITEMS_TRUNCATION_LIMIT
     )
     cid = str(collection_id or "").strip()
     if not cid:
@@ -135,9 +197,12 @@ async def hf_collection_items(
             error="collection_id is required",
         )
     if count_only:
-        return_limit = 0
-    lim = ctx._clamp_int(
-        return_limit, default=default_return, minimum=0, maximum=max_return
     )
     allowed_repo_types: set[str] | None = None
     try:
@@ -180,7 +245,17 @@ async def hf_collection_items(
     )
     if owner is None and "/" in cid:
         owner = cid.split("/", 1)[0]
-    normalized_where = ctx._normalize_where(where, aliases=REPO_FIELD_ALIASES)
     normalized: list[dict[str, Any]] = []
     for row in raw_items:
         if not isinstance(row, dict):
@@ -195,9 +270,17 @@ async def hf_collection_items(
             continue
         normalized.append(item)
     total_matched = len(normalized)
-    items = [] if count_only else normalized[:lim]
-    items = ctx._project_repo_items(items, fields)
-    truncated = lim > 0 and total_matched > lim
     return ctx._helper_success(
         start_calls=start_calls,
         source=endpoint,

 from functools import partial
 from typing import Any, Callable
+from ..constants import (
+    COLLECTION_CANONICAL_FIELDS,
+    OUTPUT_ITEMS_TRUNCATION_LIMIT,
+    REPO_CANONICAL_FIELDS,
+)
 from ..context_types import HelperRuntimeContext
     ctx: HelperRuntimeContext,
     query: str | None = None,
     owner: str | None = None,
+    limit: int = 20,
     count_only: bool = False,
     where: dict[str, Any] | None = None,
     fields: list[str] | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
+    default_limit = ctx._policy_int("hf_collections_search", "default_limit", 20)
+    max_limit = ctx._policy_int(
+        "hf_collections_search", "max_limit", OUTPUT_ITEMS_TRUNCATION_LIMIT
     )
     if count_only:
+        limit = 0
+    applied_limit = ctx._clamp_int(
+        limit,
+        default=default_limit,
+        minimum=0,
+        maximum=max_limit,
     )
     owner_clean = str(owner or "").strip() or None
+    owner_casefold = owner_clean.casefold() if owner_clean is not None else None
+    fetch_limit = max_limit if applied_limit == 0 or owner_clean else applied_limit
     if owner_clean:
+        fetch_limit = min(fetch_limit, 100)
     term = str(query or "").strip()
     if not term and owner_clean:
         term = owner_clean
             source="/api/collections",
             error="query or owner is required",
         )
+    params: dict[str, Any] = {"limit": fetch_limit}
     if term:
         params["q"] = term
     if owner_clean:
             error=resp.get("error") or "collections fetch failed",
         )
     payload = resp.get("data") if isinstance(resp.get("data"), list) else []
+    def _row_owner_matches_owner(row: Any) -> bool:
+        if owner_casefold is None or not isinstance(row, dict):
+            return owner_casefold is None
+        row_owner = ctx._author_from_any(row.get("owner")) or ctx._author_from_any(
+            row.get("ownerData")
+        )
+        if (
+            not row_owner
+            and isinstance(row.get("slug"), str)
+            and "/" in str(row.get("slug"))
+        ):
+            row_owner = str(row.get("slug")).split("/", 1)[0]
+        if not isinstance(row_owner, str) or not row_owner:
+            return False
+        return row_owner.casefold() == owner_casefold
+    owner_fallback_used = False
+    if owner_casefold is not None and not any(
+        _row_owner_matches_owner(row) for row in payload
+    ):
+        fallback_params: dict[str, Any] = {"limit": fetch_limit}
+        if term:
+            fallback_params["q"] = term
+        fallback_resp = ctx._host_raw_call("/api/collections", params=fallback_params)
+        if fallback_resp.get("ok"):
+            fallback_payload = (
+                fallback_resp.get("data")
+                if isinstance(fallback_resp.get("data"), list)
+                else []
+            )
+            if any(_row_owner_matches_owner(row) for row in fallback_payload):
+                payload = fallback_payload
+                owner_fallback_used = True
     items: list[dict[str, Any]] = []
+    for row in payload[:fetch_limit]:
         if not isinstance(row, dict):
             continue
         row_owner = ctx._author_from_any(row.get("owner")) or ctx._author_from_any(
             and "/" in str(row.get("slug"))
         ):
             row_owner = str(row.get("slug")).split("/", 1)[0]
+        if owner_casefold is not None and (
+            not isinstance(row_owner, str) or row_owner.casefold() != owner_casefold
+        ):
             continue
         owner_payload = row.get("owner") if isinstance(row.get("owner"), dict) else {}
         collection_items = (
                 "item_count": len(collection_items),
             }
         )
+    try:
+        items = ctx._apply_where(
+            items, where, allowed_fields=COLLECTION_CANONICAL_FIELDS
+        )
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source="/api/collections",
+            error=exc,
+        )
     total_matched = len(items)
+    items = items[:applied_limit]
+    try:
+        items = ctx._project_collection_items(items, fields)
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source="/api/collections",
+            error=exc,
+        )
     truncated = (
+        applied_limit > 0 and total_matched > applied_limit
+        or (applied_limit == 0 and len(payload) >= fetch_limit)
     )
     return ctx._helper_success(
         start_calls=start_calls,
         complete=not truncated,
         query=term,
         owner=owner_clean,
+        owner_case_insensitive_fallback=owner_fallback_used,
     )
     ctx: HelperRuntimeContext,
     collection_id: str,
     repo_types: list[str] | None = None,
+    limit: int = 100,
     count_only: bool = False,
     where: dict[str, Any] | None = None,
     fields: list[str] | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
+    default_limit = ctx._policy_int("hf_collection_items", "default_limit", 100)
+    max_limit = ctx._policy_int(
+        "hf_collection_items", "max_limit", OUTPUT_ITEMS_TRUNCATION_LIMIT
     )
     cid = str(collection_id or "").strip()
     if not cid:
             error="collection_id is required",
         )
     if count_only:
+        limit = 0
+    applied_limit = ctx._clamp_int(
+        limit,
+        default=default_limit,
+        minimum=0,
+        maximum=max_limit,
     )
     allowed_repo_types: set[str] | None = None
     try:
     )
     if owner is None and "/" in cid:
         owner = cid.split("/", 1)[0]
+    try:
+        normalized_where = ctx._normalize_where(
+            where, allowed_fields=REPO_CANONICAL_FIELDS
+        )
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source=endpoint,
+            error=exc,
+            collection_id=cid,
+        )
     normalized: list[dict[str, Any]] = []
     for row in raw_items:
         if not isinstance(row, dict):
             continue
         normalized.append(item)
     total_matched = len(normalized)
+    items = [] if count_only else normalized[:applied_limit]
+    try:
+        items = ctx._project_repo_items(items, fields)
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source=endpoint,
+            error=exc,
+            collection_id=cid,
+        )
+    truncated = applied_limit > 0 and total_matched > applied_limit
     return ctx._helper_success(
         start_calls=start_calls,
         source=endpoint,
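
The owner lookup fix in `hf_collections_search` compares owners with `casefold()` and falls back to the slug prefix when no owner payload is present. The sketch below isolates that matching rule. It is a simplification under assumptions: `owner_of` stands in for `ctx._author_from_any` and assumes the owner is either a string or a dict with a `name` key, and `filter_by_owner` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any


def owner_of(row: dict[str, Any]) -> str | None:
    """Best-effort owner name for a collection row (simplified assumption)."""
    owner = row.get("owner")
    if isinstance(owner, dict):
        owner = owner.get("name")
    if isinstance(owner, str) and owner:
        return owner
    # Fall back to the "owner/collection" slug prefix, as the diff does.
    slug = row.get("slug")
    if isinstance(slug, str) and "/" in slug:
        return slug.split("/", 1)[0]
    return None


def filter_by_owner(rows: list[Any], owner: str) -> list[dict[str, Any]]:
    wanted = owner.strip().casefold()
    kept: list[dict[str, Any]] = []
    for row in rows:
        if not isinstance(row, dict):
            continue
        name = owner_of(row)
        # casefold() makes "EvalState" and "evalstate" compare equal.
        if name is not None and name.casefold() == wanted:
            kept.append(row)
    return kept
```

Comparing casefolded names avoids dropping every row when the caller's capitalization differs from the one stored on the Hub, which is the failure mode an exact `!=` comparison has.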

monty_api/helpers/introspection.py CHANGED Viewed

@@ -5,22 +5,14 @@ import inspect
 from functools import partial
 from typing import Any, Callable
-from ..aliases import (
-    ACTIVITY_FIELD_ALIASES,
-    ACTOR_FIELD_ALIASES,
-    COLLECTION_FIELD_ALIASES,
-    DAILY_PAPER_FIELD_ALIASES,
-    REPO_FIELD_ALIASES,
-    REPO_SORT_KEYS,
-    SORT_KEY_ALIASES,
-    USER_FIELD_ALIASES,
-    USER_LIKES_FIELD_ALIASES,
-)
 from ..constants import (
     ACTIVITY_CANONICAL_FIELDS,
     ACTOR_CANONICAL_FIELDS,
     COLLECTION_CANONICAL_FIELDS,
     DAILY_PAPER_CANONICAL_FIELDS,
     DEFAULT_MAX_CALLS,
     DEFAULT_TIMEOUT_SEC,
     GRAPH_SCAN_LIMIT_CAP,
@@ -32,6 +24,7 @@ from ..constants import (
     REPO_CANONICAL_FIELDS,
     TRENDING_ENDPOINT_MAX_LIMIT,
     USER_CANONICAL_FIELDS,
 )
 from ..context_types import HelperRuntimeContext
 from ..registry import (
@@ -117,19 +110,12 @@ async def hf_runtime_capabilities(
             "repo": list(REPO_CANONICAL_FIELDS),
             "user": list(USER_CANONICAL_FIELDS),
             "actor": list(ACTOR_CANONICAL_FIELDS),
             "activity": list(ACTIVITY_CANONICAL_FIELDS),
             "collection": list(COLLECTION_CANONICAL_FIELDS),
             "daily_paper": list(DAILY_PAPER_CANONICAL_FIELDS),
-        },
-        "aliases": {
-            "repo": dict(sorted(REPO_FIELD_ALIASES.items())),
-            "user": dict(sorted(USER_FIELD_ALIASES.items())),
-            "actor": dict(sorted(ACTOR_FIELD_ALIASES.items())),
-            "user_likes": dict(sorted(USER_LIKES_FIELD_ALIASES.items())),
-            "activity": dict(sorted(ACTIVITY_FIELD_ALIASES.items())),
-            "collection": dict(sorted(COLLECTION_FIELD_ALIASES.items())),
-            "daily_paper": dict(sorted(DAILY_PAPER_FIELD_ALIASES.items())),
-            "sort_keys": dict(sorted(SORT_KEY_ALIASES.items())),
         },
         "helper_defaults": {
             helper_name: dict(sorted(metadata.items()))
@@ -154,23 +140,131 @@ async def hf_runtime_capabilities(
             ],
         },
         "repo_search": {
             "parameter_contract": {
-                "filters": {
-                    "meaning": "Upstream Hugging Face repo/tag filter arguments passed into the Hub client.",
-                    "not_for": [
-                        "arbitrary normalized row fields",
-                        "local-only derived/runtime fields",
-                    ],
                 },
-                "where": {
-                    "meaning": "Local predicate applied to normalized returned rows after runtime normalization.",
-                    "uses_field_aliases": True,
                 },
                 "fields": {
                     "meaning": "Select which normalized row fields are returned to the caller.",
-                    "uses_field_aliases": True,
                 },
             },
             "sort_keys": {
                 repo_type: sorted(keys)
                 for repo_type, keys in sorted(REPO_SORT_KEYS.items())
@@ -179,37 +273,15 @@ async def hf_runtime_capabilities(
                 repo_type: sorted(args)
                 for repo_type, args in sorted(REPO_SEARCH_EXTRA_ARGS.items())
             },
-            "expand_values": {
-                repo_type: list(values)
-                for repo_type, values in sorted(REPO_SEARCH_ALLOWED_EXPAND.items())
-            },
             "space_runtime_contract": {
                 "returned_field": "runtime_stage",
                 "full_runtime_field": "runtime",
-                "preferred_filter_channel": "where",
-                "not_recommended_filter_channel": "filters",
-                "error_family_values": ["BUILD_ERROR", "RUNTIME_ERROR"],
-                "example": {
-                    "correct": {
-                        "repo_type": "space",
-                        "where": {
-                            "runtime_stage": {
-                                "in": ["BUILD_ERROR", "RUNTIME_ERROR"]
-                            }
-                        },
-                        "fields": [
-                            "repo_id",
-                            "author",
-                            "runtime_stage",
-                            "repo_url",
-                        ],
-                    },
-                    "incorrect": {
-                        "repo_type": "space",
-                        "filters": ["state:ERROR"],
-                        "fields": ["repo_id", "author", "state", "repo_url"],
-                    },
-                },
             },
         },
     }

 from functools import partial
 from typing import Any, Callable
+from ..aliases import REPO_SORT_KEYS
 from ..constants import (
     ACTIVITY_CANONICAL_FIELDS,
     ACTOR_CANONICAL_FIELDS,
     COLLECTION_CANONICAL_FIELDS,
     DAILY_PAPER_CANONICAL_FIELDS,
+    DISCUSSION_CANONICAL_FIELDS,
+    DISCUSSION_DETAIL_CANONICAL_FIELDS,
     DEFAULT_MAX_CALLS,
     DEFAULT_TIMEOUT_SEC,
     GRAPH_SCAN_LIMIT_CAP,
     REPO_CANONICAL_FIELDS,
     TRENDING_ENDPOINT_MAX_LIMIT,
     USER_CANONICAL_FIELDS,
+    USER_LIKES_CANONICAL_FIELDS,
 )
 from ..context_types import HelperRuntimeContext
 from ..registry import (
             "repo": list(REPO_CANONICAL_FIELDS),
             "user": list(USER_CANONICAL_FIELDS),
             "actor": list(ACTOR_CANONICAL_FIELDS),
+            "user_likes": list(USER_LIKES_CANONICAL_FIELDS),
             "activity": list(ACTIVITY_CANONICAL_FIELDS),
             "collection": list(COLLECTION_CANONICAL_FIELDS),
             "daily_paper": list(DAILY_PAPER_CANONICAL_FIELDS),
+            "discussion": list(DISCUSSION_CANONICAL_FIELDS),
+            "discussion_detail": list(DISCUSSION_DETAIL_CANONICAL_FIELDS),
         },
         "helper_defaults": {
             helper_name: dict(sorted(metadata.items()))
             ],
         },
         "repo_search": {
+            "helper_selection": {
+                "preferred_rule": (
+                    "Prefer hf_models_search for model queries, hf_datasets_search for "
+                    "dataset queries, and hf_spaces_search for space queries. Use "
+                    "hf_repo_search only for intentionally cross-type search."
+                ),
+                "model": "hf_models_search",
+                "dataset": "hf_datasets_search",
+                "space": "hf_spaces_search",
+                "cross_type": "hf_repo_search",
+            },
+            "can_do": [
+                "search models",
+                "search datasets",
+                "search spaces",
+                "search across multiple repo types",
+                "project selected fields",
+                "apply local post-fetch row filtering",
+            ],
             "parameter_contract": {
+                "search": {
+                    "meaning": "Upstream Hugging Face search text.",
                 },
+                "filter": {
+                    "meaning": (
+                        "Upstream Hugging Face filter/tag argument passed directly into "
+                        "the Hub client."
+                    ),
+                },
+                "post_filter": {
+                    "meaning": (
+                        "Local predicate applied after the rows are fetched and normalized."
+                    ),
+                    "recommended_shapes": [
+                        {"runtime_stage": "RUNNING"},
+                        {"runtime_stage": {"in": ["BUILD_ERROR", "RUNTIME_ERROR"]}},
+                        {"downloads": {"gte": 1000}},
+                        {"likes": {"lte": 5000}},
+                    ],
+                    "prefer_for": [
+                        "normalized returned fields such as runtime_stage",
+                        "downloads / likes thresholds after a broad search",
+                    ],
+                    "avoid_when": [
+                        "author is already a first-class helper argument",
+                        "pipeline_tag is already a first-class model-search argument",
+                        "dataset_name, language, task_ids, apps, models, or datasets already have first-class helper args",
+                    ],
                 },
                 "fields": {
                     "meaning": "Select which normalized row fields are returned to the caller.",
+                    "canonical_only": True,
                 },
             },
+            "repo_type_specific_helpers": {
+                "model": {
+                    "helper": "hf_models_search",
+                    "preferred_params": [
+                        "search",
+                        "filter",
+                        "author",
+                        "pipeline_tag",
+                        "sort",
+                        "limit",
+                        "expand",
+                        "fields",
+                        "post_filter",
+                    ],
+                    "expand_values": list(REPO_SEARCH_ALLOWED_EXPAND["model"]),
+                },
+                "dataset": {
+                    "helper": "hf_datasets_search",
+                    "preferred_params": [
+                        "search",
+                        "filter",
+                        "author",
+                        "dataset_name",
+                        "language",
+                        "task_categories",
+                        "task_ids",
+                        "sort",
+                        "limit",
+                        "expand",
+                        "fields",
+                        "post_filter",
+                    ],
+                    "expand_values": list(REPO_SEARCH_ALLOWED_EXPAND["dataset"]),
+                },
+                "space": {
+                    "helper": "hf_spaces_search",
+                    "preferred_params": [
+                        "search",
+                        "filter",
+                        "author",
+                        "datasets",
+                        "models",
+                        "linked",
+                        "sort",
+                        "limit",
+                        "expand",
+                        "fields",
+                        "post_filter",
+                    ],
+                    "expand_values": list(REPO_SEARCH_ALLOWED_EXPAND["space"]),
+                },
+            },
+            "generic_helper": {
+                "helper": "hf_repo_search",
+                "use_for": "Intentionally cross-type search only.",
+                "supports": [
+                    "search",
+                    "repo_type",
+                    "repo_types",
+                    "filter",
+                    "author",
+                    "sort",
+                    "limit",
+                    "fields",
+                    "post_filter",
+                ],
+                "does_not_support": [
+                    "repo-type-specific knobs such as pipeline_tag or dataset_name",
+                    "nested advanced routing",
+                ],
+            },
             "sort_keys": {
                 repo_type: sorted(keys)
                 for repo_type, keys in sorted(REPO_SORT_KEYS.items())
                 repo_type: sorted(args)
                 for repo_type, args in sorted(REPO_SEARCH_EXTRA_ARGS.items())
             },
             "space_runtime_contract": {
                 "returned_field": "runtime_stage",
                 "full_runtime_field": "runtime",
+                "preferred_filter_channel": "post_filter",
+                "note": (
+                    "Treat runtime_stage like any other returned field: use exact values "
+                    "or an 'in' list in post_filter."
+                ),
+                "common_values": ["BUILD_ERROR", "RUNTIME_ERROR", "RUNNING", "SLEEPING"],
             },
         },
     }
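
Review note: the `post_filter` contract above lists four recommended shapes (exact value, `in`, `gte`, `lte`). The runtime's own filtering code is not part of this hunk, so the following is only a minimal sketch of how such predicates could be evaluated against normalized rows. The function name `row_matches` and the sample rows are hypothetical and are not taken from the commit.

```python
from typing import Any


def row_matches(row: dict[str, Any], post_filter: dict[str, Any]) -> bool:
    """Return True when `row` satisfies every condition in `post_filter`.

    Hypothetical helper for illustration; not the runtime's implementation.
    """
    for field, condition in post_filter.items():
        value = row.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"in": [...]}, {"gte": 1000}, {"lte": 5000}.
            for op, operand in condition.items():
                if op == "in":
                    ok = value in operand
                elif op == "gte":
                    ok = value is not None and value >= operand
                elif op == "lte":
                    ok = value is not None and value <= operand
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif value != condition:
            # Scalar form, e.g. {"runtime_stage": "RUNNING"}: exact match.
            return False
    return True


# Made-up rows shaped like normalized space results.
rows = [
    {"repo_id": "a/x", "runtime_stage": "RUNNING", "likes": 10},
    {"repo_id": "b/y", "runtime_stage": "BUILD_ERROR", "likes": 7000},
    {"repo_id": "c/z", "runtime_stage": "SLEEPING", "likes": 3},
]
error_filter = {"runtime_stage": {"in": ["BUILD_ERROR", "RUNTIME_ERROR"]}}
broken = [r["repo_id"] for r in rows if row_matches(r, error_filter)]
```

Under these assumptions, `runtime_stage` is filtered like any other returned field, which matches the note in `space_runtime_contract`.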

monty_api/helpers/profiles.py CHANGED

@@ -5,11 +5,8 @@ from itertools import islice
 import re
 from typing import Any, Callable
 from ..context_types import HelperRuntimeContext
-from ..aliases import (
-    ACTOR_FIELD_ALIASES,
-    USER_FIELD_ALIASES,
-)
 from ..constants import (
     EXHAUSTIVE_HELPER_RETURN_HARD_CAP,
     GRAPH_SCAN_LIMIT_CAP,
     OUTPUT_ITEMS_TRUNCATION_LIMIT,
@@ -74,7 +71,7 @@ async def hf_whoami(ctx: HelperRuntimeContext) -> dict[str, Any]:
     item = {
         "username": username,
         "fullname": payload.get("fullname"),
-        "isPro": payload.get("isPro"),
     }
     items = [item] if isinstance(username, str) and username else []
     return ctx._helper_success(
@@ -148,16 +145,16 @@ async def _hf_user_overview(ctx: HelperRuntimeContext, username: str) -> dict[st
         "username": obj.username or u,
         "fullname": obj.fullname,
         "bio": getattr(obj, "details", None),
-        "avatarUrl": obj.avatar_url,
-        "websiteUrl": getattr(obj, "websiteUrl", None),
         "twitter": _social_url("twitter", twitter_handle),
         "github": _social_url("github", github_handle),
         "linkedin": _social_url("linkedin", linkedin_handle),
         "bluesky": _social_url("bluesky", bluesky_handle),
-        "twitterHandle": twitter_handle,
-        "githubHandle": github_handle,
-        "linkedinHandle": linkedin_handle,
-        "blueskyHandle": bluesky_handle,
         "followers": ctx._as_int(obj.num_followers),
         "following": ctx._as_int(obj.num_following),
         "likes": ctx._as_int(obj.num_likes),
@@ -168,7 +165,7 @@ async def _hf_user_overview(ctx: HelperRuntimeContext, username: str) -> dict[st
         "papers": ctx._as_int(getattr(obj, "num_papers", None)),
         "upvotes": ctx._as_int(getattr(obj, "num_upvotes", None)),
         "orgs": org_names,
-        "isPro": obj.is_pro,
     }
     return ctx._helper_success(
         start_calls=start_calls,
@@ -202,10 +199,10 @@ async def _hf_org_overview(
         return ctx._helper_error(start_calls=start_calls, source=endpoint, error=e)
     item = {
         "organization": obj.name or org,
-        "displayName": obj.fullname,
-        "avatarUrl": obj.avatar_url,
         "description": obj.details,
-        "websiteUrl": getattr(obj, "websiteUrl", None),
         "followers": ctx._as_int(obj.num_followers),
         "members": ctx._as_int(obj.num_users),
         "models": ctx._as_int(getattr(obj, "num_models", None)),
@@ -226,7 +223,7 @@ async def _hf_org_overview(
 async def hf_org_members(
     ctx: HelperRuntimeContext,
     organization: str,
-    return_limit: int | None = None,
     scan_limit: int | None = None,
     count_only: bool = False,
     where: dict[str, Any] | None = None,
@@ -240,17 +237,17 @@ async def hf_org_members(
             source="/api/organizations/<o>/members",
             error="organization is required",
         )
-    default_return = ctx._policy_int("hf_org_members", "default_return", 100)
     scan_cap = ctx._policy_int("hf_org_members", "scan_max", GRAPH_SCAN_LIMIT_CAP)
     limit_plan = ctx._resolve_exhaustive_limits(
-        return_limit=return_limit,
         count_only=count_only,
-        default_return=default_return,
-        max_return=EXHAUSTIVE_HELPER_RETURN_HARD_CAP,
         scan_limit=scan_limit,
         scan_cap=scan_cap,
     )
-    ret_lim = int(limit_plan["applied_return_limit"])
     scan_lim = int(limit_plan["applied_scan_limit"])
     has_where = isinstance(where, dict) and bool(where)
     overview_total: int | None = None
@@ -299,11 +296,21 @@ async def hf_org_members(
         item = {
             "username": handle,
             "fullname": getattr(row, "fullname", None),
-            "isPro": getattr(row, "is_pro", None),
             "role": getattr(row, "role", None),
         }
         normalized.append(item)
-    normalized = ctx._apply_where(normalized, where, aliases=ACTOR_FIELD_ALIASES)
     observed_total = len(rows)
     scan_exhaustive = observed_total < scan_lim
     overview_list_mismatch = (
@@ -324,14 +331,14 @@ async def hf_org_members(
         total = observed_total
         total_matched = observed_total
     total_available = overview_total if overview_total is not None else observed_total
-    items = normalized[:ret_lim]
     scan_limit_hit = not exact_count and observed_total >= scan_lim
     count_source = (
         "overview" if overview_total is not None and (not has_where) else "scan"
     )
     sample_complete = (
         exact_count
-        and len(normalized) <= ret_lim
         and (not count_only or len(normalized) == 0)
     )
     more_available = ctx._derive_more_available(
@@ -342,7 +349,15 @@ async def hf_org_members(
     )
     if not exact_count and scan_limit_hit:
         more_available = "unknown" if has_where else True
-    items = ctx._project_user_items(items, fields)
     meta = ctx._build_exhaustive_result_meta(
         base_meta={
             "scanned": observed_total,
@@ -375,7 +390,7 @@ async def _user_graph_helper(
     kind: str,
     username: str,
     pro_only: bool | None,
-    return_limit: int | None,
     scan_limit: int | None,
     count_only: bool,
     where: dict[str, Any] | None,
@@ -384,10 +399,10 @@ async def _user_graph_helper(
     helper_name: str,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
-    default_return = ctx._policy_int(helper_name, "default_return", 100)
     scan_cap = ctx._policy_int(helper_name, "scan_max", GRAPH_SCAN_LIMIT_CAP)
-    max_return = ctx._policy_int(
-        helper_name, "max_return", EXHAUSTIVE_HELPER_RETURN_HARD_CAP
     )
     u = str(username or "").strip()
     if not u:
@@ -397,14 +412,14 @@ async def _user_graph_helper(
             error="username is required",
         )
     limit_plan = ctx._resolve_exhaustive_limits(
-        return_limit=return_limit,
         count_only=count_only,
-        default_return=default_return,
-        max_return=max_return,
         scan_limit=scan_limit,
         scan_cap=scan_cap,
     )
-    ret_lim = int(limit_plan["applied_return_limit"])
     scan_lim = int(limit_plan["applied_scan_limit"])
     has_where = isinstance(where, dict) and bool(where)
     filtered = pro_only is not None or has_where
@@ -509,14 +524,28 @@ async def _user_graph_helper(
         item = {
             "username": handle,
             "fullname": getattr(row, "fullname", None),
-            "isPro": getattr(row, "is_pro", None),
         }
-        if pro_only is True and item.get("isPro") is not True:
             continue
-        if pro_only is False and item.get("isPro") is True:
             continue
         normalized.append(item)
-    normalized = ctx._apply_where(normalized, where, aliases=USER_FIELD_ALIASES)
     observed_total = len(rows)
     scan_exhaustive = observed_total < scan_lim
     overview_list_mismatch = (
@@ -537,14 +566,14 @@ async def _user_graph_helper(
         total = observed_total
         total_matched = observed_total
     total_available = overview_total if overview_total is not None else observed_total
-    items = normalized[:ret_lim]
     scan_limit_hit = not exact_count and observed_total >= scan_lim
     count_source = (
         "overview" if overview_total is not None and (not filtered) else "scan"
     )
     sample_complete = (
         exact_count
-        and len(normalized) <= ret_lim
         and (not count_only or len(normalized) == 0)
     )
     more_available = ctx._derive_more_available(
@@ -555,7 +584,19 @@ async def _user_graph_helper(
     )
     if not exact_count and scan_limit_hit:
         more_available = "unknown" if filtered else True
-    items = ctx._project_user_items(items, fields)
     meta = ctx._build_exhaustive_result_meta(
         base_meta={
             "scanned": observed_total,
@@ -645,8 +686,8 @@ async def hf_profile_summary(
             "display_name": overview_item.get("fullname")
             or str(overview_item.get("username") or resolved_handle),
             "bio": overview_item.get("bio"),
-            "avatar_url": overview_item.get("avatarUrl"),
-            "website_url": overview_item.get("websiteUrl"),
             "twitter_url": overview_item.get("twitter"),
             "github_url": overview_item.get("github"),
             "linkedin_url": overview_item.get("linkedin"),
@@ -661,13 +702,13 @@ async def hf_profile_summary(
             "papers_count": ctx._overview_count(overview_item, "papers"),
             "upvotes_count": ctx._overview_count(overview_item, "upvotes"),
             "organizations": overview_item.get("orgs"),
-            "is_pro": overview_item.get("isPro"),
         }
         if "likes" in requested_sections:
             likes = await ctx.call_helper(
                 "hf_user_likes",
                 username=resolved_handle,
-                return_limit=likes_lim,
                 scan_limit=USER_SUMMARY_LIKES_SCAN_LIMIT,
                 count_only=likes_lim == 0,
                 sort="liked_at",
@@ -689,7 +730,7 @@ async def hf_profile_summary(
                 "hf_recent_activity",
                 feed_type="user",
                 entity=resolved_handle,
-                return_limit=activity_lim,
                 max_pages=USER_SUMMARY_ACTIVITY_MAX_PAGES,
                 count_only=activity_lim == 0,
                 fields=["timestamp", "event_type", "repo_type", "repo_id"],
@@ -724,11 +765,11 @@ async def hf_profile_summary(
         item = {
             "handle": str(overview_item.get("organization") or resolved_handle),
             "entity_type": "organization",
-            "display_name": overview_item.get("displayName")
             or str(overview_item.get("organization") or resolved_handle),
             "description": overview_item.get("description"),
-            "avatar_url": overview_item.get("avatarUrl"),
-            "website_url": overview_item.get("websiteUrl"),
             "followers_count": ctx._overview_count(overview_item, "followers"),
             "members_count": ctx._overview_count(overview_item, "members"),
             "models_count": ctx._overview_count(overview_item, "models"),
@@ -765,7 +806,7 @@ async def hf_user_graph(
     ctx: HelperRuntimeContext,
     username: str | None = None,
     relation: str = "followers",
-    return_limit: int | None = None,
     scan_limit: int | None = None,
     count_only: bool = False,
     pro_only: bool | None = None,
@@ -800,7 +841,7 @@ async def hf_user_graph(
         rel,
         resolved_username,
         pro_only,
-        return_limit,
         scan_limit,
         count_only,
         where,

 import re
 from typing import Any, Callable
 from ..context_types import HelperRuntimeContext
 from ..constants import (
+    ACTOR_CANONICAL_FIELDS,
     EXHAUSTIVE_HELPER_RETURN_HARD_CAP,
     GRAPH_SCAN_LIMIT_CAP,
     OUTPUT_ITEMS_TRUNCATION_LIMIT,
     item = {
         "username": username,
         "fullname": payload.get("fullname"),
+        "is_pro": payload.get("isPro"),
     }
     items = [item] if isinstance(username, str) and username else []
     return ctx._helper_success(
         "username": obj.username or u,
         "fullname": obj.fullname,
         "bio": getattr(obj, "details", None),
+        "avatar_url": obj.avatar_url,
+        "website_url": getattr(obj, "websiteUrl", None),
         "twitter": _social_url("twitter", twitter_handle),
         "github": _social_url("github", github_handle),
         "linkedin": _social_url("linkedin", linkedin_handle),
         "bluesky": _social_url("bluesky", bluesky_handle),
+        "twitter_handle": twitter_handle,
+        "github_handle": github_handle,
+        "linkedin_handle": linkedin_handle,
+        "bluesky_handle": bluesky_handle,
         "followers": ctx._as_int(obj.num_followers),
         "following": ctx._as_int(obj.num_following),
         "likes": ctx._as_int(obj.num_likes),
         "papers": ctx._as_int(getattr(obj, "num_papers", None)),
         "upvotes": ctx._as_int(getattr(obj, "num_upvotes", None)),
         "orgs": org_names,
+        "is_pro": obj.is_pro,
     }
     return ctx._helper_success(
         start_calls=start_calls,
         return ctx._helper_error(start_calls=start_calls, source=endpoint, error=e)
     item = {
         "organization": obj.name or org,
+        "display_name": obj.fullname,
+        "avatar_url": obj.avatar_url,
         "description": obj.details,
+        "website_url": getattr(obj, "websiteUrl", None),
         "followers": ctx._as_int(obj.num_followers),
         "members": ctx._as_int(obj.num_users),
         "models": ctx._as_int(getattr(obj, "num_models", None)),
 async def hf_org_members(
     ctx: HelperRuntimeContext,
     organization: str,
+    limit: int | None = None,
     scan_limit: int | None = None,
     count_only: bool = False,
     where: dict[str, Any] | None = None,
             source="/api/organizations/<o>/members",
             error="organization is required",
         )
+    default_limit = ctx._policy_int("hf_org_members", "default_limit", 100)
     scan_cap = ctx._policy_int("hf_org_members", "scan_max", GRAPH_SCAN_LIMIT_CAP)
     limit_plan = ctx._resolve_exhaustive_limits(
+        limit=limit,
         count_only=count_only,
+        default_limit=default_limit,
+        max_limit=EXHAUSTIVE_HELPER_RETURN_HARD_CAP,
         scan_limit=scan_limit,
         scan_cap=scan_cap,
     )
+    applied_limit = int(limit_plan["applied_limit"])
     scan_lim = int(limit_plan["applied_scan_limit"])
     has_where = isinstance(where, dict) and bool(where)
     overview_total: int | None = None
         item = {
             "username": handle,
             "fullname": getattr(row, "fullname", None),
+            "is_pro": getattr(row, "is_pro", None),
             "role": getattr(row, "role", None),
         }
         normalized.append(item)
+    try:
+        normalized = ctx._apply_where(
+            normalized, where, allowed_fields=ACTOR_CANONICAL_FIELDS
+        )
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source=endpoint,
+            error=exc,
+            organization=org,
+        )
     observed_total = len(rows)
     scan_exhaustive = observed_total < scan_lim
     overview_list_mismatch = (
         total = observed_total
         total_matched = observed_total
     total_available = overview_total if overview_total is not None else observed_total
+    items = normalized[:applied_limit]
     scan_limit_hit = not exact_count and observed_total >= scan_lim
     count_source = (
         "overview" if overview_total is not None and (not has_where) else "scan"
     )
     sample_complete = (
         exact_count
+        and len(normalized) <= applied_limit
         and (not count_only or len(normalized) == 0)
     )
     more_available = ctx._derive_more_available(
     )
     if not exact_count and scan_limit_hit:
         more_available = "unknown" if has_where else True
+    try:
+        items = ctx._project_actor_items(items, fields)
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source=endpoint,
+            error=exc,
+            organization=org,
+        )
     meta = ctx._build_exhaustive_result_meta(
         base_meta={
             "scanned": observed_total,
     kind: str,
     username: str,
     pro_only: bool | None,
+    limit: int | None,
     scan_limit: int | None,
     count_only: bool,
     where: dict[str, Any] | None,
     helper_name: str,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
+    default_limit = ctx._policy_int(helper_name, "default_limit", 100)
     scan_cap = ctx._policy_int(helper_name, "scan_max", GRAPH_SCAN_LIMIT_CAP)
+    max_limit = ctx._policy_int(
+        helper_name, "max_limit", EXHAUSTIVE_HELPER_RETURN_HARD_CAP
     )
     u = str(username or "").strip()
     if not u:
             error="username is required",
         )
     limit_plan = ctx._resolve_exhaustive_limits(
+        limit=limit,
         count_only=count_only,
+        default_limit=default_limit,
+        max_limit=max_limit,
         scan_limit=scan_limit,
         scan_cap=scan_cap,
     )
+    applied_limit = int(limit_plan["applied_limit"])
     scan_lim = int(limit_plan["applied_scan_limit"])
     has_where = isinstance(where, dict) and bool(where)
     filtered = pro_only is not None or has_where
         item = {
             "username": handle,
             "fullname": getattr(row, "fullname", None),
+            "is_pro": getattr(row, "is_pro", None),
         }
+        if pro_only is True and item.get("is_pro") is not True:
             continue
+        if pro_only is False and item.get("is_pro") is True:
             continue
         normalized.append(item)
+    try:
+        normalized = ctx._apply_where(
+            normalized, where, allowed_fields=ACTOR_CANONICAL_FIELDS
+        )
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source=endpoint,
+            error=exc,
+            relation=kind,
+            username=u,
+            entity=u,
+            entity_type=entity_type,
+            organization=u if entity_type == "organization" else None,
+        )
     observed_total = len(rows)
     scan_exhaustive = observed_total < scan_lim
     overview_list_mismatch = (
         total = observed_total
         total_matched = observed_total
     total_available = overview_total if overview_total is not None else observed_total
+    items = normalized[:applied_limit]
     scan_limit_hit = not exact_count and observed_total >= scan_lim
     count_source = (
         "overview" if overview_total is not None and (not filtered) else "scan"
     )
     sample_complete = (
         exact_count
+        and len(normalized) <= applied_limit
         and (not count_only or len(normalized) == 0)
     )
     more_available = ctx._derive_more_available(
     )
     if not exact_count and scan_limit_hit:
         more_available = "unknown" if filtered else True
+    try:
+        items = ctx._project_actor_items(items, fields)
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source=endpoint,
+            error=exc,
+            relation=kind,
+            username=u,
+            entity=u,
+            entity_type=entity_type,
+            organization=u if entity_type == "organization" else None,
+        )
     meta = ctx._build_exhaustive_result_meta(
         base_meta={
             "scanned": observed_total,
             "display_name": overview_item.get("fullname")
             or str(overview_item.get("username") or resolved_handle),
             "bio": overview_item.get("bio"),
+            "avatar_url": overview_item.get("avatar_url"),
+            "website_url": overview_item.get("website_url"),
             "twitter_url": overview_item.get("twitter"),
             "github_url": overview_item.get("github"),
             "linkedin_url": overview_item.get("linkedin"),
             "papers_count": ctx._overview_count(overview_item, "papers"),
             "upvotes_count": ctx._overview_count(overview_item, "upvotes"),
             "organizations": overview_item.get("orgs"),
+            "is_pro": overview_item.get("is_pro"),
         }
         if "likes" in requested_sections:
             likes = await ctx.call_helper(
                 "hf_user_likes",
                 username=resolved_handle,
+                limit=likes_lim,
                 scan_limit=USER_SUMMARY_LIKES_SCAN_LIMIT,
                 count_only=likes_lim == 0,
                 sort="liked_at",
                 "hf_recent_activity",
                 feed_type="user",
                 entity=resolved_handle,
+                limit=activity_lim,
                 max_pages=USER_SUMMARY_ACTIVITY_MAX_PAGES,
                 count_only=activity_lim == 0,
                 fields=["timestamp", "event_type", "repo_type", "repo_id"],
         item = {
             "handle": str(overview_item.get("organization") or resolved_handle),
             "entity_type": "organization",
+            "display_name": overview_item.get("display_name")
             or str(overview_item.get("organization") or resolved_handle),
             "description": overview_item.get("description"),
+            "avatar_url": overview_item.get("avatar_url"),
+            "website_url": overview_item.get("website_url"),
             "followers_count": ctx._overview_count(overview_item, "followers"),
             "members_count": ctx._overview_count(overview_item, "members"),
             "models_count": ctx._overview_count(overview_item, "models"),
     ctx: HelperRuntimeContext,
     username: str | None = None,
     relation: str = "followers",
+    limit: int | None = None,
     scan_limit: int | None = None,
     count_only: bool = False,
     pro_only: bool | None = None,
         rel,
         resolved_username,
         pro_only,
+        limit,
         scan_limit,
         count_only,
         where,

monty_api/helpers/repos.py CHANGED

@@ -5,20 +5,18 @@ from itertools import islice
 from typing import Any, Callable
 from huggingface_hub import HfApi
 from ..context_types import HelperRuntimeContext
-from ..aliases import (
-    ACTOR_FIELD_ALIASES,
-    DAILY_PAPER_FIELD_ALIASES,
-    REPO_FIELD_ALIASES,
-    USER_LIKES_FIELD_ALIASES,
-)
 from ..constants import (
     EXHAUSTIVE_HELPER_RETURN_HARD_CAP,
     LIKES_ENRICHMENT_MAX_REPOS,
     LIKES_RANKING_WINDOW_DEFAULT,
     LIKES_SCAN_LIMIT_CAP,
     OUTPUT_ITEMS_TRUNCATION_LIMIT,
     SELECTIVE_ENDPOINT_RETURN_HARD_CAP,
     TRENDING_ENDPOINT_MAX_LIMIT,
 )
 from ..registry import (
     REPO_SEARCH_ALLOWED_EXPAND,
@@ -42,11 +40,7 @@ def _sanitize_repo_expand_values(
     elif isinstance(raw_expand, (list, tuple, set)):
         requested_values = list(raw_expand)
     else:
-        return (
-            None,
-            [],
-            "advanced['expand'] must be a string or a list of strings",
-        )
     cleaned: list[str] = []
     for value in requested_values:
@@ -60,254 +54,259 @@ def _sanitize_repo_expand_values(
     return (kept or None, dropped, None)
-def _promote_repo_search_constraints(
-    repo_type: str,
-    author: str | None,
-    advanced: dict[str, Any],
-    where: dict[str, Any] | None,
-) -> tuple[str | None, dict[str, Any], dict[str, Any] | None, dict[str, Any]]:
-    if where is None:
-        return (author, advanced, None, {})
-    remaining_where = dict(where)
-    promoted: dict[str, Any] = {}
-    normalized_author = author
-    author_value = remaining_where.get("author")
-    if normalized_author is None and isinstance(author_value, str) and author_value.strip():
-        normalized_author = author_value.strip()
-        promoted["author"] = normalized_author
-        remaining_where.pop("author", None)
-    if repo_type == "model" and "pipeline_tag" not in advanced:
-        pipeline_value = remaining_where.get("pipeline_tag")
-        if isinstance(pipeline_value, str) and pipeline_value.strip():
-            advanced["pipeline_tag"] = pipeline_value.strip()
-            promoted["pipeline_tag"] = advanced["pipeline_tag"]
-            remaining_where.pop("pipeline_tag", None)
-    return (normalized_author, advanced, remaining_where or None, promoted)
 def _normalize_user_likes_sort(sort: str | None) -> tuple[str | None, str | None]:
-    raw = str(sort or "liked_at").strip()
-    alias_map = {
-        "": "liked_at",
-        "likedat": "liked_at",
-        "liked_at": "liked_at",
-        "liked-at": "liked_at",
-        "recency": "liked_at",
-        "repolikes": "repo_likes",
-        "repo_likes": "repo_likes",
-        "repo-likes": "repo_likes",
-        "repodownloads": "repo_downloads",
-        "repo_downloads": "repo_downloads",
-        "repo-downloads": "repo_downloads",
-    }
-    normalized = alias_map.get(raw.lower(), raw)
     if normalized not in {"liked_at", "repo_likes", "repo_downloads"}:
         return (None, "sort must be one of liked_at, repo_likes, repo_downloads")
     return (normalized, None)
-async def hf_repo_search(
     ctx: HelperRuntimeContext,
-    query: str | None = None,
-    repo_type: str | None = None,
-    repo_types: list[str] | None = None,
-    author: str | None = None,
-    filters: list[str] | None = None,
-    sort: str | None = None,
-    limit: int = 20,
-    where: dict[str, Any] | None = None,
-    fields: list[str] | None = None,
-    advanced: dict[str, Any] | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
-    default_return = ctx._policy_int("hf_repo_search", "default_return", 20)
-    max_return = ctx._policy_int(
-        "hf_repo_search", "max_return", SELECTIVE_ENDPOINT_RETURN_HARD_CAP
     )
-    if repo_type is not None and repo_types is not None:
         return ctx._helper_error(
             start_calls=start_calls,
             source="/api/repos",
-            error="Pass either repo_type or repo_types, not both",
         )
-    if repo_types is None:
-        if repo_type is None or not str(repo_type).strip():
-            requested_repo_types = ["model"]
-        else:
-            rt = ctx._canonical_repo_type(repo_type, default="")
-            if rt not in {"model", "dataset", "space"}:
-                return ctx._helper_error(
-                    start_calls=start_calls,
-                    source="/api/repos",
-                    error=f"Unsupported repo_type '{repo_type}'",
-                )
-            requested_repo_types = [rt]
-    else:
-        raw_types = ctx._coerce_str_list(repo_types)
-        if not raw_types:
-            return ctx._helper_error(
-                start_calls=start_calls,
-                source="/api/repos",
-                error="repo_types must not be empty",
-            )
-        requested_repo_types: list[str] = []
-        for raw in raw_types:
-            rt = ctx._canonical_repo_type(raw, default="")
-            if rt not in {"model", "dataset", "space"}:
-                return ctx._helper_error(
-                    start_calls=start_calls,
-                    source="/api/repos",
-                    error=f"Unsupported repo_type '{raw}'",
-                )
-            requested_repo_types.append(rt)
-    filter_list = ctx._coerce_str_list(filters)
-    term = str(query or "").strip()
-    author_clean = str(author or "").strip() or None
     requested_limit = limit
-    lim = ctx._clamp_int(limit, default=default_return, minimum=1, maximum=max_return)
     limit_meta = ctx._derive_limit_metadata(
-        requested_return_limit=requested_limit,
-        applied_return_limit=lim,
-        default_limit_used=limit == default_return,
     )
     hard_cap_applied = bool(limit_meta.get("hard_cap_applied"))
-    if advanced is not None and (not isinstance(advanced, dict)):
-        return ctx._helper_error(
-            start_calls=start_calls,
-            source="/api/repos",
-            error="advanced must be a dict when provided",
-        )
-    if advanced is not None and len(requested_repo_types) != 1:
-        return ctx._helper_error(
-            start_calls=start_calls,
-            source="/api/repos",
-            error="advanced may only be used with a single repo_type",
-        )
     sort_keys: dict[str, str | None] = {}
-    for rt in requested_repo_types:
-        sort_key, sort_error = ctx._normalize_repo_sort_key(rt, sort)
         if sort_error:
             return ctx._helper_error(
-                start_calls=start_calls, source=f"/api/{rt}s", error=sort_error
             )
-        sort_keys[rt] = sort_key
     all_items: list[dict[str, Any]] = []
     scanned = 0
     source_endpoints: list[str] = []
     limit_boundary_hit = False
     ignored_expand: dict[str, list[str]] = {}
-    promoted_where_filters: dict[str, Any] = {}
-    effective_where = dict(where) if isinstance(where, dict) else where
     api = ctx._get_hf_api_client()
-    for rt in requested_repo_types:
-        endpoint = f"/api/{rt}s"
         source_endpoints.append(endpoint)
-        extra_args = dict(advanced or {}) if len(requested_repo_types) == 1 else {}
-        effective_author = author_clean
-        allowed_extra = REPO_SEARCH_EXTRA_ARGS.get(rt, set())
-        unsupported = sorted(
-            (str(k) for k in extra_args.keys() if str(k) not in allowed_extra)
         )
-        if unsupported:
             return ctx._helper_error(
                 start_calls=start_calls,
                 source=endpoint,
-                error=f"Unsupported advanced args for repo_type='{rt}': {unsupported}. Allowed advanced args: {sorted(allowed_extra)}",
-            )
-        if "card_data" in extra_args and "cardData" not in extra_args:
-            extra_args["cardData"] = extra_args.pop("card_data")
-        else:
-            extra_args.pop("card_data", None)
-        if len(requested_repo_types) == 1 and isinstance(effective_where, dict):
-            (
-                effective_author,
-                extra_args,
-                effective_where,
-                promoted_where_filters,
-            ) = _promote_repo_search_constraints(
-                rt,
-                effective_author,
-                extra_args,
-                effective_where,
-            )
-        if "expand" in extra_args:
-            normalized_expand, dropped_expand, expand_error = (
-                _sanitize_repo_expand_values(rt, extra_args.get("expand"))
             )
-            if expand_error:
-                return ctx._helper_error(
-                    start_calls=start_calls,
-                    source=endpoint,
-                    error=expand_error,
-                )
-            if dropped_expand:
-                ignored_expand[rt] = dropped_expand
-            if normalized_expand is None:
-                extra_args.pop("expand", None)
-            else:
-                extra_args["expand"] = normalized_expand
-        if not any(
-            (
-                key in extra_args
-                for key in ("expand", "full", "cardData", "fetch_config")
-            )
-        ):
-            extra_args["expand"] = list(REPO_SEARCH_DEFAULT_EXPAND[rt])
         try:
             payload = ctx._host_hf_call(
                 endpoint,
-                lambda rt=rt, extra_args=extra_args: ctx._repo_list_call(
                     api,
-                    rt,
-                    search=term or None,
-                    author=effective_author,
-                    filter=filter_list or None,
-                    sort=sort_keys[rt],
-                    limit=lim,
                     **extra_args,
                 ),
             )
         except Exception as e:
             return ctx._helper_error(start_calls=start_calls, source=endpoint, error=e)
         scanned += len(payload)
-        if len(payload) >= lim:
             limit_boundary_hit = True
         all_items.extend(
-            (ctx._normalize_repo_search_row(row, rt) for row in payload[:lim])
         )
-    all_items = ctx._apply_where(all_items, effective_where, aliases=REPO_FIELD_ALIASES)
     combined_sort_key = next(iter(sort_keys.values()), None)
     all_items = ctx._sort_repo_rows(all_items, combined_sort_key)
     matched = len(all_items)
-    all_items = ctx._project_repo_items(all_items[:lim], fields)
     more_available: bool | str = False
     truncated = False
     truncated_by = "none"
     next_request_hint: str | None = None
-    if hard_cap_applied and scanned >= lim:
         truncated = True
         truncated_by = "hard_cap"
         more_available = "unknown"
-        next_request_hint = f"Increase limit above {lim} to improve coverage"
     elif limit_boundary_hit:
         more_available = "unknown"
         next_request_hint = (
-            f"Increase limit above {lim} to check whether more rows exist"
         )
     return ctx._helper_success(
         start_calls=start_calls,
         source=",".join(source_endpoints),
         items=all_items,
-        query=term or None,
         repo_types=requested_repo_types,
-        filters=filter_list or None,
         sort=combined_sort_key,
         author=author_clean,
-        limit=lim,
         scanned=scanned,
         matched=matched,
         returned=len(all_items),
@@ -317,16 +316,199 @@ async def hf_repo_search(
         limit_boundary_hit=limit_boundary_hit,
         next_request_hint=next_request_hint,
         ignored_expand=ignored_expand or None,
-        promoted_where_filters=promoted_where_filters or None,
         **limit_meta,
     )
 async def hf_user_likes(
     ctx: HelperRuntimeContext,
     username: str | None = None,
     repo_types: list[str] | None = None,
-    return_limit: int | None = None,
     scan_limit: int | None = None,
     count_only: bool = False,
     where: dict[str, Any] | None = None,
@@ -335,7 +517,7 @@ async def hf_user_likes(
     ranking_window: int | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
-    default_return = ctx._policy_int("hf_user_likes", "default_return", 100)
     scan_cap = ctx._policy_int("hf_user_likes", "scan_max", LIKES_SCAN_LIMIT_CAP)
     ranking_default = ctx._policy_int(
         "hf_user_likes", "ranking_default", LIKES_RANKING_WINDOW_DEFAULT
@@ -370,16 +552,25 @@ async def hf_user_likes(
             error="sort must be one of liked_at, repo_likes, repo_downloads",
         )
     limit_plan = ctx._resolve_exhaustive_limits(
-        return_limit=return_limit,
         count_only=count_only,
-        default_return=default_return,
-        max_return=EXHAUSTIVE_HELPER_RETURN_HARD_CAP,
         scan_limit=scan_limit,
         scan_cap=scan_cap,
     )
-    ret_lim = int(limit_plan["applied_return_limit"])
     scan_lim = int(limit_plan["applied_scan_limit"])
-    normalized_where = ctx._normalize_where(where, aliases=USER_LIKES_FIELD_ALIASES)
     allowed_repo_types: set[str] | None = None
     try:
         raw_repo_types: list[str] = (
@@ -456,7 +647,7 @@ async def hf_user_likes(
         selected_pairs = []
         ranking_complete = False if matched > 0 else exact_count
     elif sort_key == "liked_at":
-        selected_pairs = matched_rows[:ret_lim]
     else:
         metric = str(sort_key)
         requested_window = (
@@ -504,19 +695,26 @@ async def hf_user_likes(
             return (0, -metric_value, idx)
         ranked_shortlist = sorted(shortlist, key=_ranking_key)
-        selected_pairs = ranked_shortlist[:ret_lim]
         ranking_complete = (
             exact_count
             and shortlist_size >= matched
             and (len(candidates) <= enrich_budget)
         )
-    items = ctx._project_user_like_items([row for _, row in selected_pairs], fields)
     popularity_present = sum(
         (1 for _, row in selected_pairs if row.get("repo_likes") is not None)
     )
     sample_complete = (
         exact_count
-        and ret_lim >= matched
         and (sort_key == "liked_at" or ranking_complete)
         and (not count_only or matched == 0)
     )
@@ -563,7 +761,7 @@ async def hf_repo_likers(
     ctx: HelperRuntimeContext,
     repo_id: str,
     repo_type: str,
-    return_limit: int | None = None,
     count_only: bool = False,
     pro_only: bool | None = None,
     where: dict[str, Any] | None = None,
@@ -585,9 +783,9 @@ async def hf_repo_likers(
             error=f"Unsupported repo_type '{repo_type}'",
             repo_id=rid,
         )
-    default_return = ctx._policy_int("hf_repo_likers", "default_return", 1000)
-    requested_return_limit = return_limit
-    default_limit_used = requested_return_limit is None and (not count_only)
     has_where = isinstance(where, dict) and bool(where)
     endpoint = f"/api/{rt}s/{rid}/likers"
     resp = ctx._host_raw_call(endpoint)
@@ -600,7 +798,18 @@ async def hf_repo_likers(
             repo_type=rt,
         )
     payload = resp.get("data") if isinstance(resp.get("data"), list) else []
-    normalized_where = ctx._normalize_where(where, aliases=ACTOR_FIELD_ALIASES)
     normalized: list[dict[str, Any]] = []
     for row in payload:
         if not isinstance(row, dict):
@@ -614,37 +823,37 @@ async def hf_repo_likers(
             "type": row.get("type")
             if isinstance(row.get("type"), str) and row.get("type")
             else "user",
-            "isPro": row.get("isPro"),
         }
-        if pro_only is True and item.get("isPro") is not True:
             continue
-        if pro_only is False and item.get("isPro") is True:
             continue
         if not ctx._item_matches_where(item, normalized_where):
             continue
         normalized.append(item)
     if count_only:
-        ret_lim = 0
-    elif requested_return_limit is None:
-        ret_lim = default_return
     else:
         try:
-            ret_lim = max(0, int(requested_return_limit))
         except Exception:
-            ret_lim = default_return
     limit_plan = {
-        "requested_return_limit": requested_return_limit,
-        "applied_return_limit": ret_lim,
         "default_limit_used": default_limit_used,
         "hard_cap_applied": False,
     }
     matched = len(normalized)
-    items = [] if count_only else normalized[:ret_lim]
-    return_limit_hit = ret_lim > 0 and matched > ret_lim
     truncated_by = ctx._derive_truncated_by(
-        hard_cap=False, return_limit_hit=return_limit_hit
     )
-    sample_complete = matched <= ret_lim and (not count_only or matched == 0)
     truncated = truncated_by != "none"
     more_available = ctx._derive_more_available(
         sample_complete=sample_complete,
@@ -652,7 +861,16 @@ async def hf_repo_likers(
         returned=len(items),
         total=matched,
     )
-    items = ctx._project_actor_items(items, fields)
     meta = ctx._build_exhaustive_meta(
         base_meta={
             "scanned": len(payload),
@@ -683,7 +901,11 @@ async def hf_repo_likers(
 async def hf_repo_discussions(
-    ctx: HelperRuntimeContext, repo_type: str, repo_id: str, limit: int = 20
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
     rt = ctx._canonical_repo_type(repo_type)
@@ -718,17 +940,21 @@ async def hf_repo_discussions(
         items.append(
             {
                 "num": num,
-                "number": num,
-                "discussionNum": num,
-                "id": num,
                 "title": getattr(d, "title", None),
                 "author": getattr(d, "author", None),
-                "createdAt": str(getattr(d, "created_at", None))
                 if getattr(d, "created_at", None) is not None
                 else None,
                 "status": getattr(d, "status", None),
             }
         )
     return ctx._helper_success(
         start_calls=start_calls,
         source=endpoint,
@@ -742,7 +968,11 @@ async def hf_repo_discussions(
 async def hf_repo_discussion_details(
-    ctx: HelperRuntimeContext, repo_type: str, repo_id: str, discussion_num: int
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
     rt = ctx._canonical_repo_type(repo_type)
@@ -779,7 +1009,7 @@ async def hf_repo_discussion_details(
             comment_events.append(
                 {
                     "author": getattr(event, "author", None),
-                    "createdAt": ctx._dt_to_str(getattr(event, "created_at", None)),
                     "text": getattr(event, "content", None),
                     "rendered": getattr(event, "rendered", None),
                 }
@@ -787,31 +1017,22 @@ async def hf_repo_discussion_details(
     latest_comment: dict[str, Any] | None = None
     if comment_events:
         latest_comment = max(
-            comment_events, key=lambda row: str(row.get("createdAt") or "")
         )
     item: dict[str, Any] = {
         "num": num,
-        "number": num,
-        "discussionNum": num,
-        "id": num,
         "repo_id": rid,
         "repo_type": rt,
         "title": getattr(detail, "title", None),
         "author": getattr(detail, "author", None),
-        "createdAt": ctx._dt_to_str(getattr(detail, "created_at", None)),
         "status": getattr(detail, "status", None),
         "url": getattr(detail, "url", None),
-        "commentCount": len(comment_events),
-        "latestCommentAuthor": latest_comment.get("author") if latest_comment else None,
-        "latestCommentCreatedAt": latest_comment.get("createdAt")
-        if latest_comment
-        else None,
-        "latestCommentText": latest_comment.get("text") if latest_comment else None,
-        "latestCommentHtml": latest_comment.get("rendered") if latest_comment else None,
         "latest_comment_author": latest_comment.get("author")
         if latest_comment
         else None,
-        "latest_comment_created_at": latest_comment.get("createdAt")
         if latest_comment
         else None,
         "latest_comment_text": latest_comment.get("text") if latest_comment else None,
@@ -819,13 +1040,17 @@ async def hf_repo_discussion_details(
         if latest_comment
         else None,
     }
     return ctx._helper_success(
         start_calls=start_calls,
         source=endpoint,
-        items=[item],
         scanned=len(comment_events),
         matched=1,
-        returned=1,
         truncated=False,
         total_comments=len(comment_events),
     )
@@ -926,7 +1151,10 @@ async def hf_repo_details(
             failures=failures,
             repo_type=repo_type,
         )
-    items = ctx._project_repo_items(items, fields)
     return ctx._helper_success(
         start_calls=start_calls,
         source="/api/repos",
@@ -947,9 +1175,9 @@ async def hf_trending(
     fields: list[str] | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
-    default_return = ctx._policy_int("hf_trending", "default_return", 20)
-    max_return = ctx._policy_int(
-        "hf_trending", "max_return", TRENDING_ENDPOINT_MAX_LIMIT
     )
     raw_type = str(repo_type or "model").strip().lower()
     if raw_type == "all":
@@ -962,7 +1190,7 @@ async def hf_trending(
                 source="/api/trending",
                 error=f"Unsupported repo_type '{repo_type}'",
             )
-    lim = ctx._clamp_int(limit, default=default_return, minimum=1, maximum=max_return)
     resp = ctx._host_raw_call(
         "/api/trending", params={"type": requested_type, "limit": lim}
     )
@@ -985,9 +1213,23 @@ async def hf_trending(
             continue
         repo = row.get("repoData") if isinstance(row.get("repoData"), dict) else {}
         items.append(ctx._normalize_trending_row(repo, default_row_type, rank=idx))
-    items = ctx._apply_where(items, where, aliases=REPO_FIELD_ALIASES)
     matched = len(items)
-    items = ctx._project_repo_items(items[:lim], fields)
     return ctx._helper_success(
         start_calls=start_calls,
         source="/api/trending",
@@ -1011,11 +1253,11 @@ async def hf_daily_papers(
     fields: list[str] | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
-    default_return = ctx._policy_int("hf_daily_papers", "default_return", 20)
-    max_return = ctx._policy_int(
-        "hf_daily_papers", "max_return", OUTPUT_ITEMS_TRUNCATION_LIMIT
     )
-    lim = ctx._clamp_int(limit, default=default_return, minimum=1, maximum=max_return)
     resp = ctx._host_raw_call("/api/daily_papers", params={"limit": lim})
     if not resp.get("ok"):
         return ctx._helper_error(
@@ -1029,9 +1271,25 @@ async def hf_daily_papers(
         if not isinstance(row, dict):
             continue
         items.append(ctx._normalize_daily_paper_row(row, rank=idx))
-    items = ctx._apply_where(items, where, aliases=DAILY_PAPER_FIELD_ALIASES)
     matched = len(items)
-    items = ctx._project_daily_paper_items(items[:lim], fields)
     return ctx._helper_success(
         start_calls=start_calls,
         source="/api/daily_papers",
@@ -1046,6 +1304,9 @@ async def hf_daily_papers(
 def register_repo_helpers(ctx: HelperRuntimeContext) -> dict[str, Callable[..., Any]]:
     return {
         "hf_repo_search": partial(hf_repo_search, ctx),
         "hf_user_likes": partial(hf_user_likes, ctx),
         "hf_repo_likers": partial(hf_repo_likers, ctx),

 from typing import Any, Callable
 from huggingface_hub import HfApi
 from ..context_types import HelperRuntimeContext
 from ..constants import (
+    ACTOR_CANONICAL_FIELDS,
+    DAILY_PAPER_CANONICAL_FIELDS,
     EXHAUSTIVE_HELPER_RETURN_HARD_CAP,
     LIKES_ENRICHMENT_MAX_REPOS,
     LIKES_RANKING_WINDOW_DEFAULT,
     LIKES_SCAN_LIMIT_CAP,
     OUTPUT_ITEMS_TRUNCATION_LIMIT,
+    REPO_CANONICAL_FIELDS,
     SELECTIVE_ENDPOINT_RETURN_HARD_CAP,
     TRENDING_ENDPOINT_MAX_LIMIT,
+    USER_LIKES_CANONICAL_FIELDS,
 )
 from ..registry import (
     REPO_SEARCH_ALLOWED_EXPAND,
     elif isinstance(raw_expand, (list, tuple, set)):
         requested_values = list(raw_expand)
     else:
+        return (None, [], "expand must be a string or a list of strings")
     cleaned: list[str] = []
     for value in requested_values:
     return (kept or None, dropped, None)
+def _resolve_repo_search_types(
+    ctx: HelperRuntimeContext,
+    *,
+    repo_type: str | None,
+    repo_types: list[str] | None,
+    default_repo_type: str = "model",
+) -> tuple[list[str] | None, str | None]:
+    if repo_type is not None and repo_types is not None:
+        return (None, "Pass either repo_type or repo_types, not both")
+    if repo_types is None:
+        raw_type = str(repo_type or "").strip()
+        if not raw_type:
+            return ([default_repo_type], None)
+        canonical = ctx._canonical_repo_type(raw_type, default="")
+        if canonical not in {"model", "dataset", "space"}:
+            return (None, f"Unsupported repo_type '{repo_type}'")
+        return ([canonical], None)
+    raw_types = ctx._coerce_str_list(repo_types)
+    if not raw_types:
+        return (None, "repo_types must not be empty")
+    requested_repo_types: list[str] = []
+    for raw in raw_types:
+        canonical = ctx._canonical_repo_type(raw, default="")
+        if canonical not in {"model", "dataset", "space"}:
+            return (None, f"Unsupported repo_type '{raw}'")
+        if canonical not in requested_repo_types:
+            requested_repo_types.append(canonical)
+    return (requested_repo_types, None)
+def _clean_repo_search_text(value: str | None) -> str | None:
+    cleaned = str(value or "").strip()
+    return cleaned or None
+def _normalize_repo_search_filter(
+    ctx: HelperRuntimeContext, value: str | list[str] | None
+) -> tuple[list[str] | None, str | None]:
+    if value is None:
+        return (None, None)
+    try:
+        normalized = ctx._coerce_str_list(value)
+    except ValueError:
+        return (None, "filter must be a string or a list of strings")
+    return (normalized or None, None)
+def _build_repo_search_extra_args(
+    repo_type: str, **candidate_args: Any
+) -> tuple[dict[str, Any], list[str], str | None]:
+    normalized: dict[str, Any] = {}
+    for key, value in candidate_args.items():
+        if value is None:
+            continue
+        if key in {"card_data", "cardData"}:
+            if value:
+                normalized["cardData"] = True
+            continue
+        if key in {"fetch_config", "linked"}:
+            if value:
+                normalized[key] = True
+            continue
+        normalized[key] = value
+    allowed_extra = REPO_SEARCH_EXTRA_ARGS.get(repo_type, set())
+    unsupported = sorted(str(key) for key in normalized if str(key) not in allowed_extra)
+    if unsupported:
+        return (
+            {},
+            [],
+            f"Unsupported search args for repo_type='{repo_type}': {unsupported}. Allowed args: {sorted(allowed_extra)}",
+        )
+    dropped_expand: list[str] = []
+    if "expand" in normalized:
+        kept_expand, dropped_expand, expand_error = _sanitize_repo_expand_values(
+            repo_type, normalized.get("expand")
+        )
+        if expand_error:
+            return ({}, [], expand_error)
+        if kept_expand is None:
+            normalized.pop("expand", None)
+        else:
+            normalized["expand"] = kept_expand
+    if not any(
+        key in normalized for key in ("expand", "full", "cardData", "fetch_config")
+    ):
+        normalized["expand"] = list(REPO_SEARCH_DEFAULT_EXPAND[repo_type])
+    return (normalized, dropped_expand, None)
 def _normalize_user_likes_sort(sort: str | None) -> tuple[str | None, str | None]:
+    normalized = str(sort or "liked_at").strip() or "liked_at"
     if normalized not in {"liked_at", "repo_likes", "repo_downloads"}:
         return (None, "sort must be one of liked_at, repo_likes, repo_downloads")
     return (normalized, None)
+async def _run_repo_search(
     ctx: HelperRuntimeContext,
+    *,
+    helper_name: str,
+    requested_repo_types: list[str],
+    search: str | None,
+    filter: str | list[str] | None,
+    author: str | None,
+    sort: str | None,
+    limit: int,
+    fields: list[str] | None,
+    post_filter: dict[str, Any] | None,
+    extra_args_by_type: dict[str, dict[str, Any]] | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
+    default_limit = ctx._policy_int(helper_name, "default_limit", 20)
+    max_limit = ctx._policy_int(
+        helper_name, "max_limit", SELECTIVE_ENDPOINT_RETURN_HARD_CAP
     )
+    filter_list, filter_error = _normalize_repo_search_filter(ctx, filter)
+    if filter_error:
         return ctx._helper_error(
             start_calls=start_calls,
             source="/api/repos",
+            error=filter_error,
         )
+    term = _clean_repo_search_text(search)
+    author_clean = _clean_repo_search_text(author)
     requested_limit = limit
+    applied_limit = ctx._clamp_int(
+        limit,
+        default=default_limit,
+        minimum=1,
+        maximum=max_limit,
+    )
     limit_meta = ctx._derive_limit_metadata(
+        requested_limit=requested_limit,
+        applied_limit=applied_limit,
+        default_limit_used=limit == default_limit,
     )
     hard_cap_applied = bool(limit_meta.get("hard_cap_applied"))
     sort_keys: dict[str, str | None] = {}
+    for repo_type in requested_repo_types:
+        sort_key, sort_error = ctx._normalize_repo_sort_key(repo_type, sort)
         if sort_error:
             return ctx._helper_error(
+                start_calls=start_calls,
+                source=f"/api/{repo_type}s",
+                error=sort_error,
             )
+        sort_keys[repo_type] = sort_key
     all_items: list[dict[str, Any]] = []
     scanned = 0
     source_endpoints: list[str] = []
     limit_boundary_hit = False
     ignored_expand: dict[str, list[str]] = {}
     api = ctx._get_hf_api_client()
+    for repo_type in requested_repo_types:
+        endpoint = f"/api/{repo_type}s"
         source_endpoints.append(endpoint)
+        raw_extra_args = dict((extra_args_by_type or {}).get(repo_type, {}))
+        extra_args, dropped_expand, extra_error = _build_repo_search_extra_args(
+            repo_type,
+            **raw_extra_args,
         )
+        if extra_error:
             return ctx._helper_error(
                 start_calls=start_calls,
                 source=endpoint,
+                error=extra_error,
             )
+        if dropped_expand:
+            ignored_expand[repo_type] = dropped_expand
         try:
             payload = ctx._host_hf_call(
                 endpoint,
+                lambda repo_type=repo_type, extra_args=extra_args: ctx._repo_list_call(
                     api,
+                    repo_type,
+                    search=term,
+                    author=author_clean,
+                    filter=filter_list,
+                    sort=sort_keys[repo_type],
+                    limit=applied_limit,
                     **extra_args,
                 ),
             )
         except Exception as e:
             return ctx._helper_error(start_calls=start_calls, source=endpoint, error=e)
         scanned += len(payload)
+        if len(payload) >= applied_limit:
             limit_boundary_hit = True
         all_items.extend(
+            ctx._normalize_repo_search_row(row, repo_type)
+            for row in payload[:applied_limit]
+        )
+    try:
+        all_items = ctx._apply_where(
+            all_items, post_filter, allowed_fields=REPO_CANONICAL_FIELDS
+        )
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source="/api/repos",
+            error=exc,
         )
     combined_sort_key = next(iter(sort_keys.values()), None)
     all_items = ctx._sort_repo_rows(all_items, combined_sort_key)
     matched = len(all_items)
+    try:
+        all_items = ctx._project_repo_items(all_items[:applied_limit], fields)
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source="/api/repos",
+            error=exc,
+        )
     more_available: bool | str = False
     truncated = False
     truncated_by = "none"
     next_request_hint: str | None = None
+    if hard_cap_applied and scanned >= applied_limit:
         truncated = True
         truncated_by = "hard_cap"
         more_available = "unknown"
+        next_request_hint = f"Increase limit above {applied_limit} to improve coverage"
     elif limit_boundary_hit:
         more_available = "unknown"
         next_request_hint = (
+            f"Increase limit above {applied_limit} to check whether more rows exist"
         )
     return ctx._helper_success(
         start_calls=start_calls,
         source=",".join(source_endpoints),
         items=all_items,
+        helper=helper_name,
+        search=term,
         repo_types=requested_repo_types,
+        filter=filter_list,
         sort=combined_sort_key,
         author=author_clean,
+        limit=applied_limit,
+        post_filter=post_filter if isinstance(post_filter, dict) and post_filter else None,
         scanned=scanned,
         matched=matched,
         returned=len(all_items),
         limit_boundary_hit=limit_boundary_hit,
         next_request_hint=next_request_hint,
         ignored_expand=ignored_expand or None,
         **limit_meta,
     )
+async def hf_models_search(
+    ctx: HelperRuntimeContext,
+    search: str | None = None,
+    filter: str | list[str] | None = None,
+    author: str | None = None,
+    apps: str | list[str] | None = None,
+    gated: bool | None = None,
+    inference: str | None = None,
+    inference_provider: str | list[str] | None = None,
+    model_name: str | None = None,
+    trained_dataset: str | list[str] | None = None,
+    pipeline_tag: str | None = None,
+    emissions_thresholds: tuple[float, float] | None = None,
+    sort: str | None = None,
+    limit: int = 20,
+    expand: list[str] | None = None,
+    full: bool | None = None,
+    card_data: bool = False,
+    fetch_config: bool = False,
+    fields: list[str] | None = None,
+    post_filter: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    return await _run_repo_search(
+        ctx,
+        helper_name="hf_models_search",
+        requested_repo_types=["model"],
+        search=search,
+        filter=filter,
+        author=author,
+        sort=sort,
+        limit=limit,
+        fields=fields,
+        post_filter=post_filter,
+        extra_args_by_type={
+            "model": {
+                "apps": apps,
+                "gated": gated,
+                "inference": inference,
+                "inference_provider": inference_provider,
+                "model_name": model_name,
+                "trained_dataset": trained_dataset,
+                "pipeline_tag": pipeline_tag,
+                "emissions_thresholds": emissions_thresholds,
+                "expand": expand,
+                "full": full,
+                "card_data": card_data,
+                "fetch_config": fetch_config,
+            }
+        },
+    )
+async def hf_datasets_search(
+    ctx: HelperRuntimeContext,
+    search: str | None = None,
+    filter: str | list[str] | None = None,
+    author: str | None = None,
+    benchmark: str | bool | None = None,
+    dataset_name: str | None = None,
+    gated: bool | None = None,
+    language_creators: str | list[str] | None = None,
+    language: str | list[str] | None = None,
+    multilinguality: str | list[str] | None = None,
+    size_categories: str | list[str] | None = None,
+    task_categories: str | list[str] | None = None,
+    task_ids: str | list[str] | None = None,
+    sort: str | None = None,
+    limit: int = 20,
+    expand: list[str] | None = None,
+    full: bool | None = None,
+    fields: list[str] | None = None,
+    post_filter: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    return await _run_repo_search(
+        ctx,
+        helper_name="hf_datasets_search",
+        requested_repo_types=["dataset"],
+        search=search,
+        filter=filter,
+        author=author,
+        sort=sort,
+        limit=limit,
+        fields=fields,
+        post_filter=post_filter,
+        extra_args_by_type={
+            "dataset": {
+                "benchmark": benchmark,
+                "dataset_name": dataset_name,
+                "gated": gated,
+                "language_creators": language_creators,
+                "language": language,
+                "multilinguality": multilinguality,
+                "size_categories": size_categories,
+                "task_categories": task_categories,
+                "task_ids": task_ids,
+                "expand": expand,
+                "full": full,
+            }
+        },
+    )
+async def hf_spaces_search(
+    ctx: HelperRuntimeContext,
+    search: str | None = None,
+    filter: str | list[str] | None = None,
+    author: str | None = None,
+    datasets: str | list[str] | None = None,
+    models: str | list[str] | None = None,
+    linked: bool = False,
+    sort: str | None = None,
+    limit: int = 20,
+    expand: list[str] | None = None,
+    full: bool | None = None,
+    fields: list[str] | None = None,
+    post_filter: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    return await _run_repo_search(
+        ctx,
+        helper_name="hf_spaces_search",
+        requested_repo_types=["space"],
+        search=search,
+        filter=filter,
+        author=author,
+        sort=sort,
+        limit=limit,
+        fields=fields,
+        post_filter=post_filter,
+        extra_args_by_type={
+            "space": {
+                "datasets": datasets,
+                "models": models,
+                "linked": linked,
+                "expand": expand,
+                "full": full,
+            }
+        },
+    )
+async def hf_repo_search(
+    ctx: HelperRuntimeContext,
+    search: str | None = None,
+    repo_type: str | None = None,
+    repo_types: list[str] | None = None,
+    filter: str | list[str] | None = None,
+    author: str | None = None,
+    sort: str | None = None,
+    limit: int = 20,
+    fields: list[str] | None = None,
+    post_filter: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    start_calls = ctx.call_count["n"]
+    requested_repo_types, type_error = _resolve_repo_search_types(
+        ctx,
+        repo_type=repo_type,
+        repo_types=repo_types,
+    )
+    if type_error:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source="/api/repos",
+            error=type_error,
+        )
+    if not requested_repo_types:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source="/api/repos",
+            error="repo_type or repo_types is required",
+        )
+    return await _run_repo_search(
+        ctx,
+        helper_name="hf_repo_search",
+        requested_repo_types=requested_repo_types,
+        search=search,
+        filter=filter,
+        author=author,
+        sort=sort,
+        limit=limit,
+        fields=fields,
+        post_filter=post_filter,
+    )
 async def hf_user_likes(
     ctx: HelperRuntimeContext,
     username: str | None = None,
     repo_types: list[str] | None = None,
+    limit: int | None = None,
     scan_limit: int | None = None,
     count_only: bool = False,
     where: dict[str, Any] | None = None,
     ranking_window: int | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
+    default_limit = ctx._policy_int("hf_user_likes", "default_limit", 100)
     scan_cap = ctx._policy_int("hf_user_likes", "scan_max", LIKES_SCAN_LIMIT_CAP)
     ranking_default = ctx._policy_int(
         "hf_user_likes", "ranking_default", LIKES_RANKING_WINDOW_DEFAULT
             error="sort must be one of liked_at, repo_likes, repo_downloads",
         )
     limit_plan = ctx._resolve_exhaustive_limits(
+        limit=limit,
         count_only=count_only,
+        default_limit=default_limit,
+        max_limit=EXHAUSTIVE_HELPER_RETURN_HARD_CAP,
         scan_limit=scan_limit,
         scan_cap=scan_cap,
     )
+    applied_limit = int(limit_plan["applied_limit"])
     scan_lim = int(limit_plan["applied_scan_limit"])
+    try:
+        normalized_where = ctx._normalize_where(
+            where, allowed_fields=USER_LIKES_CANONICAL_FIELDS
+        )
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source=f"/api/users/{resolved_username}/likes",
+            error=exc,
+        )
     allowed_repo_types: set[str] | None = None
     try:
         raw_repo_types: list[str] = (
         selected_pairs = []
         ranking_complete = False if matched > 0 else exact_count
     elif sort_key == "liked_at":
+        selected_pairs = matched_rows[:applied_limit]
     else:
         metric = str(sort_key)
         requested_window = (
             return (0, -metric_value, idx)
         ranked_shortlist = sorted(shortlist, key=_ranking_key)
+        selected_pairs = ranked_shortlist[:applied_limit]
         ranking_complete = (
             exact_count
             and shortlist_size >= matched
             and (len(candidates) <= enrich_budget)
         )
+    try:
+        items = ctx._project_user_like_items([row for _, row in selected_pairs], fields)
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source=endpoint,
+            error=exc,
+        )
     popularity_present = sum(
         (1 for _, row in selected_pairs if row.get("repo_likes") is not None)
     )
     sample_complete = (
         exact_count
+        and applied_limit >= matched
         and (sort_key == "liked_at" or ranking_complete)
         and (not count_only or matched == 0)
     )
     ctx: HelperRuntimeContext,
     repo_id: str,
     repo_type: str,
+    limit: int | None = None,
     count_only: bool = False,
     pro_only: bool | None = None,
     where: dict[str, Any] | None = None,
             error=f"Unsupported repo_type '{repo_type}'",
             repo_id=rid,
         )
+    default_limit = ctx._policy_int("hf_repo_likers", "default_limit", 1000)
+    requested_limit = limit
+    default_limit_used = requested_limit is None and (not count_only)
     has_where = isinstance(where, dict) and bool(where)
     endpoint = f"/api/{rt}s/{rid}/likers"
     resp = ctx._host_raw_call(endpoint)
             repo_type=rt,
         )
     payload = resp.get("data") if isinstance(resp.get("data"), list) else []
+    try:
+        normalized_where = ctx._normalize_where(
+            where, allowed_fields=ACTOR_CANONICAL_FIELDS
+        )
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source=endpoint,
+            error=exc,
+            repo_id=rid,
+            repo_type=rt,
+        )
     normalized: list[dict[str, Any]] = []
     for row in payload:
         if not isinstance(row, dict):
             "type": row.get("type")
             if isinstance(row.get("type"), str) and row.get("type")
             else "user",
+            "is_pro": row.get("isPro"),
         }
+        if pro_only is True and item.get("is_pro") is not True:
             continue
+        if pro_only is False and item.get("is_pro") is True:
             continue
         if not ctx._item_matches_where(item, normalized_where):
             continue
         normalized.append(item)
     if count_only:
+        applied_limit = 0
+    elif requested_limit is None:
+        applied_limit = default_limit
     else:
         try:
+            applied_limit = max(0, int(requested_limit))
         except Exception:
+            applied_limit = default_limit
     limit_plan = {
+        "requested_limit": requested_limit,
+        "applied_limit": applied_limit,
         "default_limit_used": default_limit_used,
         "hard_cap_applied": False,
     }
     matched = len(normalized)
+    items = [] if count_only else normalized[:applied_limit]
+    limit_hit = applied_limit > 0 and matched > applied_limit
     truncated_by = ctx._derive_truncated_by(
+        hard_cap=False, limit_hit=limit_hit
     )
+    sample_complete = matched <= applied_limit and (not count_only or matched == 0)
     truncated = truncated_by != "none"
     more_available = ctx._derive_more_available(
         sample_complete=sample_complete,
         returned=len(items),
         total=matched,
     )
+    try:
+        items = ctx._project_actor_items(items, fields)
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source=endpoint,
+            error=exc,
+            repo_id=rid,
+            repo_type=rt,
+        )
     meta = ctx._build_exhaustive_meta(
         base_meta={
             "scanned": len(payload),
 async def hf_repo_discussions(
+    ctx: HelperRuntimeContext,
+    repo_type: str,
+    repo_id: str,
+    limit: int = 20,
+    fields: list[str] | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
     rt = ctx._canonical_repo_type(repo_type)
         items.append(
             {
                 "num": num,
+                "repo_id": rid,
+                "repo_type": rt,
                 "title": getattr(d, "title", None),
                 "author": getattr(d, "author", None),
+                "created_at": str(getattr(d, "created_at", None))
                 if getattr(d, "created_at", None) is not None
                 else None,
                 "status": getattr(d, "status", None),
+                "url": getattr(d, "url", None),
             }
         )
+    try:
+        items = ctx._project_discussion_items(items, fields)
+    except ValueError as exc:
+        return ctx._helper_error(start_calls=start_calls, source=endpoint, error=exc)
     return ctx._helper_success(
         start_calls=start_calls,
         source=endpoint,
 async def hf_repo_discussion_details(
+    ctx: HelperRuntimeContext,
+    repo_type: str,
+    repo_id: str,
+    discussion_num: int,
+    fields: list[str] | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
     rt = ctx._canonical_repo_type(repo_type)
             comment_events.append(
                 {
                     "author": getattr(event, "author", None),
+                    "created_at": ctx._dt_to_str(getattr(event, "created_at", None)),
                     "text": getattr(event, "content", None),
                     "rendered": getattr(event, "rendered", None),
                 }
     latest_comment: dict[str, Any] | None = None
     if comment_events:
         latest_comment = max(
+            comment_events, key=lambda row: str(row.get("created_at") or "")
         )
     item: dict[str, Any] = {
         "num": num,
         "repo_id": rid,
         "repo_type": rt,
         "title": getattr(detail, "title", None),
         "author": getattr(detail, "author", None),
+        "created_at": ctx._dt_to_str(getattr(detail, "created_at", None)),
         "status": getattr(detail, "status", None),
         "url": getattr(detail, "url", None),
+        "comment_count": len(comment_events),
         "latest_comment_author": latest_comment.get("author")
         if latest_comment
         else None,
+        "latest_comment_created_at": latest_comment.get("created_at")
         if latest_comment
         else None,
         "latest_comment_text": latest_comment.get("text") if latest_comment else None,
         if latest_comment
         else None,
     }
+    try:
+        items = ctx._project_discussion_detail_items([item], fields)
+    except ValueError as exc:
+        return ctx._helper_error(start_calls=start_calls, source=endpoint, error=exc)
     return ctx._helper_success(
         start_calls=start_calls,
         source=endpoint,
+        items=items,
         scanned=len(comment_events),
         matched=1,
+        returned=len(items),
         truncated=False,
         total_comments=len(comment_events),
     )
             failures=failures,
             repo_type=repo_type,
         )
+    try:
+        items = ctx._project_repo_items(items, fields)
+    except ValueError as exc:
+        return ctx._helper_error(start_calls=start_calls, source="/api/repos", error=exc)
     return ctx._helper_success(
         start_calls=start_calls,
         source="/api/repos",
     fields: list[str] | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
+    default_limit = ctx._policy_int("hf_trending", "default_limit", 20)
+    max_limit = ctx._policy_int(
+        "hf_trending", "max_limit", TRENDING_ENDPOINT_MAX_LIMIT
     )
     raw_type = str(repo_type or "model").strip().lower()
     if raw_type == "all":
                 source="/api/trending",
                 error=f"Unsupported repo_type '{repo_type}'",
             )
+    lim = ctx._clamp_int(limit, default=default_limit, minimum=1, maximum=max_limit)
     resp = ctx._host_raw_call(
         "/api/trending", params={"type": requested_type, "limit": lim}
     )
             continue
         repo = row.get("repoData") if isinstance(row.get("repoData"), dict) else {}
         items.append(ctx._normalize_trending_row(repo, default_row_type, rank=idx))
+    try:
+        items = ctx._apply_where(items, where, allowed_fields=REPO_CANONICAL_FIELDS)
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source="/api/trending",
+            error=exc,
+        )
     matched = len(items)
+    try:
+        items = ctx._project_repo_items(items[:lim], fields)
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source="/api/trending",
+            error=exc,
+        )
     return ctx._helper_success(
         start_calls=start_calls,
         source="/api/trending",
     fields: list[str] | None = None,
 ) -> dict[str, Any]:
     start_calls = ctx.call_count["n"]
+    default_limit = ctx._policy_int("hf_daily_papers", "default_limit", 20)
+    max_limit = ctx._policy_int(
+        "hf_daily_papers", "max_limit", OUTPUT_ITEMS_TRUNCATION_LIMIT
     )
+    lim = ctx._clamp_int(limit, default=default_limit, minimum=1, maximum=max_limit)
     resp = ctx._host_raw_call("/api/daily_papers", params={"limit": lim})
     if not resp.get("ok"):
         return ctx._helper_error(
         if not isinstance(row, dict):
             continue
         items.append(ctx._normalize_daily_paper_row(row, rank=idx))
+    try:
+        items = ctx._apply_where(
+            items, where, allowed_fields=DAILY_PAPER_CANONICAL_FIELDS
+        )
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source="/api/daily_papers",
+            error=exc,
+        )
     matched = len(items)
+    try:
+        items = ctx._project_daily_paper_items(items[:lim], fields)
+    except ValueError as exc:
+        return ctx._helper_error(
+            start_calls=start_calls,
+            source="/api/daily_papers",
+            error=exc,
+        )
     return ctx._helper_success(
         start_calls=start_calls,
         source="/api/daily_papers",
 def register_repo_helpers(ctx: HelperRuntimeContext) -> dict[str, Callable[..., Any]]:
     return {
+        "hf_models_search": partial(hf_models_search, ctx),
+        "hf_datasets_search": partial(hf_datasets_search, ctx),
+        "hf_spaces_search": partial(hf_spaces_search, ctx),
         "hf_repo_search": partial(hf_repo_search, ctx),
         "hf_user_likes": partial(hf_user_likes, ctx),
         "hf_repo_likers": partial(hf_repo_likers, ctx),

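The limit handling that `hf_repo_likers` gains in this change can be read as a small pure function. The sketch below restates those branches outside the helper so the resulting flags are easy to check. `resolve_applied_limit` and `slice_with_meta` are hypothetical names that do not exist in `monty_api`, and the narrowed `except` clause is an illustrative choice (the diff itself catches `Exception`).

```python
from __future__ import annotations

from typing import Any


def resolve_applied_limit(
    requested_limit: Any, *, count_only: bool, default_limit: int = 1000
) -> int:
    """Restate the applied_limit branches shown in hf_repo_likers."""
    if count_only:
        # count_only requests never return rows.
        return 0
    if requested_limit is None:
        return default_limit
    try:
        # Negative limits are clamped to zero, as in the diff.
        return max(0, int(requested_limit))
    except (TypeError, ValueError):
        # Assumption: unparseable limits fall back to the default.
        return default_limit


def slice_with_meta(
    rows: list[dict[str, Any]],
    requested_limit: Any = None,
    *,
    count_only: bool = False,
    default_limit: int = 1000,
) -> dict[str, Any]:
    """Slice already-filtered rows and derive the completeness flags."""
    applied_limit = resolve_applied_limit(
        requested_limit, count_only=count_only, default_limit=default_limit
    )
    matched = len(rows)
    items = [] if count_only else rows[:applied_limit]
    limit_hit = applied_limit > 0 and matched > applied_limit
    sample_complete = matched <= applied_limit and (not count_only or matched == 0)
    return {
        "items": items,
        "applied_limit": applied_limit,
        "matched": matched,
        "limit_hit": limit_hit,
        "sample_complete": sample_complete,
    }
```

With five matched rows and `limit=2`, this returns two items with `limit_hit` set. With `count_only=True` it returns no items, and `sample_complete` stays false unless nothing matched.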
monty_api/http_runtime.py CHANGED

@@ -9,7 +9,7 @@ from urllib.request import Request, urlopen
 from huggingface_hub import HfApi
-from .aliases import REPO_SORT_KEYS, SORT_KEY_ALIASES
 from .constants import (
     DEFAULT_TIMEOUT_SEC,
 )
@@ -78,10 +78,14 @@ def _normalize_repo_sort_key(
     if not raw:
         return None, None
-    key = SORT_KEY_ALIASES.get(raw.lower().replace(" ", "").replace("__", "_"))
-    if key is None:
-        key = SORT_KEY_ALIASES.get(raw.lower())
-    if key is None:
         return None, f"Invalid sort key '{raw}'"
     rt = _canonical_repo_type(repo_type)

 from huggingface_hub import HfApi
+from .aliases import REPO_SORT_KEYS
 from .constants import (
     DEFAULT_TIMEOUT_SEC,
 )
     if not raw:
         return None, None
+    key = raw
+    if key not in {
+        "created_at",
+        "downloads",
+        "last_modified",
+        "likes",
+        "trending_score",
+    }:
         return None, f"Invalid sort key '{raw}'"
     rt = _canonical_repo_type(repo_type)

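The `_normalize_repo_sort_key` hunk above replaces alias lookup with a strict membership check. A minimal sketch of that check follows, assuming `raw` is already a string or `None`. `normalize_sort_key` and `ALLOWED_SORT_KEYS` are hypothetical names, and the per-repo-type validation that follows in the real function is omitted because the diff does not show it.

```python
from __future__ import annotations

# The five canonical keys listed in the new side of the hunk.
ALLOWED_SORT_KEYS = frozenset(
    {"created_at", "downloads", "last_modified", "likes", "trending_score"}
)


def normalize_sort_key(raw: str | None) -> tuple[str | None, str | None]:
    """Return (key, error); aliases are no longer rewritten to canonical keys."""
    if not raw:
        return None, None
    if raw not in ALLOWED_SORT_KEYS:
        return None, f"Invalid sort key '{raw}'"
    return raw, None
```

Under this sketch a spelling such as `lastModified` yields an error rather than being mapped to `last_modified`, so callers have to pass the canonical snake_case key.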
monty_api/registry.py CHANGED

@@ -8,6 +8,8 @@ from .constants import (
     ACTOR_CANONICAL_FIELDS,
     COLLECTION_CANONICAL_FIELDS,
     DAILY_PAPER_CANONICAL_FIELDS,
     GRAPH_SCAN_LIMIT_CAP,
     LIKES_ENRICHMENT_MAX_REPOS,
     LIKES_RANKING_WINDOW_DEFAULT,
@@ -18,6 +20,7 @@ from .constants import (
     RECENT_ACTIVITY_SCAN_MAX_PAGES,
     REPO_CANONICAL_FIELDS,
     TRENDING_ENDPOINT_MAX_LIMIT,
 )
@@ -39,7 +42,6 @@ REPO_SEARCH_EXTRA_ARGS: dict[str, set[str]] = {
         "benchmark",
         "dataset_name",
         "expand",
-        "filter",
         "full",
         "gated",
         "language",
@@ -52,11 +54,9 @@ REPO_SEARCH_EXTRA_ARGS: dict[str, set[str]] = {
     "model": {
         "apps",
         "cardData",
-        "card_data",
         "emissions_thresholds",
         "expand",
         "fetch_config",
-        "filter",
         "full",
         "gated",
         "inference",
@@ -65,7 +65,7 @@ REPO_SEARCH_EXTRA_ARGS: dict[str, set[str]] = {
         "pipeline_tag",
         "trained_dataset",
     },
-    "space": {"datasets", "expand", "filter", "full", "linked", "models"},
 }
 REPO_SEARCH_DEFAULT_EXPAND: dict[str, list[str]] = {
@@ -206,7 +206,6 @@ RUNTIME_CAPABILITY_FIELDS = [
     "helpers",
     "helper_defaults",
     "fields",
-    "aliases",
     "limits",
     "repo_search",
 ]
@@ -306,7 +305,7 @@ HELPER_CONFIGS: dict[str, HelperConfig] = {
         "hf_whoami",
         endpoint_patterns=(r"^/api/whoami-v2$",),
         default_metadata=_metadata(
-            default_fields=["username", "fullname", "isPro"],
             guaranteed_fields=["username"],
             notes="Returns the current authenticated user when a request token is available.",
         ),
@@ -340,7 +339,55 @@ HELPER_CONFIGS: dict[str, HelperConfig] = {
             max_limit=GRAPH_SCAN_LIMIT_CAP,
             notes="Returns organization member summary rows.",
         ),
-        pagination={"default_return": 1_000, "scan_max": GRAPH_SCAN_LIMIT_CAP},
     ),
     "hf_repo_search": _config(
         "hf_repo_search",
@@ -352,11 +399,12 @@ HELPER_CONFIGS: dict[str, HelperConfig] = {
             default_limit=20,
             max_limit=5_000,
             notes=(
-                "Cheap summary helper. Uses list-endpoint expansion only; model rows expose "
-                "num_params when upstream metadata provides it."
             ),
         ),
-        pagination={"default_return": 20, "max_return": 5_000},
     ),
     "hf_user_graph": _config(
         "hf_user_graph",
@@ -373,8 +421,8 @@ HELPER_CONFIGS: dict[str, HelperConfig] = {
             notes="Returns followers/following summary rows.",
         ),
         pagination={
-            "default_return": 1_000,
-            "max_return": GRAPH_SCAN_LIMIT_CAP,
             "scan_max": GRAPH_SCAN_LIMIT_CAP,
         },
     ),
@@ -390,21 +438,13 @@ HELPER_CONFIGS: dict[str, HelperConfig] = {
             default_limit=1_000,
             notes="Returns users who liked a repo.",
         ),
-        pagination={"default_return": 1_000},
     ),
     "hf_user_likes": _config(
         "hf_user_likes",
         endpoint_patterns=(r"^/api/users/[^/]+/likes$",),
         default_metadata=_metadata(
-            default_fields=[
-                "liked_at",
-                "repo_id",
-                "repo_type",
-                "repo_author",
-                "repo_likes",
-                "repo_downloads",
-                "repo_url",
-            ],
             guaranteed_fields=["liked_at", "repo_id", "repo_type"],
             optional_fields=["repo_author", "repo_likes", "repo_downloads", "repo_url"],
             default_limit=100,
@@ -417,7 +457,7 @@ HELPER_CONFIGS: dict[str, HelperConfig] = {
             ),
         ),
         pagination={
-            "default_return": 100,
             "enrich_max": LIKES_ENRICHMENT_MAX_REPOS,
             "ranking_default": LIKES_RANKING_WINDOW_DEFAULT,
             "scan_max": LIKES_SCAN_LIMIT_CAP,
@@ -436,7 +476,7 @@ HELPER_CONFIGS: dict[str, HelperConfig] = {
             notes="Activity helper may fetch multiple pages when requested coverage exceeds one page.",
         ),
         pagination={
-            "default_return": 100,
             "max_pages": RECENT_ACTIVITY_SCAN_MAX_PAGES,
             "page_limit": RECENT_ACTIVITY_PAGE_SIZE,
         },
@@ -445,18 +485,9 @@ HELPER_CONFIGS: dict[str, HelperConfig] = {
         "hf_repo_discussions",
         endpoint_patterns=(r"^/api/(models|datasets|spaces)/[^/]+/[^/]+/discussions$",),
         default_metadata=_metadata(
-            default_fields=[
-                "num",
-                "title",
-                "author",
-                "status",
-                "createdAt",
-                "repo_id",
-                "repo_type",
-                "url",
-            ],
             guaranteed_fields=["num", "title", "author", "status"],
-            optional_fields=["createdAt", "repo_id", "repo_type", "url"],
             default_limit=20,
             max_limit=200,
             notes="Discussion summary helper.",
@@ -468,39 +499,13 @@ HELPER_CONFIGS: dict[str, HelperConfig] = {
             r"^/api/(models|datasets|spaces)/[^/]+/[^/]+/discussions/\d+$",
         ),
         default_metadata=_metadata(
-            default_fields=[
-                "number",
-                "discussionNum",
-                "id",
-                "repo_id",
-                "repo_type",
-                "title",
-                "author",
-                "createdAt",
-                "status",
-                "url",
-                "commentCount",
-                "latestCommentAuthor",
-                "latestCommentCreatedAt",
-                "latestCommentText",
-                "latestCommentHtml",
-                "latest_comment_author",
-                "latest_comment_created_at",
-                "latest_comment_text",
-                "latest_comment_html",
-            ],
             guaranteed_fields=["repo_id", "repo_type", "title", "author", "status"],
             optional_fields=[
-                "number",
-                "discussionNum",
-                "id",
-                "createdAt",
                 "url",
-                "commentCount",
-                "latestCommentAuthor",
-                "latestCommentCreatedAt",
-                "latestCommentText",
-                "latestCommentHtml",
                 "latest_comment_author",
                 "latest_comment_created_at",
                 "latest_comment_text",
@@ -537,7 +542,7 @@ HELPER_CONFIGS: dict[str, HelperConfig] = {
             max_limit=TRENDING_ENDPOINT_MAX_LIMIT,
             notes="Returns ordered trending summary rows only. Use hf_repo_details for exact repo metadata.",
         ),
-        pagination={"default_return": 20, "max_return": TRENDING_ENDPOINT_MAX_LIMIT},
     ),
     "hf_daily_papers": _config(
         "hf_daily_papers",
@@ -550,7 +555,7 @@ HELPER_CONFIGS: dict[str, HelperConfig] = {
             max_limit=OUTPUT_ITEMS_TRUNCATION_LIMIT,
             notes="Returns daily paper summary rows. repo_id is omitted unless the upstream payload provides it.",
         ),
-        pagination={"default_return": 20, "max_return": OUTPUT_ITEMS_TRUNCATION_LIMIT},
     ),
     "hf_collections_search": _config(
         "hf_collections_search",
@@ -563,7 +568,7 @@ HELPER_CONFIGS: dict[str, HelperConfig] = {
             max_limit=OUTPUT_ITEMS_TRUNCATION_LIMIT,
             notes="Collection summary helper.",
         ),
-        pagination={"default_return": 20, "max_return": OUTPUT_ITEMS_TRUNCATION_LIMIT},
     ),
     "hf_collection_items": _config(
         "hf_collection_items",
@@ -583,7 +588,7 @@ HELPER_CONFIGS: dict[str, HelperConfig] = {
             max_limit=OUTPUT_ITEMS_TRUNCATION_LIMIT,
             notes="Returns repos inside one collection as summary rows.",
         ),
-        pagination={"default_return": 100, "max_return": OUTPUT_ITEMS_TRUNCATION_LIMIT},
     ),
 }

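The registry hunks rename the `pagination` keys from `default_return`/`max_return` to `default_limit`/`max_limit`, matching the `ctx._policy_int(helper, "default_limit", ...)` calls in `helpers/repos.py`. The sketch below shows why both sides have to change together. It assumes `_policy_int` reads from these `pagination` mappings and falls back to the supplied default when a key is missing, which the diff does not confirm; `PAGINATION` and `policy_int` are hypothetical stand-ins.

```python
from __future__ import annotations

from typing import Any

# Hypothetical stand-in for the pagination blocks in HELPER_CONFIGS.
PAGINATION: dict[str, dict[str, Any]] = {
    "hf_user_likes": {"default_limit": 100},
    "hf_repo_likers": {"default_limit": 1_000},
    "hf_trending": {"default_limit": 20},
}


def policy_int(helper: str, key: str, default: int) -> int:
    """Look up an integer policy value, falling back to the caller's default."""
    value = PAGINATION.get(helper, {}).get(key)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return default
    return value


# The renamed key resolves to the registry value.
current = policy_int("hf_user_likes", "default_limit", 7)
# A stale key silently falls back to the caller's default.
stale = policy_int("hf_user_likes", "default_return", 7)
```

Under these assumptions a caller still asking for `default_return` would not fail loudly. It would quietly use its hard-coded fallback, which is the kind of drift a consistent rename avoids.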
     ACTOR_CANONICAL_FIELDS,
     COLLECTION_CANONICAL_FIELDS,
     DAILY_PAPER_CANONICAL_FIELDS,
+    DISCUSSION_CANONICAL_FIELDS,
+    DISCUSSION_DETAIL_CANONICAL_FIELDS,
     GRAPH_SCAN_LIMIT_CAP,
     LIKES_ENRICHMENT_MAX_REPOS,
     LIKES_RANKING_WINDOW_DEFAULT,
     RECENT_ACTIVITY_SCAN_MAX_PAGES,
     REPO_CANONICAL_FIELDS,
     TRENDING_ENDPOINT_MAX_LIMIT,
+    USER_LIKES_CANONICAL_FIELDS,
 )
         "benchmark",
         "dataset_name",
         "expand",
         "full",
         "gated",
         "language",
     "model": {
         "apps",
         "cardData",
         "emissions_thresholds",
         "expand",
         "fetch_config",
         "full",
         "gated",
         "inference",
         "pipeline_tag",
         "trained_dataset",
     },
+    "space": {"datasets", "expand", "full", "linked", "models"},
 }
 REPO_SEARCH_DEFAULT_EXPAND: dict[str, list[str]] = {
     "helpers",
     "helper_defaults",
     "fields",
     "limits",
     "repo_search",
 ]
         "hf_whoami",
         endpoint_patterns=(r"^/api/whoami-v2$",),
         default_metadata=_metadata(
+            default_fields=["username", "fullname", "is_pro"],
             guaranteed_fields=["username"],
             notes="Returns the current authenticated user when a request token is available.",
         ),
             max_limit=GRAPH_SCAN_LIMIT_CAP,
             notes="Returns organization member summary rows.",
         ),
+        pagination={"default_limit": 1_000, "scan_max": GRAPH_SCAN_LIMIT_CAP},
+    ),
+    "hf_models_search": _config(
+        "hf_models_search",
+        endpoint_patterns=(r"^/api/models$",),
+        default_metadata=_metadata(
+            default_fields=REPO_SUMMARY_FIELDS,
+            guaranteed_fields=["repo_id", "repo_type", "author", "repo_url"],
+            optional_fields=REPO_SUMMARY_OPTIONAL_FIELDS,
+            default_limit=20,
+            max_limit=5_000,
+            notes=(
+                "Thin model-search wrapper around the Hub list_models path. Prefer this "
+                "over hf_repo_search for model-only queries."
+            ),
+        ),
+        pagination={"default_limit": 20, "max_limit": 5_000},
+    ),
+    "hf_datasets_search": _config(
+        "hf_datasets_search",
+        endpoint_patterns=(r"^/api/datasets$",),
+        default_metadata=_metadata(
+            default_fields=REPO_SUMMARY_FIELDS,
+            guaranteed_fields=["repo_id", "repo_type", "author", "repo_url"],
+            optional_fields=REPO_SUMMARY_OPTIONAL_FIELDS,
+            default_limit=20,
+            max_limit=5_000,
+            notes=(
+                "Thin dataset-search wrapper around the Hub list_datasets path. Prefer "
+                "this over hf_repo_search for dataset-only queries."
+            ),
+        ),
+        pagination={"default_limit": 20, "max_limit": 5_000},
+    ),
+    "hf_spaces_search": _config(
+        "hf_spaces_search",
+        endpoint_patterns=(r"^/api/spaces$",),
+        default_metadata=_metadata(
+            default_fields=REPO_SUMMARY_FIELDS,
+            guaranteed_fields=["repo_id", "repo_type", "author", "repo_url"],
+            optional_fields=REPO_SUMMARY_OPTIONAL_FIELDS,
+            default_limit=20,
+            max_limit=5_000,
+            notes=(
+                "Thin space-search wrapper around the Hub list_spaces path. Prefer this "
+                "over hf_repo_search for space-only queries."
+            ),
+        ),
+        pagination={"default_limit": 20, "max_limit": 5_000},
     ),
     "hf_repo_search": _config(
         "hf_repo_search",
             default_limit=20,
             max_limit=5_000,
             notes=(
+                "Small generic repo-search helper. Prefer hf_models_search, "
+                "hf_datasets_search, or hf_spaces_search for single-type queries; use "
+                "hf_repo_search for intentionally cross-type search."
             ),
         ),
+        pagination={"default_limit": 20, "max_limit": 5_000},
     ),
     "hf_user_graph": _config(
         "hf_user_graph",
             notes="Returns followers/following summary rows.",
         ),
         pagination={
+            "default_limit": 1_000,
+            "max_limit": GRAPH_SCAN_LIMIT_CAP,
             "scan_max": GRAPH_SCAN_LIMIT_CAP,
         },
     ),
             default_limit=1_000,
             notes="Returns users who liked a repo.",
         ),
+        pagination={"default_limit": 1_000},
     ),
     "hf_user_likes": _config(
         "hf_user_likes",
         endpoint_patterns=(r"^/api/users/[^/]+/likes$",),
         default_metadata=_metadata(
+            default_fields=list(USER_LIKES_CANONICAL_FIELDS),
             guaranteed_fields=["liked_at", "repo_id", "repo_type"],
             optional_fields=["repo_author", "repo_likes", "repo_downloads", "repo_url"],
             default_limit=100,
             ),
         ),
         pagination={
+            "default_limit": 100,
             "enrich_max": LIKES_ENRICHMENT_MAX_REPOS,
             "ranking_default": LIKES_RANKING_WINDOW_DEFAULT,
             "scan_max": LIKES_SCAN_LIMIT_CAP,
             notes="Activity helper may fetch multiple pages when requested coverage exceeds one page.",
         ),
         pagination={
+            "default_limit": 100,
             "max_pages": RECENT_ACTIVITY_SCAN_MAX_PAGES,
             "page_limit": RECENT_ACTIVITY_PAGE_SIZE,
         },
         "hf_repo_discussions",
         endpoint_patterns=(r"^/api/(models|datasets|spaces)/[^/]+/[^/]+/discussions$",),
         default_metadata=_metadata(
+            default_fields=list(DISCUSSION_CANONICAL_FIELDS),
             guaranteed_fields=["num", "title", "author", "status"],
+            optional_fields=["repo_id", "repo_type", "created_at", "url"],
             default_limit=20,
             max_limit=200,
             notes="Discussion summary helper.",
             r"^/api/(models|datasets|spaces)/[^/]+/[^/]+/discussions/\d+$",
         ),
         default_metadata=_metadata(
+            default_fields=list(DISCUSSION_DETAIL_CANONICAL_FIELDS),
             guaranteed_fields=["repo_id", "repo_type", "title", "author", "status"],
             optional_fields=[
+                "num",
+                "created_at",
                 "url",
+                "comment_count",
                 "latest_comment_author",
                 "latest_comment_created_at",
                 "latest_comment_text",
             max_limit=TRENDING_ENDPOINT_MAX_LIMIT,
             notes="Returns ordered trending summary rows only. Use hf_repo_details for exact repo metadata.",
         ),
+        pagination={"default_limit": 20, "max_limit": TRENDING_ENDPOINT_MAX_LIMIT},
     ),
     "hf_daily_papers": _config(
         "hf_daily_papers",
             max_limit=OUTPUT_ITEMS_TRUNCATION_LIMIT,
             notes="Returns daily paper summary rows. repo_id is omitted unless the upstream payload provides it.",
         ),
+        pagination={"default_limit": 20, "max_limit": OUTPUT_ITEMS_TRUNCATION_LIMIT},
     ),
     "hf_collections_search": _config(
         "hf_collections_search",
             max_limit=OUTPUT_ITEMS_TRUNCATION_LIMIT,
             notes="Collection summary helper.",
         ),
+        pagination={"default_limit": 20, "max_limit": OUTPUT_ITEMS_TRUNCATION_LIMIT},
     ),
     "hf_collection_items": _config(
         "hf_collection_items",
             max_limit=OUTPUT_ITEMS_TRUNCATION_LIMIT,
             notes="Returns repos inside one collection as summary rows.",
         ),
+        pagination={"default_limit": 100, "max_limit": OUTPUT_ITEMS_TRUNCATION_LIMIT},
     ),
 }

monty_api/runtime_context.py CHANGED

@@ -60,6 +60,8 @@ from .runtime_filtering import (
     _project_activity_items,
     _project_actor_items,
     _project_collection_items,
     _project_daily_paper_items,
     _project_items,
     _project_repo_items,
@@ -215,6 +217,8 @@ for name, value in {
     "_project_items": _project_items,
     "_project_repo_items": _project_repo_items,
     "_project_collection_items": _project_collection_items,
     "_project_daily_paper_items": _project_daily_paper_items,
     "_project_user_items": _project_user_items,
     "_project_actor_items": _project_actor_items,

     _project_activity_items,
     _project_actor_items,
     _project_collection_items,
+    _project_discussion_detail_items,
+    _project_discussion_items,
     _project_daily_paper_items,
     _project_items,
     _project_repo_items,
     "_project_items": _project_items,
     "_project_repo_items": _project_repo_items,
     "_project_collection_items": _project_collection_items,
+    "_project_discussion_items": _project_discussion_items,
+    "_project_discussion_detail_items": _project_discussion_detail_items,
     "_project_daily_paper_items": _project_daily_paper_items,
     "_project_user_items": _project_user_items,
     "_project_actor_items": _project_actor_items,

monty_api/runtime_envelopes.py CHANGED

@@ -21,8 +21,8 @@ def _helper_meta(
 def _derive_limit_metadata(
     self: Any,
     *,
-    requested_return_limit: int | None,
-    applied_return_limit: int,
     default_limit_used: bool,
     requested_scan_limit: int | None = None,
     applied_scan_limit: int | None = None,
@@ -30,8 +30,8 @@ def _derive_limit_metadata(
     applied_max_pages: int | None = None,
 ) -> dict[str, Any]:
     meta: dict[str, Any] = {
-        "requested_return_limit": requested_return_limit,
-        "applied_return_limit": applied_return_limit,
         "default_limit_used": default_limit_used,
     }
     if requested_scan_limit is not None or applied_scan_limit is not None:
@@ -42,8 +42,8 @@ def _derive_limit_metadata(
         meta["requested_max_pages"] = requested_max_pages
         meta["applied_max_pages"] = applied_max_pages
         meta["page_limit_applied"] = requested_max_pages != applied_max_pages
-    if requested_return_limit is not None:
-        meta["hard_cap_applied"] = applied_return_limit < requested_return_limit
     return meta
@@ -68,9 +68,9 @@ def _derive_truncated_by(
     hard_cap: bool = False,
     scan_limit_hit: bool = False,
     page_limit_hit: bool = False,
-    return_limit_hit: bool = False,
 ) -> str:
-    causes = [hard_cap, scan_limit_hit, page_limit_hit, return_limit_hit]
     if sum(1 for cause in causes if cause) > 1:
         return "multiple"
     if hard_cap:
@@ -79,8 +79,8 @@ def _derive_truncated_by(
         return "scan_limit"
     if page_limit_hit:
         return "page_limit"
-    if return_limit_hit:
-        return "return_limit"
     return "none"
@@ -89,7 +89,7 @@ def _derive_can_request_more(
 ) -> bool:
     if sample_complete:
         return False
-    return truncated_by in {"return_limit", "scan_limit", "page_limit", "multiple"}
 def _derive_next_request_hint(
@@ -97,12 +97,12 @@ def _derive_next_request_hint(
     *,
     truncated_by: str,
     more_available: bool | str,
-    applied_return_limit: int,
     applied_scan_limit: int | None = None,
     applied_max_pages: int | None = None,
 ) -> str:
-    if truncated_by == "return_limit":
-        return f"Ask for return_limit>{applied_return_limit} to see more rows"
     if truncated_by == "scan_limit" and applied_scan_limit is not None:
         return f"Increase scan_limit above {applied_scan_limit} for broader coverage"
     if truncated_by == "page_limit" and applied_max_pages is not None:
@@ -121,28 +121,27 @@ def _derive_next_request_hint(
 def _resolve_exhaustive_limits(
     self: Any,
     *,
-    return_limit: int | None,
     count_only: bool,
-    default_return: int,
-    max_return: int,
     scan_limit: int | None = None,
     scan_cap: int | None = None,
 ) -> dict[str, Any]:
-    requested_return_limit = None if count_only else return_limit
-    effective_requested_return_limit = 0 if count_only else requested_return_limit
     out: dict[str, Any] = {
-        "requested_return_limit": requested_return_limit,
-        "applied_return_limit": _clamp_int(
-            effective_requested_return_limit,
-            default=default_return,
             minimum=0,
-            maximum=max_return,
         ),
-        "default_limit_used": requested_return_limit is None and not count_only,
     }
     out["hard_cap_applied"] = (
-        requested_return_limit is not None
-        and out["applied_return_limit"] < requested_return_limit
     )
     if scan_cap is not None:
         out["requested_scan_limit"] = scan_limit
@@ -168,7 +167,7 @@ def _build_exhaustive_meta(
     applied_max_pages: int | None = None,
 ) -> dict[str, Any]:
     meta = dict(base_meta)
-    applied_return_limit = int(limit_plan["applied_return_limit"])
     applied_scan_limit = limit_plan.get("applied_scan_limit")
     meta.update(
         {
@@ -186,7 +185,7 @@ def _build_exhaustive_meta(
                 self,
                 truncated_by=truncated_by,
                 more_available=more_available,
-                applied_return_limit=applied_return_limit,
                 applied_scan_limit=applied_scan_limit
                 if isinstance(applied_scan_limit, int)
                 else None,
@@ -197,8 +196,8 @@ def _build_exhaustive_meta(
     meta.update(
         _derive_limit_metadata(
             self,
-            requested_return_limit=limit_plan["requested_return_limit"],
-            applied_return_limit=applied_return_limit,
             default_limit_used=bool(limit_plan["default_limit_used"]),
             requested_scan_limit=limit_plan.get("requested_scan_limit"),
             applied_scan_limit=applied_scan_limit
@@ -263,26 +262,26 @@ def _build_exhaustive_result_meta(
     requested_max_pages: int | None = None,
     applied_max_pages: int | None = None,
 ) -> dict[str, Any]:
-    applied_return_limit = int(limit_plan["applied_return_limit"])
     if count_only:
         effective_sample_complete = exact_count
     else:
         effective_sample_complete = (
             sample_complete
             if isinstance(sample_complete, bool)
-            else exact_count and matched_count <= applied_return_limit
         )
-    return_limit_hit = (
         False
         if count_only
-        else (applied_return_limit > 0 and matched_count > applied_return_limit)
     )
     truncated_by = _derive_truncated_by(
         self,
         hard_cap=bool(limit_plan.get("hard_cap_applied")),
         scan_limit_hit=scan_limit_hit,
         page_limit_hit=page_limit_hit,
-        return_limit_hit=return_limit_hit,
     )
     truncated = truncated_by != "none" or truncated_extra
     total_value = _as_int(base_meta.get("total"))

 def _derive_limit_metadata(
     self: Any,
     *,
+    requested_limit: int | None,
+    applied_limit: int,
     default_limit_used: bool,
     requested_scan_limit: int | None = None,
     applied_scan_limit: int | None = None,
     applied_max_pages: int | None = None,
 ) -> dict[str, Any]:
     meta: dict[str, Any] = {
+        "requested_limit": requested_limit,
+        "applied_limit": applied_limit,
         "default_limit_used": default_limit_used,
     }
     if requested_scan_limit is not None or applied_scan_limit is not None:
         meta["requested_max_pages"] = requested_max_pages
         meta["applied_max_pages"] = applied_max_pages
         meta["page_limit_applied"] = requested_max_pages != applied_max_pages
+    if requested_limit is not None:
+        meta["hard_cap_applied"] = applied_limit < requested_limit
     return meta
     hard_cap: bool = False,
     scan_limit_hit: bool = False,
     page_limit_hit: bool = False,
+    limit_hit: bool = False,
 ) -> str:
+    causes = [hard_cap, scan_limit_hit, page_limit_hit, limit_hit]
     if sum(1 for cause in causes if cause) > 1:
         return "multiple"
     if hard_cap:
         return "scan_limit"
     if page_limit_hit:
         return "page_limit"
+    if limit_hit:
+        return "limit"
     return "none"
 ) -> bool:
     if sample_complete:
         return False
+    return truncated_by in {"limit", "scan_limit", "page_limit", "multiple"}
 def _derive_next_request_hint(
     *,
     truncated_by: str,
     more_available: bool | str,
+    applied_limit: int,
     applied_scan_limit: int | None = None,
     applied_max_pages: int | None = None,
 ) -> str:
+    if truncated_by == "limit":
+        return f"Ask for limit>{applied_limit} to see more rows"
     if truncated_by == "scan_limit" and applied_scan_limit is not None:
         return f"Increase scan_limit above {applied_scan_limit} for broader coverage"
     if truncated_by == "page_limit" and applied_max_pages is not None:
 def _resolve_exhaustive_limits(
     self: Any,
     *,
+    limit: int | None,
     count_only: bool,
+    default_limit: int,
+    max_limit: int,
     scan_limit: int | None = None,
     scan_cap: int | None = None,
 ) -> dict[str, Any]:
+    requested_limit = None if count_only else limit
+    effective_requested_limit = 0 if count_only else requested_limit
     out: dict[str, Any] = {
+        "requested_limit": requested_limit,
+        "applied_limit": _clamp_int(
+            effective_requested_limit,
+            default=default_limit,
             minimum=0,
+            maximum=max_limit,
         ),
+        "default_limit_used": requested_limit is None and not count_only,
     }
     out["hard_cap_applied"] = (
+        requested_limit is not None and out["applied_limit"] < requested_limit
     )
     if scan_cap is not None:
         out["requested_scan_limit"] = scan_limit
     applied_max_pages: int | None = None,
 ) -> dict[str, Any]:
     meta = dict(base_meta)
+    applied_limit = int(limit_plan["applied_limit"])
     applied_scan_limit = limit_plan.get("applied_scan_limit")
     meta.update(
         {
                 self,
                 truncated_by=truncated_by,
                 more_available=more_available,
+                applied_limit=applied_limit,
                 applied_scan_limit=applied_scan_limit
                 if isinstance(applied_scan_limit, int)
                 else None,
     meta.update(
         _derive_limit_metadata(
             self,
+            requested_limit=limit_plan["requested_limit"],
+            applied_limit=applied_limit,
             default_limit_used=bool(limit_plan["default_limit_used"]),
             requested_scan_limit=limit_plan.get("requested_scan_limit"),
             applied_scan_limit=applied_scan_limit
     requested_max_pages: int | None = None,
     applied_max_pages: int | None = None,
 ) -> dict[str, Any]:
+    applied_limit = int(limit_plan["applied_limit"])
     if count_only:
         effective_sample_complete = exact_count
     else:
         effective_sample_complete = (
             sample_complete
             if isinstance(sample_complete, bool)
+            else exact_count and matched_count <= applied_limit
         )
+    limit_hit = (
         False
         if count_only
+        else (applied_limit > 0 and matched_count > applied_limit)
     )
     truncated_by = _derive_truncated_by(
         self,
         hard_cap=bool(limit_plan.get("hard_cap_applied")),
         scan_limit_hit=scan_limit_hit,
         page_limit_hit=page_limit_hit,
+        limit_hit=limit_hit,
     )
     truncated = truncated_by != "none" or truncated_extra
     total_value = _as_int(base_meta.get("total"))
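Review note: the rename from `return_limit` to `limit` leaves the limit-resolution behaviour itself unchanged. The standalone sketch below mirrors the new `_resolve_exhaustive_limits` and the `limit_hit` computation so the three cases (explicit limit, default limit, `count_only`) are easy to check by hand. `clamp_int` is a hypothetical stand-in for `_clamp_int`, whose body is not shown in this diff; it is assumed to fall back to the default on `None` and then clamp into the given range. The function names without a leading underscore are illustrative only.

```python
from __future__ import annotations

from typing import Any


def clamp_int(value: int | None, *, default: int, minimum: int, maximum: int) -> int:
    # Hypothetical stand-in for _clamp_int (not part of this diff): assumed to
    # use `default` when value is None, then clamp into [minimum, maximum].
    chosen = default if value is None else int(value)
    return max(minimum, min(maximum, chosen))


def resolve_limits(
    limit: int | None,
    *,
    count_only: bool,
    default_limit: int,
    max_limit: int,
) -> dict[str, Any]:
    # Mirrors the renamed keys: requested_limit / applied_limit.
    requested_limit = None if count_only else limit
    effective_requested_limit = 0 if count_only else requested_limit
    applied_limit = clamp_int(
        effective_requested_limit,
        default=default_limit,
        minimum=0,
        maximum=max_limit,
    )
    return {
        "requested_limit": requested_limit,
        "applied_limit": applied_limit,
        "default_limit_used": requested_limit is None and not count_only,
        "hard_cap_applied": requested_limit is not None
        and applied_limit < requested_limit,
    }


def limit_hit(plan: dict[str, Any], *, matched_count: int, count_only: bool) -> bool:
    # Same condition as in _build_exhaustive_result_meta on the new side.
    if count_only:
        return False
    applied_limit = int(plan["applied_limit"])
    return applied_limit > 0 and matched_count > applied_limit


# Explicit request above the cap: clamped, hard cap reported.
capped = resolve_limits(10_000, count_only=False, default_limit=20, max_limit=5_000)
# No limit given: the default applies.
defaulted = resolve_limits(None, count_only=False, default_limit=20, max_limit=5_000)
# count_only: no rows are returned, so the applied limit is 0.
counted = resolve_limits(50, count_only=True, default_limit=20, max_limit=5_000)
```

With the `hf_repo_search` values from the registry (`default_limit=20`, `max_limit=5_000`), a request for 10,000 rows is clamped to 5,000 and flagged with `hard_cap_applied`, while a `count_only` call reports neither a default nor a hard cap.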

monty_api/runtime_filtering.py CHANGED

@@ -2,40 +2,48 @@ from __future__ import annotations
 from typing import Any
-from .aliases import (
-    ACTIVITY_FIELD_ALIASES,
-    ACTOR_FIELD_ALIASES,
-    COLLECTION_FIELD_ALIASES,
-    DAILY_PAPER_FIELD_ALIASES,
-    REPO_FIELD_ALIASES,
-    USER_FIELD_ALIASES,
-    USER_LIKES_FIELD_ALIASES,
 )
 from .http_runtime import _as_int
 def _project_items(
     self: Any,
     items: list[dict[str, Any]],
     fields: list[str] | None,
-    aliases: dict[str, str] | None = None,
 ) -> list[dict[str, Any]]:
     if not isinstance(fields, list) or not fields:
         return items
     wanted = [str(field).strip() for field in fields if str(field).strip()]
     if not wanted:
         return items
-    alias_map = {
-        str(key).strip().lower(): str(value).strip()
-        for key, value in (aliases or {}).items()
-        if str(key).strip() and str(value).strip()
-    }
     projected: list[dict[str, Any]] = []
     for row in items:
         out: dict[str, Any] = {}
         for key in wanted:
-            source_key = alias_map.get(key.lower(), key)
-            value = row.get(source_key)
             if value is None:
                 continue
             out[key] = value
@@ -46,63 +54,88 @@ def _project_items(
 def _project_repo_items(
     self: Any, items: list[dict[str, Any]], fields: list[str] | None
 ) -> list[dict[str, Any]]:
-    return _project_items(self, items, fields, aliases=REPO_FIELD_ALIASES)
 def _project_collection_items(
     self: Any, items: list[dict[str, Any]], fields: list[str] | None
 ) -> list[dict[str, Any]]:
-    return _project_items(self, items, fields, aliases=COLLECTION_FIELD_ALIASES)
 def _project_daily_paper_items(
     self: Any, items: list[dict[str, Any]], fields: list[str] | None
 ) -> list[dict[str, Any]]:
-    return _project_items(self, items, fields, aliases=DAILY_PAPER_FIELD_ALIASES)
 def _project_user_items(
     self: Any, items: list[dict[str, Any]], fields: list[str] | None
 ) -> list[dict[str, Any]]:
-    return _project_items(self, items, fields, aliases=USER_FIELD_ALIASES)
 def _project_actor_items(
     self: Any, items: list[dict[str, Any]], fields: list[str] | None
 ) -> list[dict[str, Any]]:
-    return _project_items(self, items, fields, aliases=ACTOR_FIELD_ALIASES)
 def _project_user_like_items(
     self: Any, items: list[dict[str, Any]], fields: list[str] | None
 ) -> list[dict[str, Any]]:
-    return _project_items(self, items, fields, aliases=USER_LIKES_FIELD_ALIASES)
 def _project_activity_items(
     self: Any, items: list[dict[str, Any]], fields: list[str] | None
 ) -> list[dict[str, Any]]:
-    return _project_items(self, items, fields, aliases=ACTIVITY_FIELD_ALIASES)
 def _normalize_where(
     self: Any,
     where: dict[str, Any] | None,
-    aliases: dict[str, str] | None = None,
 ) -> dict[str, Any] | None:
     if not isinstance(where, dict) or not where:
         return where
-    alias_map = {
-        str(key).strip().lower(): str(value).strip()
-        for key, value in (aliases or {}).items()
-        if str(key).strip() and str(value).strip()
-    }
     normalized: dict[str, Any] = {}
     for key, value in where.items():
         raw_key = str(key).strip()
         if not raw_key:
             continue
-        normalized[alias_map.get(raw_key.lower(), raw_key)] = value
     return normalized
@@ -161,9 +194,9 @@ def _apply_where(
     items: list[dict[str, Any]],
     where: dict[str, Any] | None,
     *,
-    aliases: dict[str, str] | None = None,
 ) -> list[dict[str, Any]]:
-    normalized_where = _normalize_where(self, where, aliases=aliases)
     if not isinstance(normalized_where, dict) or not normalized_where:
         return items
     return [row for row in items if _item_matches_where(self, row, normalized_where)]

 from typing import Any
+from .constants import (
+    ACTIVITY_CANONICAL_FIELDS,
+    ACTOR_CANONICAL_FIELDS,
+    COLLECTION_CANONICAL_FIELDS,
+    DAILY_PAPER_CANONICAL_FIELDS,
+    DISCUSSION_CANONICAL_FIELDS,
+    DISCUSSION_DETAIL_CANONICAL_FIELDS,
+    REPO_CANONICAL_FIELDS,
+    USER_CANONICAL_FIELDS,
+    USER_LIKES_CANONICAL_FIELDS,
 )
 from .http_runtime import _as_int
+def _allowed_field_set(allowed_fields: tuple[str, ...] | list[str] | set[str]) -> set[str]:
+    return {str(field).strip() for field in allowed_fields if str(field).strip()}
 def _project_items(
     self: Any,
     items: list[dict[str, Any]],
     fields: list[str] | None,
+    *,
+    allowed_fields: tuple[str, ...] | list[str] | set[str] | None = None,
 ) -> list[dict[str, Any]]:
     if not isinstance(fields, list) or not fields:
         return items
     wanted = [str(field).strip() for field in fields if str(field).strip()]
     if not wanted:
         return items
+    if allowed_fields is not None:
+        allowed = _allowed_field_set(allowed_fields)
+        invalid = sorted(field for field in wanted if field not in allowed)
+        if invalid:
+            raise ValueError(
+                f"Unsupported fields {invalid}. Allowed fields: {sorted(allowed)}"
+            )
     projected: list[dict[str, Any]] = []
     for row in items:
         out: dict[str, Any] = {}
         for key in wanted:
+            value = row.get(key)
             if value is None:
                 continue
             out[key] = value
 def _project_repo_items(
     self: Any, items: list[dict[str, Any]], fields: list[str] | None
 ) -> list[dict[str, Any]]:
+    return _project_items(self, items, fields, allowed_fields=REPO_CANONICAL_FIELDS)
 def _project_collection_items(
     self: Any, items: list[dict[str, Any]], fields: list[str] | None
 ) -> list[dict[str, Any]]:
+    return _project_items(
+        self, items, fields, allowed_fields=COLLECTION_CANONICAL_FIELDS
+    )
 def _project_daily_paper_items(
     self: Any, items: list[dict[str, Any]], fields: list[str] | None
 ) -> list[dict[str, Any]]:
+    return _project_items(
+        self, items, fields, allowed_fields=DAILY_PAPER_CANONICAL_FIELDS
+    )
 def _project_user_items(
     self: Any, items: list[dict[str, Any]], fields: list[str] | None
 ) -> list[dict[str, Any]]:
+    return _project_items(self, items, fields, allowed_fields=USER_CANONICAL_FIELDS)
 def _project_actor_items(
     self: Any, items: list[dict[str, Any]], fields: list[str] | None
 ) -> list[dict[str, Any]]:
+    return _project_items(self, items, fields, allowed_fields=ACTOR_CANONICAL_FIELDS)
 def _project_user_like_items(
     self: Any, items: list[dict[str, Any]], fields: list[str] | None
 ) -> list[dict[str, Any]]:
+    return _project_items(
+        self, items, fields, allowed_fields=USER_LIKES_CANONICAL_FIELDS
+    )
 def _project_activity_items(
     self: Any, items: list[dict[str, Any]], fields: list[str] | None
 ) -> list[dict[str, Any]]:
+    return _project_items(
+        self, items, fields, allowed_fields=ACTIVITY_CANONICAL_FIELDS
+    )
+def _project_discussion_items(
+    self: Any, items: list[dict[str, Any]], fields: list[str] | None
+) -> list[dict[str, Any]]:
+    return _project_items(
+        self, items, fields, allowed_fields=DISCUSSION_CANONICAL_FIELDS
+    )
+def _project_discussion_detail_items(
+    self: Any, items: list[dict[str, Any]], fields: list[str] | None
+) -> list[dict[str, Any]]:
+    return _project_items(
+        self, items, fields, allowed_fields=DISCUSSION_DETAIL_CANONICAL_FIELDS
+    )
 def _normalize_where(
     self: Any,
     where: dict[str, Any] | None,
+    *,
+    allowed_fields: tuple[str, ...] | list[str] | set[str] | None = None,
 ) -> dict[str, Any] | None:
     if not isinstance(where, dict) or not where:
         return where
+    allowed = _allowed_field_set(allowed_fields) if allowed_fields is not None else None
     normalized: dict[str, Any] = {}
     for key, value in where.items():
         raw_key = str(key).strip()
         if not raw_key:
             continue
+        if allowed is not None and raw_key not in allowed:
+            raise ValueError(
+                f"Unsupported filter fields {[raw_key]}. Allowed fields: {sorted(allowed)}"
+            )
+        normalized[raw_key] = value
     return normalized
     items: list[dict[str, Any]],
     where: dict[str, Any] | None,
     *,
+    allowed_fields: tuple[str, ...] | list[str] | set[str] | None = None,
 ) -> list[dict[str, Any]]:
+    normalized_where = _normalize_where(self, where, allowed_fields=allowed_fields)
     if not isinstance(normalized_where, dict) or not normalized_where:
         return items
     return [row for row in items if _item_matches_where(self, row, normalized_where)]
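Review note: the behavioural change in this file is that unknown field names are no longer remapped through an alias table; they now raise `ValueError`. The sketch below reproduces the new `_project_items` logic without the `self` argument so the effect can be tried in isolation. `DEMO_CANONICAL_FIELDS` is a hypothetical allow-list for illustration (the real `*_CANONICAL_FIELDS` tuples live in `constants.py` and are not shown here), and the final `append`/`return` lines are assumed because the hunk ends before them.

```python
from __future__ import annotations

from typing import Any

# Hypothetical allow-list used only for this illustration.
DEMO_CANONICAL_FIELDS = ("repo_id", "repo_type", "author", "likes")


def project_items(
    items: list[dict[str, Any]],
    fields: list[str] | None,
    *,
    allowed_fields: tuple[str, ...] | list[str] | set[str] | None = None,
) -> list[dict[str, Any]]:
    # No projection requested: rows pass through untouched.
    if not isinstance(fields, list) or not fields:
        return items
    wanted = [str(field).strip() for field in fields if str(field).strip()]
    if not wanted:
        return items
    if allowed_fields is not None:
        allowed = {str(f).strip() for f in allowed_fields if str(f).strip()}
        invalid = sorted(field for field in wanted if field not in allowed)
        if invalid:
            # Strict mode: reject unknown names instead of aliasing them.
            raise ValueError(
                f"Unsupported fields {invalid}. Allowed fields: {sorted(allowed)}"
            )
    projected: list[dict[str, Any]] = []
    for row in items:
        out: dict[str, Any] = {}
        for key in wanted:
            value = row.get(key)
            if value is None:
                continue  # missing or null values are dropped from the row
            out[key] = value
        projected.append(out)  # assumed: the hunk ends before this line
    return projected


rows = [{"repo_id": "org/model", "repo_type": "model", "author": "org", "likes": None}]
kept = project_items(rows, ["repo_id", "likes"], allowed_fields=DEMO_CANONICAL_FIELDS)

try:
    project_items(rows, ["repoId"], allowed_fields=DEMO_CANONICAL_FIELDS)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```

Callers that previously relied on alias spellings (a camelCase name such as `repoId` is used here purely as an example) would now get the error rather than a silently remapped column, so generated code must use the canonical names.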

monty_api/validation.py CHANGED

@@ -155,8 +155,8 @@ def _summarize_limit_hit(helper_name: str, result: Any) -> dict[str, Any] | None
         "truncated": meta.get("truncated"),
         "truncated_by": meta.get("truncated_by"),
         "more_available": meta.get("more_available"),
-        "requested_return_limit": meta.get("requested_return_limit"),
-        "applied_return_limit": meta.get("applied_return_limit"),
         "next_request_hint": meta.get("next_request_hint"),
     }
     if meta.get("scan_limit") is not None:

         "truncated": meta.get("truncated"),
         "truncated_by": meta.get("truncated_by"),
         "more_available": meta.get("more_available"),
+        "requested_limit": meta.get("requested_limit"),
+        "applied_limit": meta.get("applied_limit"),
         "next_request_hint": meta.get("next_request_hint"),
     }
     if meta.get("scan_limit") is not None: