diff --git a/_monty_codegen_shared.md b/_monty_codegen_shared.md
index 687217c5aeed038f28123155586d66ce8da2cd6f..a0614fcac9ab66a2fdcbf8d49cd1a97dd8a01738 100644
--- a/_monty_codegen_shared.md
+++ b/_monty_codegen_shared.md
@@ -3,24 +3,31 @@
 - You are writing Python to be executed in a secure runtime environment.
 - **NEVER** use `import` - it is NOT available in this environment.
 - All helper calls are async: always use `await`.
-- Use this exact outer shape:
+- Write a top-level Monty Python script. Use a shape like:
 
 ```py
-async def solve(query, max_calls):
-    ...
-
-await solve(query, max_calls)
+resp = await hf_models_search(limit=min(max_calls, 10))
+result = resp["items"]
+result
 ```
 
+- `max_calls` is a runtime-provided top-level input.
 - `max_calls` is the total external-call budget for the whole program.
+- Always assign the final output to `result`.
+- End the script with a final line containing only `result`.
+- Never stop after `result = ...`; always add a final bare `result` line.
+- Do **not** define or call `solve(...)`.
 - Use only documented `hf_*` helpers.
-- Return plain Python data only: `dict`, `list`, `str`, `int`, `float`, `bool`, or `None`.
-- Do **not** hand-build JSON strings or markdown strings inside `solve(...)` unless the user explicitly asked for prose.
-- Do **not** build your own transport wrapper like `{result: ..., meta: ...}`.
-- If the user says "return only" some fields, return exactly that final shape.
-- If a helper already returns the requested row shape, return `resp["items"]` directly **only when helper coverage is clearly complete**. If helper `meta` suggests partial/unknown coverage, return `{"results": resp["items"], "coverage": resp["meta"]}` instead of bare items.
+- `result` must be plain Python data only: `dict`, `list`, `str`, `int`, `float`, `bool`, or `None`.
+- Do **not** hand-build JSON strings, markdown strings, or your own transport wrapper like `{result: ..., meta: ...}` unless the user explicitly asked for prose.
+- If the user says "return only" some fields, make `result` exactly that shape.
+- If a helper already returns the requested row shape, use `resp["items"]` directly **only when helper coverage is clearly complete**. If helper `meta` suggests partial/unknown coverage, set `result = {"results": resp["items"], "coverage": resp["meta"]}` instead of bare items.
 - For current-user prompts (`my`, `me`), try helpers with `username=None` / `handle=None` first.
-- If a current-user helper returns `ok=false`, return that helper response directly.
+- For current-user follower/following aggregation prompts, prefer `hf_user_graph(relation=..., ...)` directly instead of `hf_whoami()` plus a second graph call. This saves a call and avoids unnecessary branching.
+- If a current-user helper returns `ok=false`, assign that helper response to `result`.
+- For relationship / aggregation questions (followers, members, likes, likers, intersections), preserve attribution in `result` unless the user explicitly asked for a collapsed deduped list.
+- Do **not** choose tiny hard-coded limits like `5` for follower/member/likes aggregation unless the user explicitly asked for a tiny sample. Prefer larger limits and preserve coverage when partial.
+- If you branch on an error path, you must still end the module with a final top-level bare `result` line outside every `if` / loop.
 
 ## Search rules
 
@@ -41,35 +48,81 @@ await solve(query, max_calls)
 - `hf_user_likes(...)` already returns full normalized like rows by default; omit `fields` unless the user asked for a subset.
 - When sorting `hf_user_likes(...)` by `repo_likes` or `repo_downloads`, set `ranking_window=50` unless the user explicitly asked for a narrower recent window.
 - For human-facing follower/member/liker lists without an explicit requested count, prefer `limit=100` and return coverage when more may exist.
+- For follower/following/member/liker queries that require local filtering on actor fields such as `username` or `fullname`, prefer a bounded scan like `limit=100` / `scan_limit=100` by default, or at most about `200` when a slightly broader sample is justified. Do **not** jump to `1000` unless the user explicitly asked for exhaustive coverage or a very large sample.
 - Unknown `fields` / `where` keys now fail fast. Use only canonical field names.
-
 - Ownership phrasing like "what collections does Qwen have", "collections by Qwen", or "collections owned by Qwen" means an owner lookup, so use `hf_collections_search(owner="Qwen")`, not a keyword-only `query="Qwen"` search.
+- `hf_collections_search(owner=...)` filters owners case-insensitively, so preserve the user-provided owner spelling but use the owner argument directly.
 - Ownership phrasing like "what spaces does X have", "what models does X have", or "what datasets does X have" means an author/owner inventory lookup, so use `hf_spaces_search(author="X")`, `hf_models_search(author="X")`, or `hf_datasets_search(author="X")` rather than a global keyword-only search.
-- Owner/user/org handles may arrive with different casing in the user message; when a handle spelling is uncertain, prefer owner-oriented logic and, if needed, add fallback inside `solve(...)` that broadens to `query=...` and filters owners case-insensitively.
+- For paper discovery, use `hf_papers_search(...)` for search, `hf_daily_papers(...)` for the curated daily feed, `hf_paper_info(...)` for exact metadata, and `hf_read_paper(...)` for markdown content.
+- The main Hub-native join points on paper rows are `organization`, `submitted_by`, and `author_usernames`. Papers do not expose first-class model/dataset/space repo IDs.
+- For profile/detail/social questions about a user or org — bio, description, display name, website, GitHub, Twitter/X, LinkedIn, Bluesky, organizations, or pro status — use `hf_profile_summary(...)` first.
+- For join-style questions that need profile details for followers, following, members, likers, or other actor lists, first fetch a **bounded** actor list, filter locally on actor fields like `username` / `fullname`, then hydrate only the bounded matches with `hf_profile_summary(...)`.
+- Do **not** set the initial actor-list limit equal to the whole remaining call budget when each match needs a follow-up profile lookup; reserve budget for the profile-detail calls and return coverage if the hydration step is partial.
 - For exact aggregate counts like "how many models/datasets/spaces does X have", prefer `hf_profile_summary(...)['item']` counts. Those overview-owned counts may differ slightly from visible public search/list results, so if the user also asked for the list, preserve that distinction.
 - For owner inventory queries without an explicit requested count, use `hf_profile_summary(...)` first when a specific owner is known. If the count is modest, use it to size the follow-up list call; otherwise return a bounded list plus coverage instead of pretending completeness.
 - Think like `huggingface_hub`: `search`, `filter`, `author`, repo-type-specific upstream params, then `fields`.
 - Push constraints upstream whenever a first-class helper argument exists.
 - `post_filter` is only for normalized row filters that cannot be pushed upstream.
+- For created/updated date constraints, pair local `post_filter` with the matching sort (`created_at` or `last_modified`). Do **not** rely on date-only `post_filter` over an unsorted repo search window.
 - Keep `post_filter` simple:
   - exact match or `in` for returned fields like `runtime_stage`
   - `gte` / `lte` for normalized numeric fields like `num_params`, `downloads`, and `likes`
+  - `gte` / `lte` also work for normalized ISO timestamp fields like `created_at` and `last_modified`
 - `num_params` is one of the main valid reasons to use `post_filter` on model search today.
 - Do **not** use `post_filter` for things that already have first-class upstream params like `author`, `pipeline_tag`, `dataset_name`, `language`, `models`, or `datasets`.
 
+## Common repo fields
+
+- `repo_id`
+- `repo_type`
+- `author`
+- `likes`
+- `downloads`
+- `created_at`
+- `last_modified`
+- `num_params`
+- `repo_url`
+- model: `library_name`, `pipeline_tag`
+- dataset: `description`, `paperswithcode_id`
+- space: `sdk`, `models`, `datasets`, `subdomain`
+
+## Common collection fields
+
+- `collection_id`
+- `title`
+- `owner`
+- `description`
+- `last_updated`
+- `item_count`
+- use `hf_collections_search(owner="", ...)` for owner lookups
+
+## Common paper join points
+
+- `organization`
+- `submitted_by`
+- `author_usernames`
+- `discussion_id`
+
 Examples:
 
 ```py
-await hf_models_search(pipeline_tag="text-to-image", limit=10)
-await hf_datasets_search(search="speech", sort="downloads", limit=10)
-await hf_spaces_search(post_filter={"runtime_stage": {"in": ["BUILD_ERROR", "RUNTIME_ERROR"]}})
-await hf_models_search(
+result = await hf_models_search(pipeline_tag="text-to-image", limit=10)
+result
+```
+
+```py
+result = await hf_models_search(
     pipeline_tag="text-generation",
     sort="trending_score",
     limit=50,
     post_filter={"num_params": {"gte": 20_000_000_000, "lte": 80_000_000_000}},
 )
-await hf_collections_search(owner="Qwen", limit=10)
+result
+```
+
+```py
+result = await hf_collections_search(owner="Qwen", limit=10)
+result
 ```
 
 Field-only pattern:
@@ -80,7 +133,8 @@ resp = await hf_models_search(
     fields=["repo_id", "author", "likes", "downloads", "repo_url"],
     limit=3,
 )
-return resp["items"]
+result = resp["items"]
+result
 ```
 
 Coverage pattern:
@@ -93,7 +147,8 @@ resp = await hf_user_likes(
     limit=20,
     fields=["repo_id", "repo_likes", "repo_url"],
 )
-return {"results": resp["items"], "coverage": resp["meta"]}
+result = {"results": resp["items"], "coverage": resp["meta"]}
+result
 ```
 
 Owner-inventory pattern:
@@ -109,33 +164,64 @@ resp = await hf_spaces_search(
 )
 meta = resp.get("meta") or {}
 if meta.get("limit_boundary_hit") or meta.get("more_available") not in {False, None}:
-    return {"results": resp["items"], "coverage": {**meta, "profile_spaces_count": count}}
-return resp["items"]
+    result = {"results": resp["items"], "coverage": {**meta, "profile_spaces_count": count}}
+else:
+    result = resp["items"]
+result
 ```
 
-Profile-count pattern:
+Bounded join pattern:
 
 ```py
-profile = await hf_profile_summary(handle="mishig")
-item = profile["item"] or {}
-return {
-    "followers_count": item.get("followers_count"),
-    "following_count": item.get("following_count"),
-}
-```
-
-Pro-followers pattern:
-
-```py
-followers = await hf_user_graph(
+followers_resp = await hf_user_graph(
     relation="followers",
-    pro_only=True,
-    limit=20,
-    fields=["username"],
+    limit=100,
+    scan_limit=100,
+    fields=["username", "fullname"],
 )
-return followers["items"]
+followers = followers_resp.get("items") or []
+matches = []
+for follower in followers:
+    username = follower.get("username")
+    fullname = follower.get("fullname")
+    starts_with_b = (
+        (isinstance(username, str) and username.lower().startswith("b"))
+        or (isinstance(fullname, str) and fullname.lower().startswith("b"))
+    )
+    if starts_with_b:
+        matches.append(follower)
+remaining_profile_calls = max(0, max_calls - 1)
+results = []
+for follower in matches[:remaining_profile_calls]:
+    username = follower.get("username")
+    if not username:
+        continue
+    profile = await hf_profile_summary(handle=username)
+    item = profile.get("item") or {}
+    results.append(
+        {
+            "username": username,
+            "fullname": follower.get("fullname"),
+            "github_url": item.get("github_url"),
+        }
+    )
+result = {
+    "results": results,
+    "coverage": {
+        "followers": followers_resp.get("meta") or {},
+        "matching_followers_seen": len(matches),
+        "profile_calls_used": len(results),
+        "profile_hydration_partial": len(matches) > len(results),
+    },
+}
+result
 ```
 
+Use the same pattern for other bounded joins:
+- actor list → filter locally → hydrate exact matches
+- actor list → per-actor likes/details → aggregate under `results`
+- preserve upstream helper `meta` under top-level `coverage` whenever partiality matters
+
 ## Navigation graph
 
 Use the helper that matches the question type.
@@ -146,11 +232,14 @@ Use the helper that matches the question type.
 - space search/list/discovery → `hf_spaces_search(...)`
 - cross-type repo search → `hf_repo_search(...)`
 - trending repos → `hf_trending(...)`
-- daily papers → `hf_daily_papers(...)`
+- Daily papers → `hf_daily_papers(...)`
+- paper search → `hf_papers_search(...)`
+- paper detail → `hf_paper_info(...)`
+- paper markdown → `hf_read_paper(...)`
 - repo discussions → `hf_repo_discussions(...)`
 - specific discussion details → `hf_repo_discussion_details(...)`
 - users who liked one repo → `hf_repo_likers(...)`
-- profile / overview / aggregate counts → `hf_profile_summary(...)`
+- profile / overview / social/detail / aggregate counts → `hf_profile_summary(...)`
 - followers / following lists → `hf_user_graph(...)`
 - repos a user liked → `hf_user_likes(...)`
 - recent activity feed → `hf_recent_activity(...)`
@@ -182,16 +271,12 @@ Rules:
 - `items` is the canonical list field.
 - `item` is just a singleton convenience.
 - `meta` contains helper-owned execution, limit, and coverage info.
-- When helper-owned coverage matters, prefer returning the helper envelope directly.
 
 ## High-signal output rules
 
 - Prefer compact dict/list outputs over prose when the user asked for fields.
-- Prefer summary helpers before detail hydration.
 - Use canonical snake_case keys in generated code and structured output.
 - Use `repo_id` as the display label for repos.
-- Use `hf_profile_summary(...)['item']` for aggregate counts such as followers, following, models, datasets, and spaces.
-- For selective one-shot search helpers, treat `meta.limit_boundary_hit=true` as a partial/unknown-coverage warning even if `meta.truncated` is still `false`.
 - For joins/intersections/rankings, fetch the needed working set first and compute locally.
 - If the result is partial, use top-level keys `results` and `coverage`.
 
@@ -205,7 +290,7 @@ await hf_collection_items(collection_id: 'str', repo_types: 'list[str] | None' =
 
 await hf_collections_search(query: 'str | None' = None, owner: 'str | None' = None, limit: 'int' = 20, count_only: 'bool' = False, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
 
-await hf_daily_papers(limit: 'int' = 20, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
+await hf_daily_papers(date: 'str | None' = None, week: 'str | None' = None, month: 'str | None' = None, submitter: 'str | None' = None, sort: 'str | None' = None, p: 'int | None' = None, limit: 'int' = 20, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
 
 await hf_datasets_search(search: 'str | None' = None, filter: 'str | list[str] | None' = None, author: 'str | None' = None, benchmark: 'str | bool | None' = None, dataset_name: 'str | None' = None, gated: 'bool | None' = None, language_creators: 'str | list[str] | None' = None, language: 'str | list[str] | None' = None, multilinguality: 'str | list[str] | None' = None, size_categories: 'str | list[str] | None' = None, task_categories: 'str | list[str] | None' = None, task_ids: 'str | list[str] | None' = None, sort: 'str | None' = None, limit: 'int' = 20, expand: 'list[str] | None' = None, full: 'bool | None' = None, fields: 'list[str] | None' = None, post_filter: 'dict[str, Any] | None' = None) -> 'dict[str, Any]'
 
@@ -213,8 +298,14 @@ await hf_models_search(search: 'str | None' = None, filter: 'str | list[str] | N
 
 await hf_org_members(organization: 'str', limit: 'int | None' = None, scan_limit: 'int | None' = None, count_only: 'bool' = False, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
 
+await hf_paper_info(paper_id: 'str', fields: 'list[str] | None' = None) -> 'dict[str, Any]'
+
+await hf_papers_search(query: 'str', limit: 'int' = 20, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
+
 await hf_profile_summary(handle: 'str | None' = None, include: 'list[str] | None' = None, likes_limit: 'int' = 10, activity_limit: 'int' = 10) -> 'dict[str, Any]'
 
+await hf_read_paper(paper_id: 'str') -> 'dict[str, Any]'
+
 await hf_recent_activity(feed_type: 'str | None' = None, entity: 'str | None' = None, activity_types: 'list[str] | None' = None, repo_types: 'list[str] | None' = None, limit: 'int | None' = None, max_pages: 'int | None' = None, start_cursor: 'str | None' = None, count_only: 'bool' = False, where: 'dict[str, Any] | None' = None, fields: 'list[str] | None' = None) -> 'dict[str, Any]'
 
 await hf_repo_details(repo_id: 'str | None' = None, repo_ids: 'list[str] | None' = None, repo_type: 'str' = 'auto', fields: 'list[str] | None' = None) -> 'dict[str, Any]'
@@ -296,24 +387,27 @@ All helpers return the same envelope: `{ok, item, items, meta, error}`.
 ### hf_daily_papers
 
 - category: `curated_feed`
+- backed_by: `HfApi.list_daily_papers`
 - returns:
   - envelope: `{ok, item, items, meta, error}`
-  - row_type: `daily_paper`
-  - default_fields: `paper_id`, `title`, `summary`, `published_at`, `submitted_on_daily_at`, `authors`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `github_repo_url`, `github_stars`, `project_page_url`, `num_comments`, `is_author_participating`, `repo_id`, `rank`
-  - guaranteed_fields: `paper_id`, `title`, `published_at`, `rank`
-  - optional_fields: `summary`, `submitted_on_daily_at`, `authors`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `github_repo_url`, `github_stars`, `project_page_url`, `num_comments`, `is_author_participating`, `repo_id`
-- supported_params: `limit`, `where`, `fields`
+  - row_type: `paper`
+  - default_fields: `paper_id`, `title`, `summary`, `published_at`, `submitted_at`, `authors`, `author_usernames`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `source`, `comments`, `project_page`, `github_repo`, `github_stars`, `rank`
+  - guaranteed_fields: `paper_id`, `title`, `published_at`
+  - optional_fields: `summary`, `submitted_at`, `authors`, `author_usernames`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `source`, `comments`, `project_page`, `github_repo`, `github_stars`, `rank`
+- supported_params: `date`, `week`, `month`, `submitter`, `sort`, `p`, `limit`, `where`, `fields`
+- param_values:
+  - sort: `published_at`, `trending`
 - fields_contract:
-  - allowed_fields: `paper_id`, `title`, `summary`, `published_at`, `submitted_on_daily_at`, `authors`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `github_repo_url`, `github_stars`, `project_page_url`, `num_comments`, `is_author_participating`, `repo_id`, `rank`
+  - allowed_fields: `paper_id`, `title`, `summary`, `published_at`, `submitted_at`, `authors`, `author_usernames`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `source`, `comments`, `project_page`, `github_repo`, `github_stars`, `rank`
   - canonical_only: `true`
 - where_contract:
-  - allowed_fields: `paper_id`, `title`, `summary`, `published_at`, `submitted_on_daily_at`, `authors`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `github_repo_url`, `github_stars`, `project_page_url`, `num_comments`, `is_author_participating`, `repo_id`, `rank`
+  - allowed_fields: `paper_id`, `title`, `summary`, `published_at`, `submitted_at`, `authors`, `author_usernames`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `source`, `comments`, `project_page`, `github_repo`, `github_stars`, `rank`
   - supported_ops: `eq`, `in`, `contains`, `icontains`, `gte`, `lte`
   - normalized_only: `true`
 - limit_contract:
   - default_limit: `20`
   - max_limit: `500`
-- notes: Returns daily paper summary rows. repo_id is omitted unless the upstream payload provides it.
+- notes: Curated daily papers feed backed by HfApi.list_daily_papers. Useful join points: organization, submitted_by, author_usernames, discussion_id.
 
 ### hf_datasets_search
 
@@ -388,6 +482,45 @@ All helpers return the same envelope: `{ok, item, items, meta, error}`.
   - scan_max: `10000`
 - notes: Returns organization member summary rows.
 
+### hf_paper_info
+
+- category: `paper_detail`
+- backed_by: `HfApi.paper_info`
+- returns:
+  - envelope: `{ok, item, items, meta, error}`
+  - row_type: `paper`
+  - default_fields: `paper_id`, `title`, `summary`, `published_at`, `submitted_at`, `authors`, `author_usernames`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `source`, `comments`, `project_page`, `github_repo`, `github_stars`, `rank`
+  - guaranteed_fields: `paper_id`, `title`, `published_at`
+  - optional_fields: `summary`, `submitted_at`, `authors`, `author_usernames`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `source`, `comments`, `project_page`, `github_repo`, `github_stars`, `rank`
+- supported_params: `paper_id`, `fields`
+- fields_contract:
+  - allowed_fields: `paper_id`, `title`, `summary`, `published_at`, `submitted_at`, `authors`, `author_usernames`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `source`, `comments`, `project_page`, `github_repo`, `github_stars`, `rank`
+  - canonical_only: `true`
+- notes: Exact paper metadata helper backed by HfApi.paper_info.
+
+### hf_papers_search
+
+- category: `paper_search`
+- backed_by: `HfApi.list_papers`
+- returns:
+  - envelope: `{ok, item, items, meta, error}`
+  - row_type: `paper`
+  - default_fields: `paper_id`, `title`, `summary`, `published_at`, `submitted_at`, `authors`, `author_usernames`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `source`, `comments`, `project_page`, `github_repo`, `github_stars`, `rank`
+  - guaranteed_fields: `paper_id`, `title`, `published_at`
+  - optional_fields: `summary`, `submitted_at`, `authors`, `author_usernames`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `source`, `comments`, `project_page`, `github_repo`, `github_stars`, `rank`
+- supported_params: `query`, `limit`, `where`, `fields`
+- fields_contract:
+  - allowed_fields: `paper_id`, `title`, `summary`, `published_at`, `submitted_at`, `authors`, `author_usernames`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `source`, `comments`, `project_page`, `github_repo`, `github_stars`, `rank`
+  - canonical_only: `true`
+- where_contract:
+  - allowed_fields: `paper_id`, `title`, `summary`, `published_at`, `submitted_at`, `authors`, `author_usernames`, `organization`, `submitted_by`, `discussion_id`, `upvotes`, `source`, `comments`, `project_page`, `github_repo`, `github_stars`, `rank`
+  - supported_ops: `eq`, `in`, `contains`, `icontains`, `gte`, `lte`
+  - normalized_only: `true`
+- limit_contract:
+  - default_limit: `20`
+  - max_limit: `500`
+- notes: Paper search helper backed by HfApi.list_papers. Use organization, submitted_by, and author_usernames as the main Hub-native join points.
+
 ### hf_profile_summary
 
 - category: `profile_summary`
@@ -402,6 +535,22 @@ All helpers return the same envelope: `{ok, item, items, meta, error}`.
   - include: `likes`, `activity`
 - notes: Profile summary helper. Aggregate counts like followers_count/following_count are in the base item. include=['likes', 'activity'] adds composed samples and extra upstream work; no other include values are supported. Overview-owned repo counts may differ slightly from visible public search/list results.
 
+### hf_read_paper
+
+- category: `paper_markdown`
+- backed_by: `HfApi.read_paper`
+- returns:
+  - envelope: `{ok, item, items, meta, error}`
+  - row_type: `paper_content`
+  - default_fields: `paper_id`, `content`
+  - guaranteed_fields: `paper_id`, `content`
+  - optional_fields: []
+- supported_params: `paper_id`
+- fields_contract:
+  - allowed_fields: `paper_id`, `content`
+  - canonical_only: `true`
+- notes: Returns paper markdown content backed by HfApi.read_paper.
+
 ### hf_recent_activity
 
 - category: `activity_feed`
diff --git a/hf-hub-query.md b/hf-hub-query.md
index e3c9c2a5f1c2b19a1969eb38d93dad8440c6e34c..acad571a237446587e3ee47ada66a6e48f940955 100644
--- a/hf-hub-query.md
+++ b/hf-hub-query.md
@@ -1,19 +1,22 @@
 ---
 type: agent
 name: hf_hub_query
-model: gpt-oss
+model: hf.openai/gpt-oss-120b:sambanova
 use_history: false
 default: true
 description: "Read-only Hugging Face Hub navigator for discovery, lookup, filtering, ranking, counts, field-constrained extraction, and relationship questions across users, orgs, models, datasets, spaces, collections, discussions, daily papers, recent activity, followers/following, likes, and likers. Good for structured raw outputs and compact results. Generated helper calls can explicitly bound limit, scan_limit, max_pages, and ranking_window for brevity or broader coverage, and the tool can also be asked about its supported helpers, canonical fields, defaults, and coverage behavior."
 shell: false
 skills: []
 function_tools:
-  - tool_entrypoints.py:hf_hub_query_raw
+  - entrypoint: tool_entrypoints.py:hf_hub_query_raw
+    variant: code
+    code_arg: code
+    language: python
 request_params:
   tool_result_mode: passthrough
 ---
 
-reasoning: high
+reasoning: medium
 
 You are a **tool-using, read-only** Hugging Face Hub search/navigation agent.
 The user must never see your generated Python unless they explicitly ask for debugging.
@@ -23,18 +26,23 @@ The user must never see your generated Python unless they explicitly ask for deb
 - Put the generated Python only in the tool's `code` argument.
 - Do **not** output planning text, pseudocode, code fences, or contract explanations before the tool call.
 - Only ask a brief clarification question if the request is genuinely ambiguous or missing required identity.
-- The generated program must define `async def solve(query, max_calls): ...` and end with `await solve(query, max_calls)`.
-- Use the original user request, or a tight restatement, as the tool `query`.
+- The generated program is a top-level Monty Python script.
+- `max_calls` is provided by the runtime as a top-level input.
+- Always assign the final output to `result`.
+- The final line must be exactly `result`.
+- Never stop after `result = ...`; always add a final bare `result` line.
+- Do **not** define or call `solve(...)`.
+- The tool call only needs `code` unless you truly need optional raw-query metadata.
 - Do **not** pass explicit `max_calls` or `timeout_sec` tool arguments unless the user explicitly asked for a non-default budget/timeout. Let the runtime defaults apply for ordinary requests.
 - One user request = one `hf_hub_query_raw` call. Do **not** retry in the same turn.
 
 ## Raw return rules
 
-- The return value of `solve(...)` is the user-facing payload.
-- Return a dict/list when JSON is appropriate; return a string/number/bool only when that scalar is the intended payload.
+- The value of `result` is the user-facing payload.
+- Make `result` a dict/list when JSON is appropriate; use a string/number/bool only when that scalar is the intended payload.
 - For composed structured outputs that include your own coverage metadata, always use the exact top-level keys `results` and `coverage` unless the user explicitly asked for different key names.
-- Prefer returning outputs directly unless post-processing is required. Do **NOT** rename fields unless asked specifically.
-- Runtime will wrap the `solve(...)` return value under `result` and attach runtime information under `meta`.
+- Prefer emitting outputs directly unless post-processing is required. Do **NOT** rename fields unless asked specifically.
+- Runtime will wrap the value of `result` under `result` and attach runtime information under `meta`.
 - When helper-owned coverage metadata matters, prefer returning the helper envelope directly.
-- Do **not** create your own transport wrapper such as `{result: ..., meta: ...}` inside `solve(...)`.
+- Do **not** create your own transport wrapper such as `{result: ..., meta: ...}` in generated code.
{{file:_monty_codegen_shared.md}} diff --git a/monty_api/__pycache__/__init__.cpython-313.pyc b/monty_api/__pycache__/__init__.cpython-313.pyc deleted file mode 100644 index 39512f92e121e50a5116f528216f948d19961ae4..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/__init__.cpython-313.pyc and /dev/null differ diff --git a/monty_api/__pycache__/__init__.cpython-314.pyc b/monty_api/__pycache__/__init__.cpython-314.pyc deleted file mode 100644 index daaa2306bc5508b06b7d194b3e997ec07777f590..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/__init__.cpython-314.pyc and /dev/null differ diff --git a/monty_api/__pycache__/aliases.cpython-313.pyc b/monty_api/__pycache__/aliases.cpython-313.pyc deleted file mode 100644 index bd1ae20d92321cec882a36fcc5ffd1c44862d4ea..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/aliases.cpython-313.pyc and /dev/null differ diff --git a/monty_api/__pycache__/aliases.cpython-314.pyc b/monty_api/__pycache__/aliases.cpython-314.pyc deleted file mode 100644 index c766f7ff05dc71174e958b5d3ad080bc0ee5d343..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/aliases.cpython-314.pyc and /dev/null differ diff --git a/monty_api/__pycache__/constants.cpython-313.pyc b/monty_api/__pycache__/constants.cpython-313.pyc deleted file mode 100644 index 383f1e18932562011f23dd24d972dde0316344e3..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/constants.cpython-313.pyc and /dev/null differ diff --git a/monty_api/__pycache__/constants.cpython-314.pyc b/monty_api/__pycache__/constants.cpython-314.pyc deleted file mode 100644 index 9d2902498daf2525b66fb9b9592f8bcf12ff05dd..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/constants.cpython-314.pyc and /dev/null differ diff --git a/monty_api/__pycache__/context_types.cpython-313.pyc b/monty_api/__pycache__/context_types.cpython-313.pyc deleted 
file mode 100644 index b917a0f9978bc536d1ab15ea2f80bf67981e7bfa..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/context_types.cpython-313.pyc and /dev/null differ diff --git a/monty_api/__pycache__/context_types.cpython-314.pyc b/monty_api/__pycache__/context_types.cpython-314.pyc deleted file mode 100644 index 11d63830dfdf87d8dbf734a3e2a507d62a4bcbc6..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/context_types.cpython-314.pyc and /dev/null differ diff --git a/monty_api/__pycache__/helper_contracts.cpython-313.pyc b/monty_api/__pycache__/helper_contracts.cpython-313.pyc deleted file mode 100644 index 4cae3cc5b617490af0aba6097e42b3bd9b24fbf0..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/helper_contracts.cpython-313.pyc and /dev/null differ diff --git a/monty_api/__pycache__/helper_contracts.cpython-314.pyc b/monty_api/__pycache__/helper_contracts.cpython-314.pyc deleted file mode 100644 index 7aae7bc6b09ee1d0c664803ba8ef25fec3bd72b7..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/helper_contracts.cpython-314.pyc and /dev/null differ diff --git a/monty_api/__pycache__/http_runtime.cpython-313.pyc b/monty_api/__pycache__/http_runtime.cpython-313.pyc deleted file mode 100644 index b65540fdb94b76006e72ca52d2a8ec133204d4d1..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/http_runtime.cpython-313.pyc and /dev/null differ diff --git a/monty_api/__pycache__/http_runtime.cpython-314.pyc b/monty_api/__pycache__/http_runtime.cpython-314.pyc deleted file mode 100644 index 292a0bef6385291323b9eb20b53011038ccce52d..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/http_runtime.cpython-314.pyc and /dev/null differ diff --git a/monty_api/__pycache__/query_entrypoints.cpython-313.pyc b/monty_api/__pycache__/query_entrypoints.cpython-313.pyc deleted file mode 100644 index 
8b06ad70424f2c590d0209b5e333c96a03c89388..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/query_entrypoints.cpython-313.pyc and /dev/null differ diff --git a/monty_api/__pycache__/query_entrypoints.cpython-314.pyc b/monty_api/__pycache__/query_entrypoints.cpython-314.pyc deleted file mode 100644 index c7eab397585f4d565761864e62ce0786ba9381bb..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/query_entrypoints.cpython-314.pyc and /dev/null differ diff --git a/monty_api/__pycache__/registry.cpython-313.pyc b/monty_api/__pycache__/registry.cpython-313.pyc deleted file mode 100644 index 3693e37e82bc71452ad98969a7c65d72b99964a6..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/registry.cpython-313.pyc and /dev/null differ diff --git a/monty_api/__pycache__/registry.cpython-314.pyc b/monty_api/__pycache__/registry.cpython-314.pyc deleted file mode 100644 index 3a1b569b41b2090cae58b188ddbb6e82f13ad76a..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/registry.cpython-314.pyc and /dev/null differ diff --git a/monty_api/__pycache__/runtime_context.cpython-313.pyc b/monty_api/__pycache__/runtime_context.cpython-313.pyc deleted file mode 100644 index c01bdad63188b64885b20b656f93bd44894f2a84..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/runtime_context.cpython-313.pyc and /dev/null differ diff --git a/monty_api/__pycache__/runtime_context.cpython-314.pyc b/monty_api/__pycache__/runtime_context.cpython-314.pyc deleted file mode 100644 index 4c1eae5b200bfae0bda3539a1fcc767753aa206d..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/runtime_context.cpython-314.pyc and /dev/null differ diff --git a/monty_api/__pycache__/runtime_envelopes.cpython-313.pyc b/monty_api/__pycache__/runtime_envelopes.cpython-313.pyc deleted file mode 100644 index 
d703184b54e5a51307766c5e08d78a8fbff70646..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/runtime_envelopes.cpython-313.pyc and /dev/null differ diff --git a/monty_api/__pycache__/runtime_envelopes.cpython-314.pyc b/monty_api/__pycache__/runtime_envelopes.cpython-314.pyc deleted file mode 100644 index 5cd0651f43e24f28984abb8eaba717c3cc7ac5de..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/runtime_envelopes.cpython-314.pyc and /dev/null differ diff --git a/monty_api/__pycache__/runtime_filtering.cpython-313.pyc b/monty_api/__pycache__/runtime_filtering.cpython-313.pyc deleted file mode 100644 index dcc897f2cc89ab51578eda14cd55744b93c08783..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/runtime_filtering.cpython-313.pyc and /dev/null differ diff --git a/monty_api/__pycache__/runtime_filtering.cpython-314.pyc b/monty_api/__pycache__/runtime_filtering.cpython-314.pyc deleted file mode 100644 index c547ecda0d5a4038251243f1bcc851a8eb2cb7c4..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/runtime_filtering.cpython-314.pyc and /dev/null differ diff --git a/monty_api/__pycache__/tool_entrypoints.cpython-313.pyc b/monty_api/__pycache__/tool_entrypoints.cpython-313.pyc deleted file mode 100644 index cfcafb7d5c6a6c76b549f111a1fe330eaa0c053e..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/tool_entrypoints.cpython-313.pyc and /dev/null differ diff --git a/monty_api/__pycache__/tool_entrypoints.cpython-314.pyc b/monty_api/__pycache__/tool_entrypoints.cpython-314.pyc deleted file mode 100644 index bb11b933703b97635a708641c31ceec3109c1f3f..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/tool_entrypoints.cpython-314.pyc and /dev/null differ diff --git a/monty_api/__pycache__/validation.cpython-313.pyc b/monty_api/__pycache__/validation.cpython-313.pyc deleted file mode 100644 index 
99597d781307156aad9f4d0ed16b92f6ce2bbf0c..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/validation.cpython-313.pyc and /dev/null differ diff --git a/monty_api/__pycache__/validation.cpython-314.pyc b/monty_api/__pycache__/validation.cpython-314.pyc deleted file mode 100644 index 36340623aafeee15a0e9b0a212d508ba303cc962..0000000000000000000000000000000000000000 Binary files a/monty_api/__pycache__/validation.cpython-314.pyc and /dev/null differ diff --git a/monty_api/constants.py b/monty_api/constants.py index 91c152907cc1c1c5e2a1f65d23bc8da3740b2f5d..6d6fade0f949365085034c475ac1bfad69aa29b6 100644 --- a/monty_api/constants.py +++ b/monty_api/constants.py @@ -183,22 +183,24 @@ COLLECTION_CANONICAL_FIELDS: tuple[str, ...] = ( "item_count", ) -DAILY_PAPER_CANONICAL_FIELDS: tuple[str, ...] = ( +PAPER_CANONICAL_FIELDS: tuple[str, ...] = ( "paper_id", "title", "summary", "published_at", - "submitted_on_daily_at", + "submitted_at", "authors", + "author_usernames", "organization", "submitted_by", "discussion_id", "upvotes", - "github_repo_url", + "source", + "comments", + "project_page", + "github_repo", "github_stars", - "project_page_url", - "num_comments", - "is_author_participating", - "repo_id", "rank", ) + +PAPER_CONTENT_FIELDS: tuple[str, ...] 
= ("paper_id", "content") diff --git a/monty_api/helper_contracts.py b/monty_api/helper_contracts.py index 3c3ac77a9713d1a50ec55b084554f87df905a79e..a88c910c9ae73afb5f77b5e06478ecd84c43e884 100644 --- a/monty_api/helper_contracts.py +++ b/monty_api/helper_contracts.py @@ -16,9 +16,10 @@ from .constants import ( ACTIVITY_CANONICAL_FIELDS, ACTOR_CANONICAL_FIELDS, COLLECTION_CANONICAL_FIELDS, - DAILY_PAPER_CANONICAL_FIELDS, DISCUSSION_CANONICAL_FIELDS, DISCUSSION_DETAIL_CANONICAL_FIELDS, + PAPER_CANONICAL_FIELDS, + PAPER_CONTENT_FIELDS, PROFILE_CANONICAL_FIELDS, REPO_CANONICAL_FIELDS, USER_CANONICAL_FIELDS, @@ -76,9 +77,10 @@ FIELD_GROUPS: dict[str, list[str]] = { "activity": list(ACTIVITY_CANONICAL_FIELDS), "actor": list(ACTOR_CANONICAL_FIELDS), "collection": list(COLLECTION_CANONICAL_FIELDS), - "daily_paper": list(DAILY_PAPER_CANONICAL_FIELDS), "discussion": list(DISCUSSION_CANONICAL_FIELDS), "discussion_detail": list(DISCUSSION_DETAIL_CANONICAL_FIELDS), + "paper": list(PAPER_CANONICAL_FIELDS), + "paper_content": list(PAPER_CONTENT_FIELDS), "profile": list(PROFILE_CANONICAL_FIELDS), "repo": list(REPO_CANONICAL_FIELDS), "trending_repo": list(TRENDING_CANONICAL_FIELDS), @@ -109,10 +111,12 @@ HELPER_CONTRACT_SPECS: dict[str, dict[str, Any]] = { }, "hf_daily_papers": { "category": "curated_feed", - "row_type": "daily_paper", - "fields_group": "daily_paper", + "row_type": "paper", + "fields_group": "paper", "filter_param": "where", - "filter_group": "daily_paper", + "filter_group": "paper", + "param_values": {"sort": ["published_at", "trending"]}, + "backed_by": "HfApi.list_daily_papers", }, "hf_datasets_search": { "category": "wrapped_hf_repo_search", @@ -142,6 +146,20 @@ HELPER_CONTRACT_SPECS: dict[str, dict[str, Any]] = { "row_type": "profile", "param_values": {"include": ["likes", "activity"]}, }, + "hf_paper_info": { + "category": "paper_detail", + "row_type": "paper", + "fields_group": "paper", + "backed_by": "HfApi.paper_info", + }, + "hf_papers_search": { + 
"category": "paper_search", + "row_type": "paper", + "fields_group": "paper", + "filter_param": "where", + "filter_group": "paper", + "backed_by": "HfApi.list_papers", + }, "hf_recent_activity": { "category": "activity_feed", "row_type": "activity", @@ -189,6 +207,12 @@ HELPER_CONTRACT_SPECS: dict[str, dict[str, Any]] = { "row_type": "runtime_capability", "param_values": {"section": list(RUNTIME_CAPABILITY_SECTION_VALUES)}, }, + "hf_read_paper": { + "category": "paper_markdown", + "row_type": "paper_content", + "fields_group": "paper_content", + "backed_by": "HfApi.read_paper", + }, "hf_spaces_search": { "category": "wrapped_hf_repo_search", "row_type": "repo", @@ -396,6 +420,9 @@ def build_helper_contracts( param_values = _param_values_for_helper(helper_name) if param_values is not None: contract["param_values"] = param_values + backed_by = spec.get("backed_by") + if isinstance(backed_by, str): + contract["backed_by"] = backed_by upstream_repo_type = spec.get("upstream_repo_type") if isinstance(upstream_repo_type, str): diff --git a/monty_api/helpers/__init__.py b/monty_api/helpers/__init__.py index 88eb47e0d95c71f80c55276d40d7ce413b8e07d3..2bc18dc35db4c712b101c5c95d2983ebb569748f 100644 --- a/monty_api/helpers/__init__.py +++ b/monty_api/helpers/__init__.py @@ -1,6 +1,7 @@ from .activity import register_activity_helpers from .collections import register_collection_helpers from .introspection import register_introspection_helpers +from .papers import register_paper_helpers from .profiles import register_profile_helpers from .repos import register_repo_helpers @@ -8,6 +9,7 @@ __all__ = [ "register_activity_helpers", "register_collection_helpers", "register_introspection_helpers", + "register_paper_helpers", "register_profile_helpers", "register_repo_helpers", ] diff --git a/monty_api/helpers/__pycache__/__init__.cpython-313.pyc b/monty_api/helpers/__pycache__/__init__.cpython-313.pyc deleted file mode 100644 index 
9b7fb4c6512c8de74217a37aeb7dfbd2d9201b33..0000000000000000000000000000000000000000 Binary files a/monty_api/helpers/__pycache__/__init__.cpython-313.pyc and /dev/null differ diff --git a/monty_api/helpers/__pycache__/__init__.cpython-314.pyc b/monty_api/helpers/__pycache__/__init__.cpython-314.pyc deleted file mode 100644 index 04fa78e132ccddfeb5a71d7715b4d17f67ebe234..0000000000000000000000000000000000000000 Binary files a/monty_api/helpers/__pycache__/__init__.cpython-314.pyc and /dev/null differ diff --git a/monty_api/helpers/__pycache__/activity.cpython-313.pyc b/monty_api/helpers/__pycache__/activity.cpython-313.pyc deleted file mode 100644 index a99ea3fa5d25bc4726660f3b265881ddfd6106ec..0000000000000000000000000000000000000000 Binary files a/monty_api/helpers/__pycache__/activity.cpython-313.pyc and /dev/null differ diff --git a/monty_api/helpers/__pycache__/activity.cpython-314.pyc b/monty_api/helpers/__pycache__/activity.cpython-314.pyc deleted file mode 100644 index 06eda356bc1abf6a5a8281a285368450925a1dbc..0000000000000000000000000000000000000000 Binary files a/monty_api/helpers/__pycache__/activity.cpython-314.pyc and /dev/null differ diff --git a/monty_api/helpers/__pycache__/collections.cpython-313.pyc b/monty_api/helpers/__pycache__/collections.cpython-313.pyc deleted file mode 100644 index 6c1b08096a5c7ca789fa23b74068cd83a4d8d57e..0000000000000000000000000000000000000000 Binary files a/monty_api/helpers/__pycache__/collections.cpython-313.pyc and /dev/null differ diff --git a/monty_api/helpers/__pycache__/collections.cpython-314.pyc b/monty_api/helpers/__pycache__/collections.cpython-314.pyc deleted file mode 100644 index 8084cff8a506017a9c135a11ef663d0c5ae44cde..0000000000000000000000000000000000000000 Binary files a/monty_api/helpers/__pycache__/collections.cpython-314.pyc and /dev/null differ diff --git a/monty_api/helpers/__pycache__/common.cpython-313.pyc b/monty_api/helpers/__pycache__/common.cpython-313.pyc deleted file mode 100644 index 
fe7a43e5b99e30fba0b37cecea03fdae16db6eea..0000000000000000000000000000000000000000 Binary files a/monty_api/helpers/__pycache__/common.cpython-313.pyc and /dev/null differ diff --git a/monty_api/helpers/__pycache__/common.cpython-314.pyc b/monty_api/helpers/__pycache__/common.cpython-314.pyc deleted file mode 100644 index 75718b2a613b0cfc0cf0b59b033c08f45f5a4b10..0000000000000000000000000000000000000000 Binary files a/monty_api/helpers/__pycache__/common.cpython-314.pyc and /dev/null differ diff --git a/monty_api/helpers/__pycache__/introspection.cpython-313.pyc b/monty_api/helpers/__pycache__/introspection.cpython-313.pyc deleted file mode 100644 index b7fac1a438d65f61f68d7a2fcd345280b2583545..0000000000000000000000000000000000000000 Binary files a/monty_api/helpers/__pycache__/introspection.cpython-313.pyc and /dev/null differ diff --git a/monty_api/helpers/__pycache__/introspection.cpython-314.pyc b/monty_api/helpers/__pycache__/introspection.cpython-314.pyc deleted file mode 100644 index 1d29727f9ffd4cf6a3c00d85c48a3217dc81ead0..0000000000000000000000000000000000000000 Binary files a/monty_api/helpers/__pycache__/introspection.cpython-314.pyc and /dev/null differ diff --git a/monty_api/helpers/__pycache__/profiles.cpython-313.pyc b/monty_api/helpers/__pycache__/profiles.cpython-313.pyc deleted file mode 100644 index 39c6c94864393c18cb06a97b7dad8e92e62e922e..0000000000000000000000000000000000000000 Binary files a/monty_api/helpers/__pycache__/profiles.cpython-313.pyc and /dev/null differ diff --git a/monty_api/helpers/__pycache__/profiles.cpython-314.pyc b/monty_api/helpers/__pycache__/profiles.cpython-314.pyc deleted file mode 100644 index 2e464de48ef0802df560a02a8c0d80877d7a97e1..0000000000000000000000000000000000000000 Binary files a/monty_api/helpers/__pycache__/profiles.cpython-314.pyc and /dev/null differ diff --git a/monty_api/helpers/__pycache__/repos.cpython-313.pyc b/monty_api/helpers/__pycache__/repos.cpython-313.pyc deleted file mode 100644 index 
f86ed34bf4bd222f802707272094f22c3fdf15c3..0000000000000000000000000000000000000000 Binary files a/monty_api/helpers/__pycache__/repos.cpython-313.pyc and /dev/null differ diff --git a/monty_api/helpers/__pycache__/repos.cpython-314.pyc b/monty_api/helpers/__pycache__/repos.cpython-314.pyc deleted file mode 100644 index 70982d48a0b75f6f8bb3a4e77090cda51817e8b0..0000000000000000000000000000000000000000 Binary files a/monty_api/helpers/__pycache__/repos.cpython-314.pyc and /dev/null differ diff --git a/monty_api/helpers/introspection.py b/monty_api/helpers/introspection.py index c3c51e1c263bab059e05ac6f69937bbdcb8b99aa..1559825cd0d1a7199701844b0af3865c320dc9a6 100644 --- a/monty_api/helpers/introspection.py +++ b/monty_api/helpers/introspection.py @@ -10,7 +10,6 @@ from ..constants import ( ACTIVITY_CANONICAL_FIELDS, ACTOR_CANONICAL_FIELDS, COLLECTION_CANONICAL_FIELDS, - DAILY_PAPER_CANONICAL_FIELDS, DISCUSSION_CANONICAL_FIELDS, DISCUSSION_DETAIL_CANONICAL_FIELDS, DEFAULT_MAX_CALLS, @@ -19,6 +18,8 @@ from ..constants import ( LIKES_SCAN_LIMIT_CAP, MAX_CALLS_LIMIT, OUTPUT_ITEMS_TRUNCATION_LIMIT, + PAPER_CANONICAL_FIELDS, + PAPER_CONTENT_FIELDS, PROFILE_CANONICAL_FIELDS, RECENT_ACTIVITY_SCAN_MAX_PAGES, REPO_CANONICAL_FIELDS, @@ -140,7 +141,8 @@ async def hf_runtime_capabilities( "user_likes": list(USER_LIKES_CANONICAL_FIELDS), "activity": list(ACTIVITY_CANONICAL_FIELDS), "collection": list(COLLECTION_CANONICAL_FIELDS), - "daily_paper": list(DAILY_PAPER_CANONICAL_FIELDS), + "paper": list(PAPER_CANONICAL_FIELDS), + "paper_content": list(PAPER_CONTENT_FIELDS), "discussion": list(DISCUSSION_CANONICAL_FIELDS), "discussion_detail": list(DISCUSSION_DETAIL_CANONICAL_FIELDS), }, diff --git a/monty_api/helpers/papers.py b/monty_api/helpers/papers.py new file mode 100644 index 0000000000000000000000000000000000000000..536db0ada09072601ac42e86de2254381e9a5428 --- /dev/null +++ b/monty_api/helpers/papers.py @@ -0,0 +1,318 @@ +from __future__ import annotations + +from functools 
import partial +from typing import Any, Callable + +from ..constants import OUTPUT_ITEMS_TRUNCATION_LIMIT, PAPER_CANONICAL_FIELDS +from ..context_types import HelperRuntimeContext + + +def _extract_author_usernames(authors: list[Any] | None) -> list[str] | None: + if not isinstance(authors, list): + return None + usernames: list[str] = [] + for author in authors: + user = getattr(author, "user", None) + for candidate in ( + getattr(user, "username", None), + getattr(user, "user", None), + getattr(user, "name", None), + ): + if isinstance(candidate, str): + cleaned = candidate.strip() + if cleaned and cleaned not in usernames: + usernames.append(cleaned) + break + return usernames or None + + +def _normalize_paper_sort(sort: str | None) -> tuple[str | None, str | None]: + cleaned = str(sort or "").strip() + if not cleaned: + return (None, None) + alias_map = { + "published_at": "publishedAt", + "publishedAt": "publishedAt", + "trending": "trending", + } + resolved = alias_map.get(cleaned) + if resolved is None: + return (None, "sort must be one of published_at, publishedAt, trending") + return (resolved, None) + + +def _normalize_paper_info( + ctx: HelperRuntimeContext, + paper: Any, + *, + rank: int | None = None, +) -> dict[str, Any]: + authors = getattr(paper, "authors", None) + organization = getattr(paper, "organization", None) + submitted_by = getattr(paper, "submitted_by", None) + row = { + "paper_id": getattr(paper, "id", None), + "title": getattr(paper, "title", None), + "summary": getattr(paper, "summary", None), + "published_at": ctx._dt_to_str(getattr(paper, "published_at", None)), + "submitted_at": ctx._dt_to_str(getattr(paper, "submitted_at", None)), + "authors": ctx._extract_author_names(authors), + "author_usernames": _extract_author_usernames(authors), + "organization": ctx._extract_profile_name(organization), + "submitted_by": ctx._extract_profile_name(submitted_by), + "discussion_id": getattr(paper, "discussion_id", None), + "upvotes": 
ctx._as_int(getattr(paper, "upvotes", None)), + "source": getattr(paper, "source", None), + "comments": ctx._as_int(getattr(paper, "comments", None)), + "project_page": getattr(paper, "project_page", None), + "github_repo": getattr(paper, "github_repo", None), + "github_stars": ctx._as_int(getattr(paper, "github_stars", None)), + "ai_summary": getattr(paper, "ai_summary", None), + "ai_keywords": getattr(paper, "ai_keywords", None), + "rank": rank, + } + return row + + +async def _run_paper_list_helper( + ctx: HelperRuntimeContext, + *, + helper_name: str, + source: str, + loader: Callable[[Any, int], list[Any]], + limit: int, + where: dict[str, Any] | None, + fields: list[str] | None, + ordered_ranking: bool = False, + **meta: Any, +) -> dict[str, Any]: + start_calls = ctx.call_count["n"] + default_limit = ctx._policy_int(helper_name, "default_limit", 20) + max_limit = ctx._policy_int( + helper_name, "max_limit", OUTPUT_ITEMS_TRUNCATION_LIMIT + ) + requested_limit = limit + applied_limit = ctx._clamp_int( + limit, + default=default_limit, + minimum=1, + maximum=max_limit, + ) + limit_meta = ctx._derive_limit_metadata( + requested_limit=requested_limit, + applied_limit=applied_limit, + default_limit_used=limit == default_limit, + ) + api = ctx._get_hf_api_client() + try: + payload = ctx._host_hf_call( + source, + lambda: loader(api, applied_limit), + ) + except Exception as exc: + return ctx._helper_error(start_calls=start_calls, source=source, error=exc) + + items = [ + _normalize_paper_info(ctx, paper, rank=index if ordered_ranking else None) + for index, paper in enumerate(payload[:applied_limit], start=1) + ] + try: + items = ctx._apply_where(items, where, allowed_fields=PAPER_CANONICAL_FIELDS) + except ValueError as exc: + return ctx._helper_error(start_calls=start_calls, source=source, error=exc) + matched = len(items) + try: + items = ctx._project_items( + items[:applied_limit], + fields, + allowed_fields=PAPER_CANONICAL_FIELDS, + ) + except ValueError as 
exc: + return ctx._helper_error(start_calls=start_calls, source=source, error=exc) + + limit_boundary_hit = len(payload) >= applied_limit + next_request_hint = None + if limit_boundary_hit: + next_request_hint = ( + f"Increase limit above {applied_limit} to check whether more rows exist" + ) + + return ctx._helper_success( + start_calls=start_calls, + source=source, + items=items, + limit=applied_limit, + scanned=len(payload), + matched=matched, + returned=len(items), + ordered_ranking=ordered_ranking, + more_available="unknown" if limit_boundary_hit else False, + limit_boundary_hit=limit_boundary_hit, + next_request_hint=next_request_hint, + **limit_meta, + **meta, + ) + + +async def hf_daily_papers( + ctx: HelperRuntimeContext, + date: str | None = None, + week: str | None = None, + month: str | None = None, + submitter: str | None = None, + sort: str | None = None, + p: int | None = None, + limit: int = 20, + where: dict[str, Any] | None = None, + fields: list[str] | None = None, +) -> dict[str, Any]: + normalized_sort, sort_error = _normalize_paper_sort(sort) + if sort_error: + return ctx._helper_error( + start_calls=ctx.call_count["n"], + source="/api/daily_papers", + error=sort_error, + ) + return await _run_paper_list_helper( + ctx, + helper_name="hf_daily_papers", + source="/api/daily_papers", + loader=lambda api, applied_limit: list( + api.list_daily_papers( + date=date, + week=week, + month=month, + submitter=submitter, + sort=normalized_sort, + p=p, + limit=applied_limit, + ) + ), + limit=limit, + where=where, + fields=fields, + ordered_ranking=True, + date=date, + week=week, + month=month, + submitter=submitter, + sort=normalized_sort, + p=p, + ) + + +async def hf_papers_search( + ctx: HelperRuntimeContext, + query: str, + limit: int = 20, + where: dict[str, Any] | None = None, + fields: list[str] | None = None, +) -> dict[str, Any]: + term = str(query or "").strip() + if not term: + return ctx._helper_error( + start_calls=ctx.call_count["n"], + 
source="/api/papers/search", + error="query is required", + ) + return await _run_paper_list_helper( + ctx, + helper_name="hf_papers_search", + source="/api/papers/search", + loader=lambda api, applied_limit: list( + api.list_papers(query=term, limit=applied_limit) + ), + limit=limit, + where=where, + fields=fields, + query=term, + ) + + +async def hf_paper_info( + ctx: HelperRuntimeContext, + paper_id: str, + fields: list[str] | None = None, +) -> dict[str, Any]: + start_calls = ctx.call_count["n"] + pid = str(paper_id or "").strip() + if not pid: + return ctx._helper_error( + start_calls=start_calls, + source="/api/papers/", + error="paper_id is required", + ) + try: + paper = ctx._host_hf_call( + f"/api/papers/{pid}", + lambda: ctx._get_hf_api_client().paper_info(id=pid), + ) + except Exception as exc: + return ctx._helper_error( + start_calls=start_calls, + source=f"/api/papers/{pid}", + error=exc, + paper_id=pid, + ) + item = _normalize_paper_info(ctx, paper) + items = [item] + try: + items = ctx._project_items(items, fields, allowed_fields=PAPER_CANONICAL_FIELDS) + except ValueError as exc: + return ctx._helper_error( + start_calls=start_calls, + source=f"/api/papers/{pid}", + error=exc, + paper_id=pid, + ) + return ctx._helper_success( + start_calls=start_calls, + source=f"/api/papers/{pid}", + items=items, + paper_id=pid, + returned=len(items), + matched=len(items), + ) + + +async def hf_read_paper( + ctx: HelperRuntimeContext, + paper_id: str, +) -> dict[str, Any]: + start_calls = ctx.call_count["n"] + pid = str(paper_id or "").strip() + if not pid: + return ctx._helper_error( + start_calls=start_calls, + source="/papers/.md", + error="paper_id is required", + ) + try: + content = ctx._host_hf_call( + f"/papers/{pid}.md", + lambda: ctx._get_hf_api_client().read_paper(id=pid), + ) + except Exception as exc: + return ctx._helper_error( + start_calls=start_calls, + source=f"/papers/{pid}.md", + error=exc, + paper_id=pid, + ) + return ctx._helper_success( + 
start_calls=start_calls, + source=f"/papers/{pid}.md", + items=[{"paper_id": pid, "content": content}], + paper_id=pid, + returned=1, + matched=1, + ) + + +def register_paper_helpers(ctx: HelperRuntimeContext) -> dict[str, Callable[..., Any]]: + return { + "hf_daily_papers": partial(hf_daily_papers, ctx), + "hf_papers_search": partial(hf_papers_search, ctx), + "hf_paper_info": partial(hf_paper_info, ctx), + "hf_read_paper": partial(hf_read_paper, ctx), + } diff --git a/monty_api/helpers/profiles.py b/monty_api/helpers/profiles.py index 509cfb3cf10398612630dd2824e01269e5344b5b..c3b06be84b3adde59836a237e5b43e0d1e9dd0b0 100644 --- a/monty_api/helpers/profiles.py +++ b/monty_api/helpers/profiles.py @@ -338,8 +338,8 @@ async def hf_org_members( ) sample_complete = ( exact_count - and len(normalized) <= applied_limit - and (not count_only or len(normalized) == 0) + and total_matched <= applied_limit + and (not count_only or total_matched == 0) ) more_available = ctx._derive_more_available( sample_complete=sample_complete, @@ -372,13 +372,18 @@ async def hf_org_members( "organization": org, }, limit_plan=limit_plan, - matched_count=len(normalized), + matched_count=total_matched, returned_count=len(items), exact_count=exact_count, count_only=count_only, sample_complete=sample_complete, more_available=more_available, - scan_limit_hit=scan_limit_hit, + scan_limit_hit=scan_limit_hit + or ( + overview_total is not None + and overview_total > observed_total + and observed_total >= scan_lim + ), ) return ctx._helper_success( start_calls=start_calls, source=endpoint, items=items, meta=meta @@ -573,8 +578,8 @@ async def _user_graph_helper( ) sample_complete = ( exact_count - and len(normalized) <= applied_limit - and (not count_only or len(normalized) == 0) + and total_matched <= applied_limit + and (not count_only or total_matched == 0) ) more_available = ctx._derive_more_available( sample_complete=sample_complete, @@ -617,13 +622,18 @@ async def _user_graph_helper( 
"organization": u if entity_type == "organization" else None, }, limit_plan=limit_plan, - matched_count=len(normalized), + matched_count=total_matched, returned_count=len(items), exact_count=exact_count, count_only=count_only, sample_complete=sample_complete, more_available=more_available, - scan_limit_hit=scan_limit_hit, + scan_limit_hit=scan_limit_hit + or ( + overview_total is not None + and overview_total > observed_total + and observed_total >= scan_lim + ), ) return ctx._helper_success( start_calls=start_calls, source=endpoint, items=items, meta=meta diff --git a/monty_api/helpers/repos.py b/monty_api/helpers/repos.py index 2200de664d9d326bd309bfed137e8ce8264ef237..8faf96a94af82b6edf0ab58b875645d5e969828b 100644 --- a/monty_api/helpers/repos.py +++ b/monty_api/helpers/repos.py @@ -7,7 +7,6 @@ from ..context_types import HelperRuntimeContext from ..helper_contracts import repo_expand_alias_map from ..constants import ( ACTOR_CANONICAL_FIELDS, - DAILY_PAPER_CANONICAL_FIELDS, EXHAUSTIVE_HELPER_RETURN_HARD_CAP, LIKES_ENRICHMENT_MAX_REPOS, LIKES_RANKING_WINDOW_DEFAULT, @@ -1287,62 +1286,6 @@ async def hf_trending( ) -async def hf_daily_papers( - ctx: HelperRuntimeContext, - limit: int = 20, - where: dict[str, Any] | None = None, - fields: list[str] | None = None, -) -> dict[str, Any]: - start_calls = ctx.call_count["n"] - default_limit = ctx._policy_int("hf_daily_papers", "default_limit", 20) - max_limit = ctx._policy_int( - "hf_daily_papers", "max_limit", OUTPUT_ITEMS_TRUNCATION_LIMIT - ) - lim = ctx._clamp_int(limit, default=default_limit, minimum=1, maximum=max_limit) - resp = ctx._host_raw_call("/api/daily_papers", params={"limit": lim}) - if not resp.get("ok"): - return ctx._helper_error( - start_calls=start_calls, - source="/api/daily_papers", - error=resp.get("error") or "daily papers fetch failed", - ) - payload = resp.get("data") if isinstance(resp.get("data"), list) else [] - items: list[dict[str, Any]] = [] - for idx, row in enumerate(payload[:lim], 
start=1): - if not isinstance(row, dict): - continue - items.append(ctx._normalize_daily_paper_row(row, rank=idx)) - try: - items = ctx._apply_where( - items, where, allowed_fields=DAILY_PAPER_CANONICAL_FIELDS - ) - except ValueError as exc: - return ctx._helper_error( - start_calls=start_calls, - source="/api/daily_papers", - error=exc, - ) - matched = len(items) - try: - items = ctx._project_daily_paper_items(items[:lim], fields) - except ValueError as exc: - return ctx._helper_error( - start_calls=start_calls, - source="/api/daily_papers", - error=exc, - ) - return ctx._helper_success( - start_calls=start_calls, - source="/api/daily_papers", - items=items, - limit=lim, - scanned=len(payload), - matched=matched, - returned=len(items), - ordered_ranking=True, - ) - - def register_repo_helpers(ctx: HelperRuntimeContext) -> dict[str, Callable[..., Any]]: return { "hf_models_search": partial(hf_models_search, ctx), @@ -1355,5 +1298,4 @@ def register_repo_helpers(ctx: HelperRuntimeContext) -> dict[str, Callable[..., "hf_repo_discussion_details": partial(hf_repo_discussion_details, ctx), "hf_repo_details": partial(hf_repo_details, ctx), "hf_trending": partial(hf_trending, ctx), - "hf_daily_papers": partial(hf_daily_papers, ctx), } diff --git a/monty_api/http_runtime.py b/monty_api/http_runtime.py index 5797ad34599857661bc621999249d1af8af8487c..451525bbd159d1176f3e8f133489863f6a63b587 100644 --- a/monty_api/http_runtime.py +++ b/monty_api/http_runtime.py @@ -429,47 +429,6 @@ def _normalize_trending_row( return row -def _normalize_daily_paper_row( - row: dict[str, Any], rank: int | None = None -) -> dict[str, Any]: - paper = row.get("paper") if isinstance(row.get("paper"), dict) else {} - org = ( - row.get("organization") - if isinstance(row.get("organization"), dict) - else paper.get("organization") - ) - organization = None - if isinstance(org, dict): - organization = org.get("name") or org.get("fullname") - - item = { - "paper_id": paper.get("id"), - "title": 
row.get("title") or paper.get("title"), - "summary": row.get("summary") - or paper.get("summary") - or paper.get("ai_summary"), - "published_at": row.get("publishedAt") or paper.get("publishedAt"), - "submitted_on_daily_at": paper.get("submittedOnDailyAt"), - "authors": _extract_author_names(paper.get("authors")), - "organization": organization, - "submitted_by": _extract_profile_name( - row.get("submittedBy") or paper.get("submittedOnDailyBy") - ), - "discussion_id": paper.get("discussionId"), - "upvotes": _as_int(paper.get("upvotes")), - "github_repo_url": paper.get("githubRepo"), - "github_stars": _as_int(paper.get("githubStars")), - "project_page_url": paper.get("projectPage"), - "num_comments": _as_int(row.get("numComments")), - "is_author_participating": row.get("isAuthorParticipating") - if isinstance(row.get("isAuthorParticipating"), bool) - else None, - "repo_id": row.get("repo_id") or paper.get("repo_id"), - "rank": rank, - } - return item - - def _normalize_collection_repo_item(row: dict[str, Any]) -> dict[str, Any] | None: repo_id = row.get("id") or row.get("repoId") or row.get("repo_id") if not isinstance(repo_id, str) or not repo_id: diff --git a/monty_api/query_entrypoints.py b/monty_api/query_entrypoints.py index 0f22de1ba98aac9db63b00c1d21819c399eed24e..0532acef0e5ef0242e50b428182c8ff68e030f5d 100644 --- a/monty_api/query_entrypoints.py +++ b/monty_api/query_entrypoints.py @@ -7,6 +7,7 @@ import json import os import sys import time +import warnings from typing import Any, Callable from .constants import ( @@ -21,6 +22,7 @@ from .constants import ( from .runtime_context import build_runtime_helper_environment from .validation import ( _coerce_jsonish_python_literals, + _compact_result_metadata, _summarize_limit_hit, _truncate_result_payload, _validate_generated_code, @@ -35,18 +37,52 @@ class MontyExecutionError(RuntimeError): self.trace = trace +_PYDANTIC_MONTY_ISCOROUTINE_DEPRECATION = ( + r"'inspect\.iscoroutinefunction' is deprecated and slated 
for removal in Python 3\.16" +) + + +def _install_known_runtime_warning_filters( + *, version_info: tuple[int, ...] | None = None +) -> None: + active_version = version_info or sys.version_info + if tuple(active_version) < (3, 14): + return + warnings.filterwarnings( + "ignore", + message=_PYDANTIC_MONTY_ISCOROUTINE_DEPRECATION, + category=DeprecationWarning, + module=r"pydantic_monty(\..*)?", + ) + + def _query_debug_enabled() -> bool: value = os.environ.get("MONTY_DEBUG_QUERY", "") return value.strip().lower() in {"1", "true", "yes", "on"} +def _execution_debug_enabled() -> bool: + value = os.environ.get("MONTY_DEBUG_EXECUTION", "") + if value.strip().lower() in {"1", "true", "yes", "on"}: + return True + return _query_debug_enabled() + + +def _debug_log(*parts: Any) -> None: + if not _execution_debug_enabled(): + return + print("[monty-debug]", *parts, file=sys.stderr) + sys.stderr.flush() + + def _log_generated_query( - *, query: str, code: str, max_calls: int | None, timeout_sec: int | None + *, query: str | None, code: str, max_calls: int | None, timeout_sec: int | None ) -> None: if not _query_debug_enabled(): return - print("[monty-debug] query:", file=sys.stderr) - print(query, file=sys.stderr) + if query: + print("[monty-debug] query:", file=sys.stderr) + print(query, file=sys.stderr) print("[monty-debug] max_calls:", max_calls, file=sys.stderr) print("[monty-debug] timeout_sec:", timeout_sec, file=sys.stderr) print("[monty-debug] code:", file=sys.stderr) @@ -72,11 +108,18 @@ def _introspect_helper_signatures() -> dict[str, set[str]]: async def _run_with_monty( *, code: str, - query: str, + query: str | None, max_calls: int, strict_mode: bool, timeout_sec: int, ) -> dict[str, Any]: + _debug_log( + "run_monty:start", + f"max_calls={max_calls}", + f"timeout_sec={timeout_sec}", + f"strict_mode={strict_mode}", + ) + _install_known_runtime_warning_filters() try: import pydantic_monty except Exception as e: @@ -101,10 +144,24 @@ async def _run_with_monty( 
helper_name: str, fn: Callable[..., Any] ) -> Callable[..., Any]: async def wrapped(*args: Any, **kwargs: Any) -> Any: + started = time.perf_counter() + _debug_log( + "helper:start", + helper_name, + f"args={len(args)}", + f"kwargs={sorted(kwargs)}", + ) result = await fn(*args, **kwargs) summary = _summarize_limit_hit(helper_name, result) if summary is not None and len(env.limit_summaries) < 20: env.limit_summaries.append(summary) + elapsed_ms = round((time.perf_counter() - started) * 1000, 2) + _debug_log( + "helper:end", + helper_name, + f"elapsed_ms={elapsed_ms}", + f"api_calls={env.call_count['n']}", + ) return result return wrapped @@ -117,16 +174,18 @@ async def _run_with_monty( } try: - result = await pydantic_monty.run_monty_async( - m, - inputs={"query": query, "max_calls": max_calls}, + _debug_log("run_monty:invoke") + result = await m.run_async( + inputs={"query": query or "", "max_calls": max_calls}, external_functions={ name: _collecting_wrapper(name, fn) for name, fn in env.helper_functions.items() }, limits=limits, ) + _debug_log("run_monty:return", f"api_calls={env.call_count['n']}") except Exception as e: + _debug_log("run_monty:error", type(e).__name__, str(e)) raise MontyExecutionError(str(e), env.call_count["n"], env.trace) from e if env.call_count["n"] == 0: @@ -200,32 +259,32 @@ async def _run_with_monty( def _prepare_query_inputs( *, - query: str, + query: str | None, code: str, max_calls: int | None, timeout_sec: int | None, ) -> tuple[str, str, int, int]: - if not query or not query.strip(): - raise ValueError("query is required") if not code or not code.strip(): raise ValueError("code is required") + normalized_query = str(query or "").strip() resolved_max_calls = DEFAULT_MAX_CALLS if max_calls is None else max_calls resolved_timeout_sec = DEFAULT_TIMEOUT_SEC if timeout_sec is None else timeout_sec normalized_max_calls = max(1, min(int(resolved_max_calls), MAX_CALLS_LIMIT)) normalized_timeout_sec = int(resolved_timeout_sec) 
normalized_code = _coerce_jsonish_python_literals(code.strip()) _validate_generated_code(normalized_code) - return query, normalized_code, normalized_max_calls, normalized_timeout_sec + return normalized_query, normalized_code, normalized_max_calls, normalized_timeout_sec async def _execute_query( *, - query: str, + query: str | None, code: str, max_calls: int | None, timeout_sec: int | None, ) -> dict[str, Any]: + _debug_log("execute_query:start") prepared_query, prepared_code, prepared_max_calls, prepared_timeout = ( _prepare_query_inputs( query=query, @@ -234,6 +293,13 @@ async def _execute_query( timeout_sec=timeout_sec, ) ) + _debug_log( + "execute_query:prepared", + f"query_len={len(prepared_query)}", + f"code_len={len(prepared_code)}", + f"max_calls={prepared_max_calls}", + f"timeout_sec={prepared_timeout}", + ) _log_generated_query( query=prepared_query, code=prepared_code, @@ -250,8 +316,8 @@ async def _execute_query( async def hf_hub_query( - query: str, code: str, + query: str | None = None, max_calls: int | None = DEFAULT_MAX_CALLS, timeout_sec: int | None = DEFAULT_TIMEOUT_SEC, ) -> dict[str, Any]: @@ -270,7 +336,7 @@ async def hf_hub_query( ) return { "ok": True, - "data": run["output"], + "data": _compact_result_metadata(run["output"]), "error": None, "api_calls": run["api_calls"], } @@ -291,8 +357,8 @@ async def hf_hub_query( async def hf_hub_query_raw( - query: str, code: str, + query: str | None = None, max_calls: int | None = DEFAULT_MAX_CALLS, timeout_sec: int | None = DEFAULT_TIMEOUT_SEC, ) -> Any: @@ -300,7 +366,7 @@ async def hf_hub_query_raw( Best for read-only Hub discovery, lookup, ranking, and relationship questions when the caller wants a runtime-owned raw envelope: - ``result`` contains the direct ``solve(...)`` output and ``meta`` contains + ``result`` contains the generated script's final `result` value and ``meta`` contains execution details such as timing, call counts, and limit summaries. 
""" started = time.perf_counter() @@ -313,7 +379,7 @@ async def hf_hub_query_raw( ) elapsed_ms = int((time.perf_counter() - started) * 1000) return _wrap_raw_result( - run["output"], + _compact_result_metadata(run["output"]), ok=True, api_calls=run["api_calls"], elapsed_ms=elapsed_ms, @@ -341,7 +407,7 @@ async def hf_hub_query_raw( def _arg_parser() -> argparse.ArgumentParser: p = argparse.ArgumentParser(description="Monty-backed API chaining tool (v3)") - p.add_argument("--query", required=True, help="Natural language query") + p.add_argument("--query", default=None, help="Optional natural language query/context") p.add_argument("--code", default=None, help="Inline Monty code to execute") p.add_argument( "--code-file", default=None, help="Path to .py file with Monty code to execute" @@ -375,8 +441,8 @@ def main() -> int: try: out = asyncio.run( hf_hub_query( - query=args.query, code=code, + query=args.query, max_calls=args.max_calls, timeout_sec=args.timeout, ) diff --git a/monty_api/registry.py b/monty_api/registry.py index 04e303ae076a9e35ffd7dd3e7a12a40b1915fd19..eb5e72b82d8ad69c3474d9dcd94a36f4d6b82577 100644 --- a/monty_api/registry.py +++ b/monty_api/registry.py @@ -7,7 +7,6 @@ from .constants import ( ACTIVITY_CANONICAL_FIELDS, ACTOR_CANONICAL_FIELDS, COLLECTION_CANONICAL_FIELDS, - DAILY_PAPER_CANONICAL_FIELDS, DISCUSSION_CANONICAL_FIELDS, DISCUSSION_DETAIL_CANONICAL_FIELDS, GRAPH_SCAN_LIMIT_CAP, @@ -16,6 +15,8 @@ from .constants import ( LIKES_SCAN_LIMIT_CAP, OUTPUT_ITEMS_TRUNCATION_LIMIT, PROFILE_CANONICAL_FIELDS, + PAPER_CANONICAL_FIELDS, + PAPER_CONTENT_FIELDS, RECENT_ACTIVITY_PAGE_SIZE, RECENT_ACTIVITY_SCAN_MAX_PAGES, REPO_CANONICAL_FIELDS, @@ -230,12 +231,13 @@ TRENDING_OPTIONAL_FIELDS = [ for field in TRENDING_DEFAULT_FIELDS if field not in {"repo_id", "repo_type", "author", "repo_url", "trending_rank"} ] -DAILY_PAPER_DEFAULT_FIELDS = list(DAILY_PAPER_CANONICAL_FIELDS) -DAILY_PAPER_OPTIONAL_FIELDS = [ +PAPER_DEFAULT_FIELDS = 
list(PAPER_CANONICAL_FIELDS) +PAPER_OPTIONAL_FIELDS = [ field - for field in DAILY_PAPER_CANONICAL_FIELDS - if field not in {"paper_id", "title", "published_at", "rank"} + for field in PAPER_CANONICAL_FIELDS + if field not in {"paper_id", "title", "published_at"} ] +PAPER_CONTENT_DEFAULT_FIELDS = list(PAPER_CONTENT_FIELDS) COLLECTION_DEFAULT_FIELDS = list(COLLECTION_CANONICAL_FIELDS) COLLECTION_OPTIONAL_FIELDS = [ field @@ -563,15 +565,48 @@ HELPER_CONFIGS: dict[str, HelperConfig] = { "hf_daily_papers", endpoint_patterns=(r"^/api/daily_papers$",), default_metadata=_metadata( - default_fields=DAILY_PAPER_DEFAULT_FIELDS, - guaranteed_fields=["paper_id", "title", "published_at", "rank"], - optional_fields=DAILY_PAPER_OPTIONAL_FIELDS, + default_fields=PAPER_DEFAULT_FIELDS, + guaranteed_fields=["paper_id", "title", "published_at"], + optional_fields=PAPER_OPTIONAL_FIELDS, default_limit=20, max_limit=OUTPUT_ITEMS_TRUNCATION_LIMIT, - notes="Returns daily paper summary rows. repo_id is omitted unless the upstream payload provides it.", + notes="Curated daily papers feed backed by HfApi.list_daily_papers. Useful join points: organization, submitted_by, author_usernames, discussion_id.", ), pagination={"default_limit": 20, "max_limit": OUTPUT_ITEMS_TRUNCATION_LIMIT}, ), + "hf_papers_search": _config( + "hf_papers_search", + endpoint_patterns=(r"^/api/papers/search$",), + default_metadata=_metadata( + default_fields=PAPER_DEFAULT_FIELDS, + guaranteed_fields=["paper_id", "title", "published_at"], + optional_fields=PAPER_OPTIONAL_FIELDS, + default_limit=20, + max_limit=OUTPUT_ITEMS_TRUNCATION_LIMIT, + notes="Paper search helper backed by HfApi.list_papers. 
Use organization, submitted_by, and author_usernames as the main Hub-native join points.", + ), + pagination={"default_limit": 20, "max_limit": OUTPUT_ITEMS_TRUNCATION_LIMIT}, + ), + "hf_paper_info": _config( + "hf_paper_info", + endpoint_patterns=(r"^/api/papers/[^/]+$",), + default_metadata=_metadata( + default_fields=PAPER_DEFAULT_FIELDS, + guaranteed_fields=["paper_id", "title", "published_at"], + optional_fields=PAPER_OPTIONAL_FIELDS, + notes="Exact paper metadata helper backed by HfApi.paper_info.", + ), + ), + "hf_read_paper": _config( + "hf_read_paper", + endpoint_patterns=(r"^/papers/[^/]+\.md$",), + default_metadata=_metadata( + default_fields=PAPER_CONTENT_DEFAULT_FIELDS, + guaranteed_fields=["paper_id", "content"], + optional_fields=[], + notes="Returns paper markdown content backed by HfApi.read_paper.", + ), + ), "hf_collections_search": _config( "hf_collections_search", endpoint_patterns=(r"^/api/collections$",), @@ -629,6 +664,9 @@ ALLOWLIST_PATTERNS = [ r"^/api/whoami-v2$", r"^/api/trending$", r"^/api/daily_papers$", + r"^/api/papers/search$", + r"^/api/papers/[^/]+$", + r"^/papers/[^/]+\.md$", r"^/api/models$", r"^/api/datasets$", r"^/api/spaces$", @@ -659,6 +697,9 @@ STRICT_ALLOWLIST_PATTERNS = [ r"^/api/whoami-v2$", r"^/api/trending$", r"^/api/daily_papers$", + r"^/api/papers/search$", + r"^/api/papers/[^/]+$", + r"^/papers/[^/]+\.md$", r"^/api/(models|datasets|spaces)/(?:[^/]+|[^/]+/[^/]+)/likers$", r"^/api/collections$", r"^/api/collections/[^/]+$", diff --git a/monty_api/runtime_context.py b/monty_api/runtime_context.py index c01dde73e1ce79f0903122051a5f39d8a84b1b8d..afaf0e6eb23f122596ce49baa7e83352337e5a76 100644 --- a/monty_api/runtime_context.py +++ b/monty_api/runtime_context.py @@ -1,13 +1,18 @@ from __future__ import annotations import os +import sys +import time from dataclasses import dataclass, field from typing import TYPE_CHECKING, Any, Callable, NamedTuple, cast +import httpx + from .constants import MAX_CALLS_LIMIT from 
.helpers.activity import register_activity_helpers from .helpers.collections import register_collection_helpers from .helpers.introspection import register_introspection_helpers +from .helpers.papers import register_paper_helpers from .helpers.profiles import register_profile_helpers from .helpers.repos import register_repo_helpers from .http_runtime import ( @@ -22,7 +27,6 @@ from .http_runtime import ( _extract_profile_name, _load_token, _normalize_collection_repo_item, - _normalize_daily_paper_row, _normalize_repo_detail_row, _normalize_repo_search_row, _normalize_repo_sort_key, @@ -60,7 +64,6 @@ from .runtime_filtering import ( _project_collection_items, _project_discussion_detail_items, _project_discussion_items, - _project_daily_paper_items, _project_items, _project_repo_items, _project_user_items, @@ -82,6 +85,48 @@ class RuntimeHelperEnvironment(NamedTuple): helper_functions: dict[str, Callable[..., Any]] +def _execution_debug_enabled() -> bool: + for key in ("MONTY_DEBUG_EXECUTION", "MONTY_DEBUG_QUERY"): + value = os.environ.get(key, "") + if value.strip().lower() in {"1", "true", "yes", "on"}: + return True + return False + + +def _debug_log(*parts: Any) -> None: + if not _execution_debug_enabled(): + return + print("[monty-debug]", *parts, file=sys.stderr) + sys.stderr.flush() + + +def _hf_call_timeout_default() -> int: + raw = os.environ.get("MONTY_HF_CALL_TIMEOUT_SEC", "20").strip() + try: + return max(1, int(raw)) + except Exception: + return 20 + + +_HF_CLIENT_TIMEOUT_SEC: int | None = None + + +def _configure_hf_client_factory(timeout_sec: int) -> None: + global _HF_CLIENT_TIMEOUT_SEC + if _HF_CLIENT_TIMEOUT_SEC == timeout_sec: + return + from huggingface_hub.utils._http import hf_request_event_hook, set_client_factory + + set_client_factory( + lambda: httpx.Client( + event_hooks={"request": [hf_request_event_hook]}, + follow_redirects=True, + timeout=float(timeout_sec), + ) + ) + _HF_CLIENT_TIMEOUT_SEC = timeout_sec + + @dataclass(slots=True) class 
RuntimeContext: max_calls: int @@ -153,6 +198,8 @@ class RuntimeContext: json_body: dict[str, Any] | None = None, ) -> dict[str, Any]: idx = self._consume_call(endpoint, method) + started = time.perf_counter() + _debug_log("host_raw:start", f"call={idx}", method, endpoint) try: resp = call_api_host( endpoint, @@ -174,9 +221,29 @@ class RuntimeContext: method=method, status=int(resp.get("status") or 0), ) + elapsed_ms = round((time.perf_counter() - started) * 1000, 2) + _debug_log( + "host_raw:end", + f"call={idx}", + method, + endpoint, + f"ok={bool(resp.get('ok'))}", + f"status={resp.get('status')}", + f"elapsed_ms={elapsed_ms}", + ) return resp except Exception as exc: self._trace_err(idx, endpoint, exc, method=method, status=0) + elapsed_ms = round((time.perf_counter() - started) * 1000, 2) + _debug_log( + "host_raw:error", + f"call={idx}", + method, + endpoint, + type(exc).__name__, + str(exc), + f"elapsed_ms={elapsed_ms}", + ) raise def _get_hf_api_client(self) -> "HfApi": @@ -184,24 +251,75 @@ class RuntimeContext: from huggingface_hub import HfApi endpoint = os.getenv("HF_ENDPOINT", "https://huggingface.co").rstrip("/") + _configure_hf_client_factory( + max(1, min(self.timeout_sec, _hf_call_timeout_default())) + ) self._hf_api_client = HfApi(endpoint=endpoint, token=_load_token()) return self._hf_api_client def _host_hf_call(self, endpoint: str, fn: Callable[[], Any]) -> Any: idx = self._consume_call(endpoint, "GET") + started = time.perf_counter() + timeout_sec = max(1, min(self.timeout_sec, _hf_call_timeout_default())) + _debug_log( + "host_hf:start", + f"call={idx}", + endpoint, + f"timeout_sec={timeout_sec}", + ) try: out = fn() self._trace_ok(idx, endpoint, method="GET", status=200) + elapsed_ms = round((time.perf_counter() - started) * 1000, 2) + _debug_log("host_hf:end", f"call={idx}", endpoint, f"elapsed_ms={elapsed_ms}") return out except Exception as exc: self._trace_err(idx, endpoint, exc, method="GET", status=0) + elapsed_ms = 
round((time.perf_counter() - started) * 1000, 2) + _debug_log( + "host_hf:error", + f"call={idx}", + endpoint, + type(exc).__name__, + str(exc), + f"elapsed_ms={elapsed_ms}", + ) raise async def call_helper(self, helper_name: str, /, *args: Any, **kwargs: Any) -> Any: fn = self.helper_registry.get(helper_name) if not callable(fn): raise RuntimeError(f"Helper '{helper_name}' is not registered") - return await cast(Callable[..., Any], fn)(*args, **kwargs) + started = time.perf_counter() + _debug_log( + "runtime_helper:start", + helper_name, + f"args={len(args)}", + f"kwargs={sorted(kwargs)}", + f"budget_remaining={self._budget_remaining()}", + ) + try: + result = await cast(Callable[..., Any], fn)(*args, **kwargs) + except Exception as exc: + elapsed_ms = round((time.perf_counter() - started) * 1000, 2) + _debug_log( + "runtime_helper:error", + helper_name, + type(exc).__name__, + str(exc), + f"elapsed_ms={elapsed_ms}", + ) + raise + elapsed_ms = round((time.perf_counter() - started) * 1000, 2) + ok = result.get("ok") if isinstance(result, dict) else None + _debug_log( + "runtime_helper:end", + helper_name, + f"ok={ok}", + f"elapsed_ms={elapsed_ms}", + f"budget_remaining={self._budget_remaining()}", + ) + return result for name, value in { @@ -222,7 +340,6 @@ for name, value in { "_project_collection_items": _project_collection_items, "_project_discussion_items": _project_discussion_items, "_project_discussion_detail_items": _project_discussion_detail_items, - "_project_daily_paper_items": _project_daily_paper_items, "_project_user_items": _project_user_items, "_project_actor_items": _project_actor_items, "_project_user_like_items": _project_user_like_items, @@ -243,7 +360,6 @@ for name, value in { "_extract_profile_name": staticmethod(_extract_profile_name), "_load_token": staticmethod(_load_token), "_normalize_collection_repo_item": staticmethod(_normalize_collection_repo_item), - "_normalize_daily_paper_row": staticmethod(_normalize_daily_paper_row), 
"_normalize_repo_detail_row": staticmethod(_normalize_repo_detail_row), "_normalize_repo_search_row": staticmethod(_normalize_repo_search_row), "_normalize_repo_sort_key": staticmethod(_normalize_repo_sort_key), @@ -272,6 +388,7 @@ def build_runtime_helper_environment( for registration in ( register_profile_helpers, register_repo_helpers, + register_paper_helpers, register_activity_helpers, register_collection_helpers, register_introspection_helpers, diff --git a/monty_api/runtime_filtering.py b/monty_api/runtime_filtering.py index 157adbde6c95ab289e206a0e2d6cf9e341b3cc8d..665afbe1af3c0e783414ecd23ddfd2dcc478266d 100644 --- a/monty_api/runtime_filtering.py +++ b/monty_api/runtime_filtering.py @@ -1,12 +1,12 @@ from __future__ import annotations +from datetime import UTC, datetime from typing import Any from .constants import ( ACTIVITY_CANONICAL_FIELDS, ACTOR_CANONICAL_FIELDS, COLLECTION_CANONICAL_FIELDS, - DAILY_PAPER_CANONICAL_FIELDS, DISCUSSION_CANONICAL_FIELDS, DISCUSSION_DETAIL_CANONICAL_FIELDS, REPO_CANONICAL_FIELDS, @@ -16,6 +16,23 @@ from .constants import ( from .http_runtime import _as_int +def _as_datetime(value: Any) -> datetime | None: + if not isinstance(value, str): + return None + text = value.strip() + if not text: + return None + if text.endswith("Z"): + text = f"{text[:-1]}+00:00" + try: + parsed = datetime.fromisoformat(text) + except Exception: + return None + if parsed.tzinfo is None: + return parsed.replace(tzinfo=UTC) + return parsed + + def _allowed_field_set(allowed_fields: tuple[str, ...] 
| list[str] | set[str]) -> set[str]: return {str(field).strip() for field in allowed_fields if str(field).strip()} @@ -65,14 +82,6 @@ def _project_collection_items( ) -def _project_daily_paper_items( - self: Any, items: list[dict[str, Any]], fields: list[str] | None -) -> list[dict[str, Any]]: - return _project_items( - self, items, fields, allowed_fields=DAILY_PAPER_CANONICAL_FIELDS - ) - - def _project_user_items( self: Any, items: list[dict[str, Any]], fields: list[str] | None ) -> list[dict[str, Any]]: @@ -172,13 +181,25 @@ def _item_matches_where( if "gte" in cond: left = _as_int(value) right = _as_int(cond.get("gte")) - if left is None or right is None or left < right: - return False + if left is not None and right is not None: + if left < right: + return False + else: + left_dt = _as_datetime(value) + right_dt = _as_datetime(cond.get("gte")) + if left_dt is None or right_dt is None or left_dt < right_dt: + return False if "lte" in cond: left = _as_int(value) right = _as_int(cond.get("lte")) - if left is None or right is None or left > right: - return False + if left is not None and right is not None: + if left > right: + return False + else: + left_dt = _as_datetime(value) + right_dt = _as_datetime(cond.get("lte")) + if left_dt is None or right_dt is None or left_dt > right_dt: + return False continue if isinstance(cond, (list, tuple, set)): if value not in cond: diff --git a/monty_api/tool_entrypoints.py b/monty_api/tool_entrypoints.py index 69f61f3c7367843959999851e2764012317f50a3..e57fb81d0bc0c1815df8975350531e141597b7f5 100644 --- a/monty_api/tool_entrypoints.py +++ b/monty_api/tool_entrypoints.py @@ -23,28 +23,28 @@ from monty_api import ( # noqa: E402 async def hf_hub_query( - query: str, code: str, + query: str | None = None, max_calls: int | None = None, timeout_sec: int | None = None, ) -> dict[str, Any]: return await _hf_hub_query( - query=query, code=code, + query=query, max_calls=max_calls, timeout_sec=timeout_sec, ) async def hf_hub_query_raw( - 
query: str, code: str, + query: str | None = None, max_calls: int | None = None, timeout_sec: int | None = None, ) -> Any: return await _hf_hub_query_raw( - query=query, code=code, + query=query, max_calls=max_calls, timeout_sec=timeout_sec, ) diff --git a/monty_api/validation.py b/monty_api/validation.py index 9295a88279cb31c1409ee6b46e6ccb98a320ce8a..487974c1265c46cbd1754ecc8f33a25ed850b0a3 100644 --- a/monty_api/validation.py +++ b/monty_api/validation.py @@ -1,6 +1,7 @@ from __future__ import annotations import ast +import os import re import tokenize from io import StringIO @@ -119,6 +120,87 @@ def _truncate_result_payload(output: Any) -> Any: return trimmed +def _verbose_result_meta_enabled() -> bool: + value = os.environ.get("MONTY_VERBOSE_RESULT_META", "") + return value.strip().lower() in {"1", "true", "yes", "on"} + + +def _is_helper_meta_dict(value: Any) -> bool: + return ( + isinstance(value, dict) + and isinstance(value.get("source"), str) + and ( + value.get("normalized") is True + or "budget_used" in value + or "budget_remaining" in value + ) + ) + + +def _helper_meta_is_partial(value: dict[str, Any]) -> bool: + return any( + [ + value.get("truncated") is True, + value.get("more_available") not in {False, None}, + value.get("limit_boundary_hit") is True, + value.get("sample_complete") is False, + value.get("exact_count") is False, + value.get("ranking_complete") is False, + value.get("ranking_window_hit") is True, + value.get("hard_cap_applied") is True, + ] + ) + + +def _compact_helper_meta(value: dict[str, Any]) -> dict[str, Any]: + partial = _helper_meta_is_partial(value) + compact: dict[str, Any] = { + "partial": partial, + } + for key in ( + "source", + "returned", + "total", + "matched", + "more_available", + "truncated", + "truncated_by", + "exact_count", + "sample_complete", + "hard_cap_applied", + "limit_boundary_hit", + "can_request_more", + "next_request_hint", + "ranking_window", + "ranking_window_hit", + "ranking_complete", + 
"ranking_next_request_hint", + "relation", + "username", + "organization", + "entity", + "entity_type", + "handle", + ): + if value.get(key) is not None: + compact[key] = value.get(key) + if compact.get("total") is None and value.get("total_available") is not None: + compact["total"] = value.get("total_available") + return compact + + +def _compact_result_metadata(value: Any) -> Any: + if _verbose_result_meta_enabled(): + return value + if _is_helper_meta_dict(value): + return _compact_helper_meta(value) + if isinstance(value, dict): + return {key: _compact_result_metadata(item) for key, item in value.items()} + if isinstance(value, list): + return [_compact_result_metadata(item) for item in value] + return value + + def _is_helper_envelope(output: Any) -> bool: return ( isinstance(output, dict) @@ -139,8 +221,7 @@ def _summarize_limit_hit(helper_name: str, result: Any) -> dict[str, Any] | None truncated_by = str(meta.get("truncated_by") or "") limit_hit = any( [ - meta.get("truncated") is True, - meta.get("hard_cap_applied") is True, + _helper_meta_is_partial(meta), truncated_by in {"scan_limit", "page_limit", "multiple"}, ] ) @@ -233,54 +314,27 @@ def _validate_generated_code(code: str) -> None: if not isinstance(parsed, ast.Module): raise ValueError("Generated code must be a Python module") - - solve_defs = [ - node - for node in parsed.body - if isinstance(node, ast.AsyncFunctionDef) and node.name == "solve" - ] - if not solve_defs: - raise ValueError( - "Generated code must define `async def solve(query, max_calls): ...`." 
- ) - - def _valid_solve_signature(node: ast.AsyncFunctionDef) -> bool: - args = node.args - return ( - not args.posonlyargs - and len(args.args) == 2 - and [arg.arg for arg in args.args] == ["query", "max_calls"] - and args.vararg is None - and not args.kwonlyargs - and args.kwarg is None - and not args.defaults - and not args.kw_defaults - ) - - if not any(_valid_solve_signature(node) for node in solve_defs): - raise ValueError( - "`solve` must have signature `async def solve(query, max_calls): ...`." - ) - if not parsed.body: raise ValueError("Generated code is empty") final_stmt = parsed.body[-1] - valid_final_await = ( + final_is_result = ( isinstance(final_stmt, ast.Expr) - and isinstance(final_stmt.value, ast.Await) - and isinstance(final_stmt.value.value, ast.Call) - and isinstance(final_stmt.value.value.func, ast.Name) - and final_stmt.value.value.func.id == "solve" - and len(final_stmt.value.value.args) == 2 - and not final_stmt.value.value.keywords - and all(isinstance(arg, ast.Name) for arg in final_stmt.value.value.args) - and [cast(ast.Name, arg).id for arg in final_stmt.value.value.args] - == ["query", "max_calls"] + and isinstance(final_stmt.value, ast.Name) + and final_stmt.value.id == "result" + ) + if not final_is_result: + raise ValueError( + "Generated code must assign the final output to `result` and end with a final line containing only `result` (do not stop after `result = ...`)." + ) + + has_result_assignment = any( + isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store) and node.id == "result" + for node in ast.walk(parsed) ) - if not valid_final_await: + if not has_result_assignment: raise ValueError( - "Generated code must end with `await solve(query, max_calls)`." + "Generated code must assign the final output to `result` before the final `result` line." 
) for node in ast.walk(parsed): diff --git a/tool_entrypoints.py b/tool_entrypoints.py index 69f61f3c7367843959999851e2764012317f50a3..e57fb81d0bc0c1815df8975350531e141597b7f5 100644 --- a/tool_entrypoints.py +++ b/tool_entrypoints.py @@ -23,28 +23,28 @@ from monty_api import ( # noqa: E402 async def hf_hub_query( - query: str, code: str, + query: str | None = None, max_calls: int | None = None, timeout_sec: int | None = None, ) -> dict[str, Any]: return await _hf_hub_query( - query=query, code=code, + query=query, max_calls=max_calls, timeout_sec=timeout_sec, ) async def hf_hub_query_raw( - query: str, code: str, + query: str | None = None, max_calls: int | None = None, timeout_sec: int | None = None, ) -> Any: return await _hf_hub_query_raw( - query=query, code=code, + query=query, max_calls=max_calls, timeout_sec=timeout_sec, ) diff --git a/wheels/pydantic_monty-0.0.7-cp313-cp313-manylinux_2_34_x86_64.whl b/wheels/pydantic_monty-0.0.7-cp313-cp313-manylinux_2_34_x86_64.whl deleted file mode 100644 index 8cd0c870174d8124cfd8be24faa39f2a39af99d1..0000000000000000000000000000000000000000 --- a/wheels/pydantic_monty-0.0.7-cp313-cp313-manylinux_2_34_x86_64.whl +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:c7f75a0068bf46d7ef70d29d5cecd94e1a263dbde4c0968ae656ee78d3146c58 -size 6899940 diff --git a/wheels/pydantic_monty-0.0.8-cp313-cp313-manylinux_2_35_x86_64.whl b/wheels/pydantic_monty-0.0.8-cp313-cp313-manylinux_2_35_x86_64.whl deleted file mode 100644 index abe8959d6b61716c5f1947f0b8e296404a4383a2..0000000000000000000000000000000000000000 --- a/wheels/pydantic_monty-0.0.8-cp313-cp313-manylinux_2_35_x86_64.whl +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:4e71b01c5210e7955a0e172c9fec1295b545dff71a95d367e900f0096b037ad2 -size 6996093
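Reviewer note: the `monty_api/validation.py` hunk above replaces the `solve(...)` shape check with a rule that the script must assign `result` and end with a bare `result` line. Below is a minimal standalone sketch of that rule, for illustration only. The function name `ends_with_bare_result` is hypothetical and does not appear in the diff, and the real `_validate_generated_code` raises `ValueError` and runs further checks rather than returning a boolean.

```python
import ast


def ends_with_bare_result(code: str) -> bool:
    """Hypothetical helper: True if `code` assigns `result` and ends with a bare `result` line."""
    parsed = ast.parse(code)
    if not parsed.body:
        # An empty module has no final statement to inspect.
        return False

    # The last top-level statement must be the expression `result` and nothing else.
    final_stmt = parsed.body[-1]
    final_is_result = (
        isinstance(final_stmt, ast.Expr)
        and isinstance(final_stmt.value, ast.Name)
        and final_stmt.value.id == "result"
    )

    # Somewhere in the module, `result` must appear as an assignment target.
    has_result_assignment = any(
        isinstance(node, ast.Name)
        and isinstance(node.ctx, ast.Store)
        and node.id == "result"
        for node in ast.walk(parsed)
    )
    return final_is_result and has_result_assignment
```

As in the diff, the assignment check walks the whole tree, so an assignment nested inside an `if` or a loop still counts, while the bare `result` line must be the last top-level statement.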