evalstate (HF Staff) committed
Commit b487777 · verified · 1 Parent(s): 5a0e824

Deploy hf_paper_search MCP server

Files changed (4)
  1. Dockerfile +26 -0
  2. README.md +26 -6
  3. hf_paper_search.md +43 -0
  4. hf_papers_tool.py +185 -0
Dockerfile ADDED
@@ -0,0 +1,26 @@
+ FROM python:3.13-slim
+
+ RUN apt-get update && \
+     apt-get install -y \
+       bash \
+       git git-lfs \
+       wget curl procps \
+     && rm -rf /var/lib/apt/lists/*
+
+ COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
+
+ WORKDIR /app
+ RUN uv pip install --system --no-cache fast-agent-mcp
+
+ COPY --link ./ /app
+ RUN chown -R 1000:1000 /app
+ USER 1000
+
+ EXPOSE 7860
+
+ CMD ["fast-agent", "serve", \
+      "--card", "hf_paper_search.md", \
+      "--transport", "http", \
+      "--instance-scope", "request", \
+      "--host", "0.0.0.0", \
+      "--port", "7860"]
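The Dockerfile above binds the server to `0.0.0.0` on port 7860. A minimal sketch of a reachability check follows, assuming the container is run locally with that port published. The helper name `port_is_open` and the `127.0.0.1` host are illustrative and not part of the commit, and the check only confirms that something accepts TCP connections, not that the MCP endpoint itself is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 7860 matches EXPOSE and --port in the Dockerfile; the host is an assumption.
    print(port_is_open("127.0.0.1", 7860))
```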
README.md CHANGED
@@ -1,10 +1,30 @@
  ---
- title: Hf Papers
- emoji: 💻
- colorFrom: yellow
- colorTo: purple
+ title: HF Papers Search
+ emoji: 📚
+ colorFrom: purple
+ colorTo: blue
  sdk: docker
- pinned: false
+ app_port: 7860
+ short_description: Fast-agent MCP server for Hugging Face Daily Papers search
  ---
  
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # HF Papers Search (MCP)
+
+ This Space runs [fast-agent](https://fast-agent.ai/) as an MCP server to provide
+ specialized search over the Hugging Face Daily Papers feed.
+
+ ## Features
+ - Query `/api/daily_papers` with date/week/month filters
+ - Local keyword filtering across titles, summaries, authors, AI keywords
+ - Token passthrough via Hugging Face OAuth or Bearer tokens
+
+ ## Environment Variables
+ Set these in Space settings:
+ - `FAST_AGENT_SERVE_OAUTH=hf`
+ - `FAST_AGENT_OAUTH_SCOPES=inference-api`
+ - `FAST_AGENT_OAUTH_RESOURCE_URL=https://evalstate-hf-papers.hf.space`
+ - `HF_TOKEN=hf_dummy` (dummy token required at startup)
+ - `OPENAI_API_KEY=DUMMY` (per request, your clients can override)
+
+ ## Usage
+ Once running, the agent is available via HTTP at the Space URL.
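The README lists five environment variables to set in the Space settings. A small sketch of a pre-flight check could look like the following. The variable names are copied from the README, but treating all five as mandatory is an assumption, and `missing_vars` is a hypothetical helper that does not appear in the commit.

```python
import os
from typing import Mapping, Optional

# Names copied from the README's "Environment Variables" list.
REQUIRED_VARS = (
    "FAST_AGENT_SERVE_OAUTH",
    "FAST_AGENT_OAUTH_SCOPES",
    "FAST_AGENT_OAUTH_RESOURCE_URL",
    "HF_TOKEN",
    "OPENAI_API_KEY",
)


def missing_vars(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the names from REQUIRED_VARS that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    print("missing:", ", ".join(missing) if missing else "none")
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise without touching the real environment.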
hf_paper_search.md ADDED
@@ -0,0 +1,43 @@
+ ---
+ type: agent
+ name: hf-papers-search
+ function_tools:
+   - hf_papers_tool.py:hf_papers_search
+ model: gpt-oss
+ description: "Search Hugging Face Daily Papers with local keyword filtering, date/week/month selectors, and trending/published sort. Returns structured paper entries from /api/daily_papers."
+ ---
+ Hugging Face Daily Papers Search
+ ================================
+
+ Use this tool when you need a specialized paper search against the Hugging Face
+ Daily Papers feed. It queries `/api/daily_papers` and applies optional local
+ keyword filtering across titles, summaries, authors, AI summaries, keywords,
+ project pages, GitHub repos, and paper ids (arXiv ids).
+
+ Tool
+ ----
+ `hf_papers_search(query: str | None, *, date, week, month, submitter, sort, limit, page, max_pages, api_limit)`
+
+ Parameters
+ ----------
+ - `query`: Keyword search (case-insensitive). Multiple tokens are ANDed.
+ - `date`: ISO date `YYYY-MM-DD`.
+ - `week`: ISO week `YYYY-Www`.
+ - `month`: ISO month `YYYY-MM`.
+ - `submitter`: HF username of the submitter.
+ - `sort`: `publishedAt` or `trending`.
+ - `limit`: Max results to return after filtering (default 20).
+ - `page`: API page index (default 0).
+ - `max_pages`: How many pages to fetch for local filtering (default 1).
+ - `api_limit`: Page size for the API (default 50, max 100).
+
+ Examples
+ --------
+ - Latest papers (first page):
+   `hf_papers_search()`
+
+ - Search for "diffusion" in the past week, up to 40 results, across 3 pages:
+   `hf_papers_search("diffusion", week="2026-W03", limit=40, max_pages=3)`
+
+ - Trending papers this month tagged by query term:
+   `hf_papers_search("alignment", month="2026-01", sort="trending")`
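The card above expects `week` as `YYYY-Www` and `month` as `YYYY-MM`. A minimal sketch of building those strings from a `datetime.date` follows. The helper names `iso_week_param` and `month_param` are hypothetical and are not part of the tool. Note that the ISO year can differ from the calendar year near year boundaries, so the week string should use the ISO year rather than `day.year`.

```python
from datetime import date


def iso_week_param(day: date) -> str:
    """Format a date as an ISO week string such as 2026-W03."""
    iso_year, iso_week, _weekday = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def month_param(day: date) -> str:
    """Format a date as a YYYY-MM month string."""
    return f"{day.year}-{day.month:02d}"


print(iso_week_param(date(2026, 1, 15)))   # 2026-W03
print(iso_week_param(date(2025, 12, 29)))  # 2026-W01 (ISO year differs from calendar year)
print(month_param(date(2026, 1, 15)))      # 2026-01
```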
hf_papers_tool.py ADDED
@@ -0,0 +1,185 @@
+ from __future__ import annotations
+
+ import json
+ import os
+ import re
+ from pathlib import Path
+ from typing import Any
+ from urllib.error import HTTPError, URLError
+ from urllib.parse import urlencode
+ from urllib.request import Request, urlopen
+
+ DEFAULT_LIMIT = 20
+ DEFAULT_TIMEOUT_SEC = 30
+ MAX_API_LIMIT = 100
+
+
+ def _load_token() -> str | None:
+     # Check for request-scoped token first (when running as MCP server)
+     try:
+         from fast_agent.mcp.auth.context import request_bearer_token
+
+         ctx_token = request_bearer_token.get()
+         if ctx_token:
+             return ctx_token
+     except ImportError:
+         pass
+
+     token = os.getenv("HF_TOKEN")
+     if token:
+         return token
+
+     token_path = Path.home() / ".cache" / "huggingface" / "token"
+     if token_path.exists():
+         token_value = token_path.read_text(encoding="utf-8").strip()
+         return token_value or None
+
+     return None
+
+
+ def _normalize_date_param(value: str | None) -> str | None:
+     if not value:
+         return None
+     return value.strip()
+
+
+ def _build_url(params: dict[str, Any]) -> str:
+     base = os.getenv("HF_ENDPOINT", "https://huggingface.co").rstrip("/")
+     query = urlencode({k: v for k, v in params.items() if v is not None}, doseq=True)
+     return f"{base}/api/daily_papers?{query}" if query else f"{base}/api/daily_papers"
+
+
+ def _request_json(url: str) -> list[dict[str, Any]]:
+     headers = {"Accept": "application/json"}
+     token = _load_token()
+     if token:
+         headers["Authorization"] = f"Bearer {token}"
+
+     request = Request(url, headers=headers, method="GET")
+     try:
+         with urlopen(request, timeout=DEFAULT_TIMEOUT_SEC) as response:
+             raw = response.read()
+     except HTTPError as exc:
+         error_body = exc.read().decode("utf-8", errors="replace")
+         raise RuntimeError(f"HF API error {exc.code} for {url}: {error_body}") from exc
+     except URLError as exc:
+         raise RuntimeError(f"HF API request failed for {url}: {exc}") from exc
+
+     payload = json.loads(raw)
+     if not isinstance(payload, list):
+         raise RuntimeError("Unexpected response shape from /api/daily_papers")
+     return payload
+
+
+ def _extract_search_blob(item: dict[str, Any]) -> str:
+     paper = item.get("paper") or {}
+     authors = paper.get("authors") or []
+     author_names = [a.get("name", "") for a in authors if isinstance(a, dict)]
+
+     ai_keywords = paper.get("ai_keywords") or []
+     if isinstance(ai_keywords, list):
+         ai_keywords_text = " ".join(str(k) for k in ai_keywords)
+     else:
+         ai_keywords_text = str(ai_keywords)
+
+     parts = [
+         item.get("title"),
+         item.get("summary"),
+         paper.get("title"),
+         paper.get("summary"),
+         paper.get("ai_summary"),
+         ai_keywords_text,
+         " ".join(author_names),
+         paper.get("id"),
+         paper.get("projectPage"),
+         paper.get("githubRepo"),
+     ]
+
+     text = " ".join(str(part) for part in parts if part)
+     return text.lower()
+
+
+ def _matches_query(item: dict[str, Any], query: str) -> bool:
+     tokens = [t for t in re.split(r"\s+", query.strip().lower()) if t]
+     if not tokens:
+         return True
+     haystack = _extract_search_blob(item)
+     return all(token in haystack for token in tokens)
+
+
+ def hf_papers_search(
+     query: str | None = None,
+     *,
+     date: str | None = None,
+     week: str | None = None,
+     month: str | None = None,
+     submitter: str | None = None,
+     sort: str | None = None,
+     limit: int | None = None,
+     page: int | None = None,
+     max_pages: int | None = None,
+     api_limit: int | None = None,
+ ) -> dict[str, Any]:
+     """
+     Search Hugging Face Daily Papers with optional local filtering.
+
+     Args:
+         query: Case-insensitive keyword search across title, summary, authors,
+             AI summary/keywords, project page, repo link, and paper id.
+         date: ISO date (YYYY-MM-DD).
+         week: ISO week (YYYY-Www).
+         month: ISO month (YYYY-MM).
+         submitter: HF username of the submitter.
+         sort: "publishedAt" or "trending".
+         limit: Max results to return after filtering (default 20).
+         page: Page index for the API (default 0).
+         max_pages: Number of pages to fetch for local filtering (default 1).
+         api_limit: Page size for the API (default 50, max 100).
+
+     Returns:
+         dict with query metadata and list of daily paper entries.
+     """
+     resolved_limit = DEFAULT_LIMIT if limit is None else max(int(limit), 1)
+     start_page = max(int(page or 0), 0)
+     pages_to_fetch = max(int(max_pages or 1), 1)
+
+     per_page = 50 if api_limit is None else max(int(api_limit), 1)
+     per_page = min(per_page, MAX_API_LIMIT)
+
+     params_base: dict[str, Any] = {
+         "date": _normalize_date_param(date),
+         "week": _normalize_date_param(week),
+         "month": _normalize_date_param(month),
+         "submitter": submitter.strip() if submitter else None,
+         "sort": sort.strip() if sort else None,
+         "limit": per_page,
+     }
+
+     results: list[dict[str, Any]] = []
+     pages_fetched = 0
+     for page_index in range(start_page, start_page + pages_to_fetch):
+         params = {**params_base, "p": page_index}
+         url = _build_url(params)
+         payload = _request_json(url)
+         pages_fetched += 1
+
+         if query:
+             filtered = [item for item in payload if _matches_query(item, query)]
+         else:
+             filtered = payload
+
+         results.extend(filtered)
+         if len(results) >= resolved_limit:
+             break
+
+     return {
+         "query": query,
+         "params": {
+             **{k: v for k, v in params_base.items() if v is not None},
+             "page": start_page,
+             "max_pages": pages_fetched,
+             "api_limit": per_page,
+         },
+         "returned": min(len(results), resolved_limit),
+         "data": results[:resolved_limit],
+     }
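The tool above filters each fetched page locally by requiring every query token to appear as a substring, and it stops paging once enough matches have accumulated. The sketch below reproduces that behaviour offline under simplifying assumptions: it matches against a `title` field only, and `matches_all_tokens`, `collect`, `fake_fetch`, and the in-memory `FAKE_PAGES` are hypothetical stand-ins rather than names from the commit.

```python
import re
from typing import Any, Callable

Page = list[dict[str, Any]]


def matches_all_tokens(text: str, query: str) -> bool:
    """True when every whitespace-separated query token is a substring of text."""
    tokens = [t for t in re.split(r"\s+", query.strip().lower()) if t]
    return all(token in text.lower() for token in tokens)


def collect(fetch_page: Callable[[int], Page], query: str, limit: int, max_pages: int) -> Page:
    """Fetch pages in order, keep matching items, and stop once limit is reached."""
    results: Page = []
    for page_index in range(max_pages):
        page = fetch_page(page_index)
        results.extend(item for item in page if matches_all_tokens(item["title"], query))
        if len(results) >= limit:
            break  # later pages are never requested
    return results[:limit]


# Hypothetical in-memory pages standing in for API responses.
FAKE_PAGES: list[Page] = [
    [{"title": "Latent Diffusion for Video"}, {"title": "Sparse Attention"}],
    [{"title": "Video Diffusion Transformers"}, {"title": "Diffusion Policies"}],
    [{"title": "Diffusion Video Editing"}],
]
requested: list[int] = []


def fake_fetch(page_index: int) -> Page:
    requested.append(page_index)  # record which pages were asked for
    return FAKE_PAGES[page_index]


hits = collect(fake_fetch, "video diffusion", limit=2, max_pages=3)
print([h["title"] for h in hits])  # ['Latent Diffusion for Video', 'Video Diffusion Transformers']
print(requested)                   # [0, 1]
```

Because matching is by substring rather than by whole word, a short token such as `rl` also matches text like "World Models"; longer or more specific tokens reduce such incidental hits.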