OpeneR Sisyphus committed
Commit 778278c · 0 parents

HydraDeck open-source clean snapshot

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

.gitignore ADDED
@@ -0,0 +1,8 @@
+ __pycache__/
+ *.pyc
+ .pytest_cache/
+ .ruff_cache/
+ .DS_Store
+ build/
+ *.egg-info/
+ out/
README.md ADDED
@@ -0,0 +1,77 @@
+ # hydradeck
+
+ A reproducible, auditable Grok Deep Research pipeline (multi-persona, iterative) that outputs:
+
+ - `pre_report.md`: pre-research report covering question decomposition, methods, search strategy, risks, and boundaries
+ - `report.md`: full research report (with a complete-sources list and traceable citations)
+ - `speech.md`: speaker script (readable verbatim, with transitions and timing cues)
+ - `pre_paper.tex`: LaTeX paper draft of the pre-brief (article class)
+ - `pre_slides.tex`: Beamer slides for the pre-brief
+ - `refs.bib`: BibTeX references
+ - `research.json`: structured intermediate artifacts (for reproducibility and auditing)
+
+ > Security note: never commit an API key to the repository. Use the `GROK_API_KEY` environment variable.
+ > If you have already pasted a key into a chat, **rotate/revoke** that key immediately.
+
+ ## Installation
+
+ ```bash
+ cd hydradeck
+ python3 -m pip install -e .
+ python3 -m pip install -e ".[dev]"
+ ```
+
+ ## Quick start
+
+ ### 1) Mock (offline) end-to-end run
+
+ ```bash
+ mkdir -p out
+ hydradeck run --topic "LLM agents for deep research" --out out/demo.zip --mock
+ ```
+
+ ### 2) Using Grok2API / an OpenAI-compatible gateway
+
+ `api.example.com` is built on Grok2API and exposes the OpenAI-compatible `/v1/chat/completions` and `/v1/models` endpoints.
+
+ ```bash
+ export GROK_BASE_URL="https://api.example.com"
+ export GROK_API_KEY="<YOUR_KEY>"
+ export GROK_MODEL="grok-4"
+
+ mkdir -p out
+ hydradeck run --topic "<your research topic>" --out out/topic.zip \
+     --iterations 3 \
+     --max-sources 10
+ ```
+
+ ## Output layout
+
+ The output is a directory or a zip archive, depending on whether `--out` ends in `.zip`. It includes `compile.sh` and a `Makefile` for compiling the LaTeX sources.
+
+ ## WebUI (HydraDeck)
+
+ ### Running locally
+
+ ```bash
+ cd hydradeck
+ python3 custom_web.py
+ ```
+
+ Default listen address: `http://127.0.0.1:7861`
+
+ ### Environment variables (optional)
+
+ ```bash
+ export GROK_BASE_URL="https://api.example.com"
+ export GROK_API_KEY="<YOUR_KEY>"
+ export GROK_MODEL="grok-4"
+ ```
+
+ ### Basic usage
+
+ 1. Enter a Topic on the `Run` tab
+ 2. Click `Quick API Check` to verify connectivity
+ 3. Click `Run HydraDeck` to start generation
+ 4. Follow live progress in `Console`
+ 5. Download `paper.pdf` / `slides.pdf` from `Artifacts`
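The packaging rule in the output-layout section above (directory vs. zip archive, decided by the `--out` suffix) can be sketched in isolation. This is an illustrative reimplementation, not hydradeck's actual code; `package_output` is a hypothetical name:

```python
from pathlib import Path
import shutil
import zipfile


def package_output(src_dir: str, out: str) -> str:
    """Copy artifacts to `out`; zip them when `out` ends with `.zip`."""
    src = Path(src_dir)
    if out.endswith(".zip"):
        # Archive every file, preserving paths relative to the source directory.
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
            for p in sorted(src.rglob("*")):
                if p.is_file():
                    zf.write(p, str(p.relative_to(src)))
        return out
    # Otherwise, materialize the artifacts as a plain directory.
    dest = Path(out)
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return str(dest)
```

The same caller-facing contract (one `--out` argument, two output shapes) keeps the CLI surface small.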
README_SPACES.md ADDED
@@ -0,0 +1,20 @@
+ ---
+ title: hydradeck-webui
+ emoji: 📚
+ colorFrom: indigo
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 4.44.1
+ app_file: app.py
+ pinned: false
+ ---
+
+ # hydradeck WebUI (Hugging Face Spaces)
+
+ Set these secrets in Space settings if needed:
+
+ - `GROK_API_KEY`
+ - `GROK_BASE_URL` (optional, defaults to `https://api.example.com`)
+ - `GROK_MODEL` (optional, defaults to `grok-4`)
+
+ The app entrypoint is `app.py`.
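The optional secrets above follow a common defaulting pattern: read the variable, fall back when it is unset or blank. A minimal sketch of that behavior (`resolve_with_default` is an illustrative helper, not the actual `hydradeck.config` API):

```python
import os


def resolve_with_default(name: str, default: str) -> str:
    """Return the env var's value, or `default` when unset or whitespace-only."""
    value = os.environ.get(name, "").strip()
    return value or default
```

Treating a blank value the same as an unset one avoids surprises when a Space secret is saved as an empty string.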
app.py ADDED
@@ -0,0 +1,1269 @@
+ from __future__ import annotations
+
+ import warnings
+
+ warnings.filterwarnings(
+     "ignore",
+     message=r"urllib3 v2 only supports OpenSSL 1\.1\.1\+.*",
+ )
+
+ import tempfile
+ import zipfile
+ import json
+ import time
+ from concurrent.futures import ThreadPoolExecutor
+ from queue import Empty, Queue
+ from pathlib import Path
+ from typing import Any
+ from urllib.error import HTTPError, URLError
+ from urllib.parse import quote, urlparse
+ from urllib.request import Request, urlopen
+
+ import gradio as gr
+
+ from hydradeck.clients import ChatMessage, GrokClient
+ from hydradeck.config import resolve_api_key, resolve_base_url, resolve_model
+ from hydradeck.core.types import RunConfig
+ from hydradeck.pipeline import run
+ from hydradeck.render import (
+     build_slide_frames_from_sections,
+     enforce_slide_density,
+     render_beamer_frames,
+     render_paper,
+     render_report_structured,
+ )
+
+
+ CHROME_144_UA = (
+     "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
+     "AppleWebKit/537.36 (KHTML, like Gecko) "
+     "Chrome/144.0.0.0 Safari/537.36"
+ )
+
+ def _normalized_base_url(base_url: str) -> str:
+     parsed = urlparse(base_url.strip())
+     if parsed.scheme not in {"http", "https"}:
+         raise ValueError("Base URL must start with http:// or https://")
+     if not parsed.netloc:
+         raise ValueError("Base URL is missing host")
+     return base_url.strip().rstrip("/")
+
+
+ def _preflight_check(base_url: str, api_key: str, request_budget: float) -> str | None:
+     if not api_key.strip():
+         return "Missing API key. Fill API Key field or set GROK_API_KEY before running."
+
+     try:
+         normalized = _normalized_base_url(base_url)
+     except ValueError as exc:
+         return f"Invalid Base URL: {exc}"
+
+     probe_url = f"{normalized}/v1/models"
+     timeout_s = max(2.0, min(float(request_budget), 6.0))
+     req = Request(
+         probe_url,
+         headers={
+             "Authorization": f"Bearer {api_key.strip()}",
+             "User-Agent": CHROME_144_UA,
+         },
+     )
+
+     try:
+         with urlopen(req, timeout=timeout_s):
+             return None
+     except HTTPError as exc:
+         try:
+             body = exc.read().decode("utf-8", errors="replace")
+         except Exception:
+             body = ""
+         if exc.code == 403 and "error code: 1010" in body.lower():
+             return (
+                 "Gateway blocked this client (Cloudflare 1010), not an API-key issue. "
+                 "Try another network/egress IP or ask gateway admin to allow this IP."
+             )
+         if exc.code in {401, 403}:
+             return "API key rejected (401/403). Please update GROK_API_KEY or paste a valid key."
+         return f"API endpoint returned HTTP {exc.code} during preflight."
+     except URLError as exc:
+         return f"Cannot reach API endpoint ({probe_url}): {exc.reason}"
+     except TimeoutError:
+         return (
+             f"API preflight timed out after {timeout_s:.0f}s. "
+             "Try mock mode first, then increase Request budget."
+         )
+
+
+ def _api_quick_check(base_url: str, api_key: str, model: str, request_budget: float) -> str:
+     selected_base_url = base_url.strip() or resolve_base_url("https://api.example.com")
+     selected_api_key = api_key.strip() or resolve_api_key()
+
+     preflight_error = _preflight_check(selected_base_url, selected_api_key, request_budget)
+     if preflight_error is not None:
+         return f"API check failed: {preflight_error}"
+
+     normalized = _normalized_base_url(selected_base_url)
+     req_model = model.strip() or resolve_model("grok-3-mini")
+     payload = {
+         "model": req_model,
+         "messages": [{"role": "user", "content": "reply with exactly: API_OK"}],
+         "temperature": 0,
+         "max_tokens": 8,
+     }
+     req = Request(
+         f"{normalized}/v1/chat/completions",
+         method="POST",
+         data=json.dumps(payload).encode("utf-8"),
+         headers={
+             "Authorization": f"Bearer {selected_api_key.strip()}",
+             "User-Agent": CHROME_144_UA,
+             "Content-Type": "application/json",
+         },
+     )
+     timeout_s = max(3.0, min(float(request_budget), 12.0))
+     try:
+         with urlopen(req, timeout=timeout_s) as resp:
+             body = resp.read().decode("utf-8", errors="replace")
+     except HTTPError as exc:
+         text = exc.read().decode("utf-8", errors="replace")
+         return f"API check failed: HTTP {exc.code} {text[:180]}"
+     except URLError as exc:
+         return f"API check failed: network error {exc.reason}"
+     except TimeoutError:
+         return f"API check failed: completion timeout after {timeout_s:.0f}s"
+
+     if "API_OK" not in body:
+         return f"API check uncertain: completion returned unexpected body: {body[:180]}"
+     return "API check passed: models/completions reachable and auth works."
+
+
+ def _compile_latex_online(tex_source: str, output_name: str) -> str:
+     def _compile_via_hosted_url(command: str) -> bytes:
+         upload_req = Request("https://paste.rs", data=tex_source.encode("utf-8"), method="POST")
+         with urlopen(upload_req, timeout=30) as upload_resp:
+             hosted_url = upload_resp.read().decode("utf-8", errors="replace").strip()
+         compile_from_url = (
+             "https://latexonline.cc/compile?url="
+             + quote(hosted_url, safe=":/?=&")
+             + "&command="
+             + command
+             + "&force=true"
+         )
+         req2 = Request(compile_from_url, headers={"User-Agent": CHROME_144_UA})
+         with urlopen(req2, timeout=120) as resp2:
+             return resp2.read()
+
+     errors: list[str] = []
+     blob = b""
+     for command in ["xelatex", "lualatex", "pdflatex"]:
+         try:
+             encoded = quote(tex_source, safe="")
+             compile_url = (
+                 "https://latexonline.cc/compile?text="
+                 + encoded
+                 + "&command="
+                 + command
+                 + "&force=true"
+             )
+             if len(compile_url) > 6000:
+                 blob = _compile_via_hosted_url(command)
+             else:
+                 req = Request(compile_url, headers={"User-Agent": CHROME_144_UA})
+                 with urlopen(req, timeout=90) as resp:
+                     blob = resp.read()
+             if blob.startswith(b"%PDF"):
+                 break
+             blob = _compile_via_hosted_url(command)
+             if blob.startswith(b"%PDF"):
+                 break
+             errors.append(f"{command}: non-pdf response")
+         except HTTPError as exc:
+             body = exc.read().decode("utf-8", errors="replace")
+             errors.append(f"{command}: HTTP {exc.code} {body[:500]}")
+         except Exception as exc:
+             errors.append(f"{command}: {exc}")
+
+     if not blob.startswith(b"%PDF"):
+         raise RuntimeError("online renderer failed: " + " | ".join(errors[:3]))
+     out_path = Path("/tmp") / output_name
+     _ = out_path.write_bytes(blob)
+     return str(out_path)
+
+
+ def _extract_json_object(text: str) -> dict[str, Any]:
+     raw = text.strip()
+     if not raw:
+         raise RuntimeError("empty JSON response")
+     try:
+         parsed = json.loads(raw)
+         if isinstance(parsed, dict):
+             return parsed
+     except json.JSONDecodeError:
+         pass
+
+     start = raw.find("{")
+     end = raw.rfind("}")
+     if start == -1 or end == -1 or end <= start:
+         raise RuntimeError("no JSON object found in response")
+     parsed2 = json.loads(raw[start : end + 1])
+     if not isinstance(parsed2, dict):
+         raise RuntimeError("top-level JSON is not an object")
+     return parsed2
+
+
+ def _chat_json_resilient(
+     client: GrokClient,
+     messages: list[ChatMessage],
+     schema_hint: str,
+     temperature: float,
+     timeout_s: float,
+ ) -> dict[str, Any]:
+     try:
+         obj = client.chat_json(
+             messages,
+             schema_hint=schema_hint,
+             temperature=temperature,
+             timeout_s=timeout_s,
+         )
+         if isinstance(obj, dict):
+             return obj
+     except Exception:
+         pass
+
+     try:
+         text = client.chat_text(messages, temperature=temperature, timeout_s=timeout_s)
+         return _extract_json_object(text)
+     except Exception:
+         return {}
+
+
+ def _build_stage_model_map(
+     requested_model: str,
+     overrides: dict[str, str] | None = None,
+ ) -> dict[str, str]:
+     base = requested_model.strip() or resolve_model("grok-3-mini")
+     high = base
+     if "mini" in base:
+         high = base.replace("-mini", "")
+     if high == base and base == "grok-3-mini":
+         high = "grok-3"
+     model_map = {
+         "scope": base,
+         "structure": high,
+         "planner": high,
+         "section": base,
+         "paper": high,
+         "slides": high,
+     }
+     if overrides:
+         for key in model_map:
+             v = overrides.get(key, "").strip()
+             if v:
+                 model_map[key] = v
+     return model_map
+
+
+ def _looks_like_template_text(text: str) -> bool:
+     low = text.lower().strip()
+     if not low:
+         return True
+     bad_markers = [
+         "this section is generated",
+         "no content generated",
+         "lorem ipsum",
+         "to be filled",
+         "placeholder",
+         "add key evidence-backed findings",
+         "补充关键事实与证据",
+     ]
+     return any(m in low for m in bad_markers)
+
+
+ def _assert_not_template_output(module_name: str, text: str) -> None:
+     if _looks_like_template_text(text):
+         raise RuntimeError(f"{module_name} produced template-like content; retry required")
+
+
+ def _section_quality_ok(section_title: str, latex_body: str, language: str) -> bool:
+     if _looks_like_template_text(latex_body):
+         return False
+     body = latex_body.strip()
+     if len(body) < 120:
+         return False
+     if language == "zh":
+         zh_chars = sum(1 for ch in body if "\u4e00" <= ch <= "\u9fff")
+         if zh_chars < 20:
+             return False
+     else:
+         words = [w for w in body.replace("\n", " ").split(" ") if w]
+         if len(words) < 40:
+             return False
+     _ = section_title
+     return True
+
+
+ def _run_agentic_pipeline(
+     topic: str,
+     model: str,
+     base_url: str,
+     api_key: str,
+     request_budget: float,
+     use_mock: bool,
+     progress: gr.Progress = gr.Progress(),
+     stage_callback=None,
+     language: str = "en",
+     stage_models: dict[str, str] | None = None,
+ ) -> tuple[str, str, str, str, str, str, str, str, str]:
+     if not topic.strip():
+         return "Topic is required.", "", "", "", "", "", "", "", ""
+
+     selected_base_url = base_url.strip() or resolve_base_url("https://api.example.com")
+     selected_api_key = api_key.strip() or resolve_api_key()
+     selected_model = model.strip() or resolve_model("grok-3-mini")
+     lang = language.strip().lower()
+     if lang not in {"en", "zh"}:
+         lang = "en"
+     model_map = _build_stage_model_map(selected_model, overrides=stage_models)
+     total_steps = 9
+     stage_logs: list[str] = []
+
+     def mark(step: int, label: str, detail: str) -> None:
+         pct = min(max(step / total_steps, 0.0), 1.0)
+         _ = progress(pct, desc=label)
+         stage_logs.append(f"{step}/{total_steps} {label}: {detail}")
+
+     def emit_stage(
+         step: int,
+         label: str,
+         detail: str,
+         scope_text: str = "",
+         section_text: str = "",
+         paper_text: str = "",
+         slides_text: str = "",
+         pdf_paths_text: str = "",
+         paper_pdf_text: str = "",
+         slides_pdf_text: str = "",
+     ) -> None:
+         if stage_callback is None:
+             return
+         payload = {
+             "status": f"Running: {label}",
+             "progress_log": "\n".join(stage_logs),
+             "scope": scope_text,
+             "sections": section_text,
+             "paper": paper_text,
+             "slides": slides_text,
+             "pdf_paths": pdf_paths_text,
+             "paper_pdf": paper_pdf_text,
+             "slides_pdf": slides_pdf_text,
+             "progress": int(min(100, max(0, round(step / total_steps * 100)))),
+             "stage": label,
+             "detail": detail,
+         }
+         stage_callback(payload)
+
+     mark(1, "Preflight", "checking API connectivity")
+     emit_stage(1, "Preflight", "checking API connectivity")
+     if not use_mock:
+         preflight_error = _preflight_check(selected_base_url, selected_api_key, request_budget)
+         if preflight_error is not None:
+             return (
+                 f"Agentic run failed: {preflight_error}",
+                 "\n".join(stage_logs),
+                 "",
+                 "",
+                 "",
+                 "",
+                 "",
+                 "",
+                 "",
+             )
+
+     scope_payload: dict[str, object]
+     section_plan: list[dict[str, str]]
+     section_blocks: list[dict[str, str]] = []
+     paper_tex = ""
+     slides_tex = ""
+
+     if use_mock:
+         mark(2, "Agent-1 ScopeScout", "using mock scope")
+         scope_payload = {
+             "project_links": [
+                 {
+                     "title": "RynnBrain repo",
+                     "url": "https://github.com/alibaba-damo-academy/RynnBrain",
+                     "reason": "Core project artifact",
+                 },
+                 {
+                     "title": "arXiv references",
+                     "url": "https://arxiv.org",
+                     "reason": "Peer-reviewed baseline papers",
+                 },
+             ],
+             "scope": {
+                 "in_scope": ["architecture", "training/inference workflow", "evaluation evidence"],
+                 "out_scope": ["business roadmap", "non-technical marketing claims"],
+                 "key_questions": [
+                     "What problem is solved?",
+                     "What architecture choices matter?",
+                     "What evidence supports claims?",
+                 ],
+             },
+         }
+         emit_stage(
+             2,
+             "Agent-1 ScopeScout",
+             "scope resolved",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+         )
+
+         mark(3, "Agent-StructureDesigner", "designing report structure")
+         structure_plan = {
+             "title": topic.strip(),
+             "sections": [
+                 {"name": "Abstract", "goal": "State problem, method, key findings, and significance."},
+                 {"name": "Introduction", "goal": "Context, motivation, and clear research question."},
+                 {"name": "Methodology", "goal": "System design, assumptions, and evaluation protocol."},
+                 {"name": "Results", "goal": "Evidence-backed findings with explicit source links."},
+                 {"name": "Discussion", "goal": "Interpretation, limitations, and trade-offs."},
+                 {"name": "Conclusion", "goal": "Takeaways and future work."},
+             ],
+             "slide_style": {
+                 "max_bullets": 5,
+                 "max_words_per_bullet": 14,
+                 "visual_density": "low",
+                 "must_include": ["agenda", "method diagram slide", "results table slide", "limitations"],
+             },
+         }
+         emit_stage(
+             3,
+             "Agent-StructureDesigner",
+             "report structure designed",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             section_text=json.dumps(structure_plan, ensure_ascii=False, indent=2),
+         )
+
+         mark(4, "Agent-2 TemplatePlanner", "building section summaries from templates")
+         section_plan = [
+             {"name": "Abstract", "summary": "Concise summary of problem, method, findings, and impact."},
+             {"name": "Introduction", "summary": "Problem framing and motivation in research context."},
+             {"name": "Methodology", "summary": "System architecture and methodological decisions."},
+             {"name": "Results", "summary": "Empirical findings and traceable evidence."},
+             {"name": "Discussion", "summary": "Interpretation of findings and practical implications."},
+             {"name": "Conclusion", "summary": "Actionable takeaways and next steps."},
+         ]
+         if lang == "zh":
+             section_plan = [
+                 {"name": "摘要", "summary": "概述研究问题、方法、关键发现与价值。"},
+                 {"name": "引言", "summary": "说明背景、动机与研究问题。"},
+                 {"name": "方法", "summary": "阐述系统架构、方法流程与评估设置。"},
+                 {"name": "结果", "summary": "给出可追溯证据支持的核心结论。"},
+                 {"name": "讨论", "summary": "解释结果意义、局限与适用边界。"},
+                 {"name": "结论", "summary": "总结与后续研究建议。"},
+             ]
+         emit_stage(
+             4,
+             "Agent-2 TemplatePlanner",
+             "section plan prepared",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+         )
+
+         mark(5, "Section Agents", "drafting per-section TeX blocks")
+         for sec in section_plan:
+             section_blocks.append(
+                 {
+                     "name": sec["name"],
+                     "latex": (
+                         f"\\subsection*{{{sec['name']}}}\n"
+                         f"{sec['summary']}\\\n"
+                         "Evidence should map directly to claims and include method-specific details."
+                     ),
+                 }
+             )
+         emit_stage(
+             5,
+             "Section Agents",
+             "section drafts ready",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+             paper_text="\n\n".join(block["latex"] for block in section_blocks),
+         )
+
+         mark(6, "Integrator-Paper", "merging section TeX into paper")
+         paper_tex = render_report_structured(topic.strip(), section_blocks, language=lang)
+
+         mark(7, "Integrator-Beamer", "building slide deck from report")
+         frames = build_slide_frames_from_sections(section_blocks, language=lang)
+         frames = enforce_slide_density(frames, language=lang)
+         slides_tex = render_beamer_frames(topic.strip(), frames, language=lang)
+     else:
+         timeout_s = max(12.0, min(float(request_budget), 40.0))
+         client_scope = GrokClient(
+             base_url=selected_base_url,
+             api_key=selected_api_key,
+             model=model_map["scope"],
+             timeout_s=timeout_s,
+             max_retries=2,
+             heartbeat=False,
+         )
+         client_structure = GrokClient(
+             base_url=selected_base_url,
+             api_key=selected_api_key,
+             model=model_map["structure"],
+             timeout_s=timeout_s,
+             max_retries=2,
+             heartbeat=False,
+         )
+         client_planner = GrokClient(
+             base_url=selected_base_url,
+             api_key=selected_api_key,
+             model=model_map["planner"],
+             timeout_s=timeout_s,
+             max_retries=2,
+             heartbeat=False,
+         )
+         client_section = GrokClient(
+             base_url=selected_base_url,
+             api_key=selected_api_key,
+             model=model_map["section"],
+             timeout_s=timeout_s,
+             max_retries=2,
+             heartbeat=False,
+         )
+         client_paper = GrokClient(
+             base_url=selected_base_url,
+             api_key=selected_api_key,
+             model=model_map["paper"],
+             timeout_s=timeout_s,
+             max_retries=2,
+             heartbeat=False,
+         )
+         client_slides = GrokClient(
+             base_url=selected_base_url,
+             api_key=selected_api_key,
+             model=model_map["slides"],
+             timeout_s=timeout_s,
+             max_retries=2,
+             heartbeat=False,
+         )
+
+         quick_scope = {
+             "project_links": [
+                 {
+                     "title": f"{topic.strip()} official repository",
+                     "url": "https://github.com",
+                     "reason": "Seed placeholder before remote scope enrichment.",
+                 }
+             ],
+             "scope": {
+                 "in_scope": ["architecture", "method", "evidence"],
+                 "out_scope": ["marketing narrative", "non-technical roadmap"],
+                 "key_questions": [
+                     "What core problem is solved?",
+                     "What design decisions matter most?",
+                     "What evidence is verifiable?",
+                 ],
+             },
+         }
+         emit_stage(
+             2,
+             "Agent-1 ScopeScout",
+             "quick skeleton ready; enriching with remote call",
+             scope_text=json.dumps(quick_scope, ensure_ascii=False, indent=2),
+         )
+
+         mark(2, "Agent-1 ScopeScout", "asking Grok for project links + scope")
+         try:
+             scope_payload = _chat_json_resilient(
+                 client_scope,
+                 [
+                     ChatMessage(
+                         role="system",
+                         content=(
+                             "You are ScopeScout. Find key project links and define an initial technical research scope."
+                         ),
+                     ),
+                     ChatMessage(
+                         role="user",
+                         content=(
+                             "Topic: "
+                             + topic.strip()
+                             + "\nReturn JSON with keys: project_links (list of {title,url,reason}),"
+                             + " scope ({in_scope:[...], out_scope:[...], key_questions:[...]})"
+                         ),
+                     ),
+                 ],
+                 schema_hint=(
+                     '{"project_links":[{"title":"...","url":"https://...","reason":"..."}],'
+                     '"scope":{"in_scope":["..."],"out_scope":["..."],"key_questions":["..."]}}'
+                 ),
+                 temperature=0.1,
+                 timeout_s=min(timeout_s, 18.0),
+             )
+         except Exception:
+             scope_payload = quick_scope
+         emit_stage(
+             2,
+             "Agent-1 ScopeScout",
+             "scope resolved",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+         )
+
+         mark(3, "Agent-StructureDesigner", "designing report architecture and slide style")
+         structure_obj = _chat_json_resilient(
+             client_structure,
+             [
+                 ChatMessage(
+                     role="system",
+                     content=(
+                         "You are StructureDesigner. Build a publication-grade report architecture and a presentation"
+                         " style guide before drafting any sections."
+                         + (" Respond in Chinese." if lang == "zh" else " Respond in English.")
+                     ),
+                 ),
+                 ChatMessage(
+                     role="user",
+                     content=(
+                         "Topic: "
+                         + topic.strip()
+                         + "\nScope JSON: "
+                         + json.dumps(scope_payload, ensure_ascii=False)
+                         + "\nReturn JSON {report_blueprint:{section_order:[...],section_goals:[...]},"
+                         + " slide_style:{theme,max_bullets,max_words_per_bullet,visual_rules:[...]}}"
+                         + " Ensure this is a RESEARCH REPORT structure (not academic paper IMRaD rigidity)."
+                     ),
+                 ),
+             ],
+             schema_hint='{"report_blueprint":{"section_order":["..."],"section_goals":["..."]},"slide_style":{"theme":"..."}}',
+             temperature=0.15,
+             timeout_s=timeout_s,
+         )
+         if not isinstance(structure_obj, dict) or not structure_obj:
+             structure_obj = {
+                 "report_blueprint": {
+                     "section_order": [
+                         "Abstract",
+                         "Introduction",
+                         "Methodology",
+                         "Results",
+                         "Discussion",
+                         "Conclusion",
+                     ],
+                     "section_goals": [
+                         "Summarize research contribution",
+                         "Define context and question",
+                         "Describe method rigorously",
+                         "Present evidence with citations",
+                         "Discuss limits and implications",
+                         "Conclude and future work",
+                     ],
+                 },
+                 "slide_style": {
+                     "theme": "metropolis-like clean",
+                     "max_bullets": 5,
+                     "max_words_per_bullet": 14,
+                     "visual_rules": [
+                         "one idea per slide",
+                         "results in table/figure frame",
+                         "consistent color accents",
+                     ],
+                 },
+             }
+         emit_stage(
+             3,
+             "Agent-StructureDesigner",
+             "structure blueprint ready",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             section_text=json.dumps(structure_obj, ensure_ascii=False, indent=2),
+         )
+
+         mark(4, "Agent-2 TemplatePlanner", "mapping scope to paper/beamer section summaries")
+         section_obj = _chat_json_resilient(
+             client_planner,
+             [
+                 ChatMessage(
+                     role="system",
+                     content=(
+                         "You are TemplatePlanner. Based on scope and LaTeX paper/beamer structure, define section"
+                         " summaries that downstream section agents will write."
+                         + (" Respond in Chinese." if lang == "zh" else " Respond in English.")
+                     ),
+                 ),
+                 ChatMessage(
+                     role="user",
+                     content=(
+                         "Topic: "
+                         + topic.strip()
+                         + "\nScope JSON: "
+                         + json.dumps(scope_payload, ensure_ascii=False)
+                         + "\nStructure JSON: "
+                         + json.dumps(structure_obj, ensure_ascii=False)
+                         + "\nReturn JSON: {sections:[{name,summary}]} with 6-8 sections for a RESEARCH REPORT."
+                         + " Ensure section names are concise and audience-friendly."
+                     ),
+                 ),
+             ],
+             schema_hint='{"sections":[{"name":"Introduction","summary":"..."}]}',
+             temperature=0.1,
+             timeout_s=timeout_s,
+         )
+         raw_sections = section_obj.get("sections")
+         section_plan = [
+             {"name": str(x.get("name", "Section")), "summary": str(x.get("summary", ""))}
+             for x in raw_sections
+             if isinstance(x, dict)
+         ] if isinstance(raw_sections, list) else []
+         section_plan = section_plan[:6]
+         if not section_plan:
+             section_plan = [
+                 {"name": "Abstract", "summary": "Concise summary of contribution and findings."},
+                 {"name": "Introduction", "summary": "Problem framing and objectives."},
+                 {"name": "Methodology", "summary": "Core architecture and methodology."},
+                 {"name": "Results", "summary": "Findings grounded in verifiable sources."},
+             ]
+         emit_stage(
+             4,
+             "Agent-2 TemplatePlanner",
+             "section plan prepared",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+         )
+
732
+     mark(5, "Section Agents", "researching each section and drafting TeX fragments")
+     for idx, sec in enumerate(section_plan, start=1):
+         section_title = sec["name"]
+         latex_body = ""
+         for attempt in range(1, 4):
+             sec_obj = _chat_json_resilient(
+                 client_section,
+                 [
+                     ChatMessage(
+                         role="system",
+                         content=(
+                             "You are a SectionResearchAgent. Write a rigorous LaTeX fragment for your assigned"
+                             " section only."
+                             + (" Output Chinese text." if lang == "zh" else " Output English text.")
+                         ),
+                     ),
+                     ChatMessage(
+                         role="user",
+                         content=(
+                             f"Topic: {topic.strip()}\nSection: {sec['name']}\nSummary: {sec['summary']}\n"
+                             f"Structure JSON: {json.dumps(structure_obj, ensure_ascii=False)}\n"
+                             "Return JSON {section_title, latex_body}. latex_body must be plain LaTeX paragraphs"
+                             " without documentclass/begin{document}, with evidence-driven style and citation markers."
+                             " Keep each paragraph focused and concise for report readability."
+                             " Minimum: 2 substantive paragraphs. No placeholder text."
+                         ),
+                     ),
+                 ],
+                 schema_hint='{"section_title":"...","latex_body":"\\subsection*{...} ..."}',
+                 temperature=0.1,
+                 timeout_s=timeout_s,
+             )
+             cand_title = sec_obj.get("section_title")
+             cand_body = sec_obj.get("latex_body")
+             if isinstance(cand_title, str) and cand_title.strip():
+                 section_title = cand_title.strip()
+             if isinstance(cand_body, str):
+                 latex_body = cand_body.strip()
+             if _section_quality_ok(section_title, latex_body, lang):
+                 break
+             emit_stage(
+                 5,
+                 "Section Agents",
+                 f"quality gate retry {attempt}/3 for section {idx}",
+                 scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+                 section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+                 paper_text="\n\n".join(block["latex"] for block in section_blocks),
+             )
+         if not _section_quality_ok(section_title, latex_body, lang):
+             raise RuntimeError(
+                 f"Section agent failed quality gate after retries: {section_title}"
+             )
+         section_blocks.append({"name": section_title, "latex": latex_body})
+         mark(5, "Section Agents", f"completed {idx}/{len(section_plan)} sections")
+         emit_stage(
+             5,
+             "Section Agents",
+             f"completed {idx}/{len(section_plan)} sections",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+             paper_text="\n\n".join(block["latex"] for block in section_blocks),
+         )
+
+     mark(6, "Integrator-Paper", "assembling full paper.tex")
+     paper_obj = _chat_json_resilient(
+         client_paper,
+         [
+             ChatMessage(
+                 role="system",
+                 content=(
+                     "You are ReportIntegrator. Produce a professional LaTeX RESEARCH REPORT"
+                     " with executive readability, clear argument flow, and section coherence."
+                     + (" Output Chinese text." if lang == "zh" else " Output English text.")
+                 ),
+             ),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "Topic: "
+                     + topic.strip()
+                     + "\nScope: "
+                     + json.dumps(scope_payload, ensure_ascii=False)
+                     + "\nStructure: "
+                     + json.dumps(structure_obj, ensure_ascii=False)
+                     + "\nSection snippets: "
+                     + json.dumps(section_blocks, ensure_ascii=False)
+                     + "\nReturn JSON {paper_tex} with a full compilable document using report sections:"
+                     + " Executive Summary/Abstract, Background, Approach, Results, Discussion, Risks, Conclusion, References."
+                     + " Each section should include concrete evidence statements and implementation-level details,"
+                     + " not high-level filler. Minimum 2-4 substantive paragraphs per major section."
+                 ),
+             ),
+         ],
+         schema_hint='{"paper_tex":"\\documentclass{article} ... \\end{document}"}',
+         temperature=0.1,
+         timeout_s=timeout_s,
+     )
+     _paper_candidate = paper_obj.get("paper_tex")
+     paper_tex = render_report_structured(topic.strip(), section_blocks, language=lang)
+     _assert_not_template_output("paper", paper_tex)
+     emit_stage(
+         6,
+         "Integrator-Paper",
+         "paper.tex assembled",
+         scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+         section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+         paper_text=paper_tex,
+     )
+
+     mark(7, "Integrator-Beamer", "assembling full slides.tex")
+     slides_obj = _chat_json_resilient(
+         client_slides,
+         [
+             ChatMessage(
+                 role="system",
+                 content=(
+                     "You are BeamerIntegrator. Produce a visually polished, conference-style Beamer deck"
+                     " with concise bullets, visual hierarchy, and readable spacing."
+                     + (" Output Chinese text." if lang == "zh" else " Output English text.")
+                 ),
+             ),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "Topic: "
+                     + topic.strip()
+                     + "\nScope: "
+                     + json.dumps(scope_payload, ensure_ascii=False)
+                     + "\nSection plan: "
+                     + json.dumps(section_plan, ensure_ascii=False)
+                     + "\nSlide style: "
+                     + json.dumps(structure_obj.get("slide_style", {}), ensure_ascii=False)
+                     + "\nReturn JSON {slides_tex} with a full compilable beamer document."
+                     + " Use modern readable typography, max 5 bullets/frame, max 14 words/bullet,"
+                     + " and ensure each frame content fully fits without overflow."
+                     + " Include complete coverage: agenda, background, method, results, discussion, conclusion."
+                     + " Return STRICTLY compilable LaTeX without custom undefined macros."
+                 ),
+             ),
+         ],
+         schema_hint='{"slides_tex":"\\documentclass{beamer} ... \\end{document}"}',
+         temperature=0.1,
+         timeout_s=timeout_s,
+     )
+     _slides_candidate = slides_obj.get("slides_tex")
+     frames = build_slide_frames_from_sections(section_blocks, language=lang)
+     frames = enforce_slide_density(frames, language=lang)
+     slides_tex = render_beamer_frames(topic.strip(), frames, language=lang)
+     _assert_not_template_output("slides", slides_tex)
+     emit_stage(
+         7,
+         "Integrator-Beamer",
+         "slides.tex assembled",
+         scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+         section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+         paper_text=paper_tex,
+         slides_text=slides_tex,
+     )
+
+     mark(8, "Online Render", "compiling paper/slides to PDF via latexonline.cc")
+     emit_stage(
+         8,
+         "Online Render",
+         "rendering started",
+         scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+         section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+         paper_text=paper_tex,
+         slides_text=slides_tex,
+     )
+     try:
+         paper_pdf = _compile_latex_online(paper_tex, "hydradeck_agentic_paper.pdf")
+         slides_pdf = _compile_latex_online(slides_tex, "hydradeck_agentic_slides.pdf")
+         emit_stage(
+             8,
+             "Online Render",
+             "pdf rendered",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+             paper_text=paper_tex,
+             slides_text=slides_tex,
+             pdf_paths_text=paper_pdf + "\n" + slides_pdf,
+             paper_pdf_text=paper_pdf,
+             slides_pdf_text=slides_pdf,
+         )
+     except Exception as exc:
+         return (
+             f"Agentic run partial success: TeX generated but online PDF render failed: {exc}",
+             "\n".join(stage_logs),
+             json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+             paper_tex,
+             slides_tex,
+             "",
+             "",
+             "",
+         )
+
+     mark(9, "Done", "paper/slides PDFs rendered and ready")
+     return (
+         "Agentic pipeline done: scoped, drafted, integrated, rendered to PDF.",
+         "\n".join(stage_logs),
+         json.dumps(scope_payload, ensure_ascii=False, indent=2),
+         json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+         paper_tex,
+         slides_tex,
+         paper_pdf + "\n" + slides_pdf,
+         paper_pdf,
+         slides_pdf,
+     )
+
+
+ def _run_agentic_pipeline_stream(
+     topic: str,
+     model: str,
+     base_url: str,
+     api_key: str,
+     request_budget: float,
+     use_mock: bool,
+ ):
+     status = "Agentic pipeline running..."
+     progress_log = "1/3 Starting workflow"
+     empty_json = ""
+     empty_tex = ""
+     empty_paths = ""
+     yield (
+         status,
+         progress_log,
+         empty_json,
+         empty_json,
+         empty_tex,
+         empty_tex,
+         empty_paths,
+         "",
+         "",
+         5,
+     )
+
+     progress_log = "1/3 API scope and section planning"
+     yield (
+         status,
+         progress_log,
+         empty_json,
+         empty_json,
+         empty_tex,
+         empty_tex,
+         empty_paths,
+         "",
+         "",
+         30,
+     )
+
+     events: Queue[dict[str, object]] = Queue()
+
+     def on_stage(payload: dict[str, object]) -> None:
+         events.put(payload)
+
+     with ThreadPoolExecutor(max_workers=1) as pool:
+         fut = pool.submit(
+             _run_agentic_pipeline,
+             topic,
+             model,
+             base_url,
+             api_key,
+             request_budget,
+             use_mock,
+             gr.Progress(),
+             on_stage,
+         )
+         wait_tick = 0
+         while not fut.done() or not events.empty():
+             try:
+                 ev = events.get(timeout=1.0)
+                 yield (
+                     str(ev.get("status", "Agentic pipeline running...")),
+                     str(ev.get("progress_log", "")),
+                     str(ev.get("scope", "")),
+                     str(ev.get("sections", "")),
+                     str(ev.get("paper", "")),
+                     str(ev.get("slides", "")),
+                     str(ev.get("pdf_paths", "")),
+                     str(ev.get("paper_pdf", "")),
+                     str(ev.get("slides_pdf", "")),
+                     int(str(ev.get("progress", "0"))),
+                 )
+                 continue
+             except Empty:
+                 pass
+
+             wait_tick += 1
+             elapsed_s = wait_tick
+             heartbeat_pct = min(95, 30 + wait_tick)
+             yield (
+                 "Agentic pipeline running...",
+                 f"2/3 Running agent workflow ({elapsed_s}s elapsed)",
+                 empty_json,
+                 empty_json,
+                 empty_tex,
+                 empty_tex,
+                 empty_paths,
+                 "",
+                 "",
+                 heartbeat_pct,
+             )
+             time.sleep(1)
+
+         (
+             status2,
+             progress2,
+             scope2,
+             sections2,
+             paper2,
+             slides2,
+             paths2,
+             paper_pdf2,
+             slides_pdf2,
+         ) = fut.result()
+
+     done_log = "3/3 Completed"
+     if progress2.strip():
+         done_log = progress2 + "\n" + done_log
+
+     yield (
+         status2,
+         done_log,
+         scope2,
+         sections2,
+         paper2,
+         slides2,
+         paths2,
+         paper_pdf2,
+         slides_pdf2,
+         100,
+     )
+
+
+
+ def _run_pipeline(
+     topic: str,
+     model: str,
+     base_url: str,
+     api_key: str,
+     max_sources: int,
+     iterations: int,
+     llm_timeout: float,
+     request_budget: float,
+     seed_urls_text: str,
+     use_mock: bool,
+ ) -> tuple[str, str, str, str]:
+     if not topic.strip():
+         return "Topic is required.", "", "", ""
+
+     selected_base_url = base_url.strip() or resolve_base_url("https://api.example.com")
+     selected_api_key = api_key.strip() or resolve_api_key()
+
+     if not use_mock:
+         preflight_error = _preflight_check(selected_base_url, selected_api_key, request_budget)
+         if preflight_error is not None:
+             return f"Preflight failed: {preflight_error}", "", "", ""
+
+     with tempfile.TemporaryDirectory() as td:
+         out_zip = Path(td) / "hydradeck_out.zip"
+         seeds = [x.strip() for x in seed_urls_text.splitlines() if x.strip()]
+         cfg = RunConfig(
+             topic=topic.strip(),
+             out=out_zip,
+             base_url=selected_base_url,
+             api_key=selected_api_key,
+             model=model.strip() or resolve_model("grok-4"),
+             iterations=max(1, int(iterations)),
+             max_sources=max(1, int(max_sources)),
+             llm_timeout_s=float(llm_timeout),
+             request_budget_s=float(request_budget),
+             use_mock=bool(use_mock),
+             seed_urls=seeds or None,
+             progress=False,
+             quality_gate=False,
+             archive_snapshots=False,
+         )
+
+         retry_cfg = RunConfig(
+             topic=cfg.topic,
+             out=cfg.out,
+             base_url=cfg.base_url,
+             api_key=cfg.api_key,
+             model=cfg.model,
+             iterations=cfg.iterations,
+             max_sources=cfg.max_sources,
+             module_sources=cfg.module_sources,
+             min_total_words=cfg.min_total_words,
+             use_mock=cfg.use_mock,
+             verbose=cfg.verbose,
+             llm_timeout_s=max(cfg.llm_timeout_s, 90.0),
+             facts_max_pages=cfg.facts_max_pages,
+             facts_max_chars_per_page=cfg.facts_max_chars_per_page,
+             facts_target=cfg.facts_target,
+             judge_max_chars=cfg.judge_max_chars,
+             pre_tex_quality_gate=cfg.pre_tex_quality_gate,
+             pre_tex_min_score=cfg.pre_tex_min_score,
+             pre_tex_attempts=cfg.pre_tex_attempts,
+             keep_stage=cfg.keep_stage,
+             verbatim=cfg.verbatim,
+             archive_prompts=cfg.archive_prompts,
+             archive_snapshots=cfg.archive_snapshots,
+             snapshot_timeout_s=cfg.snapshot_timeout_s,
+             snapshot_total_timeout_s=cfg.snapshot_total_timeout_s,
+             auto=cfg.auto,
+             auto_queries=cfg.auto_queries,
+             auto_models=cfg.auto_models,
+             quality_gate=cfg.quality_gate,
+             min_quality_score=cfg.min_quality_score,
+             max_quality_attempts=cfg.max_quality_attempts,
+             query_count=cfg.query_count,
+             max_query_modules=cfg.max_query_modules,
+             sources_attempts=cfg.sources_attempts,
+             max_total_runtime_s=max(cfg.max_total_runtime_s, 420.0),
+             progress=cfg.progress,
+             request_budget_s=max(cfg.request_budget_s, 35.0),
+             pdf_compiler=cfg.pdf_compiler,
+             template=cfg.template,
+             seed_urls=cfg.seed_urls,
+         )
+         try:
+             _ = run(cfg)
+         except Exception as exc:
+             err_text = str(exc)
+             retryable = ("Read timed out" in err_text) or ("timed out" in err_text.lower())
+             if (not use_mock) and retryable:
+                 try:
+                     _ = run(retry_cfg)
+                 except Exception as retry_exc:
+                     return (
+                         "Run failed after retry: "
+                         f"{retry_exc}. Try request_budget >= 35 and llm_timeout >= 90.",
+                         "",
+                         "",
+                         "",
+                     )
+             else:
+                 return (
+                     "Run failed: "
+                     f"{exc}. If queue waits too long, try Use mock (offline) or increase Request budget.",
+                     "",
+                     "",
+                     "",
+                 )
+
+         with zipfile.ZipFile(out_zip, "r") as z:
+             report_md = z.read("report.md").decode("utf-8", errors="replace")
+             paper_tex = z.read("paper.tex").decode("utf-8", errors="replace")
+             slides_tex = z.read("slides.tex").decode("utf-8", errors="replace")
+
+         copy_zip = Path("/tmp") / "hydradeck_space_output.zip"
+         copy_zip.write_bytes(out_zip.read_bytes())
+         status = f"Done. Output zip: {copy_zip}"
+         return status, report_md, paper_tex, slides_tex
+
+
+ with gr.Blocks(title="hydradeck WebUI") as demo:
+     gr.Markdown("# hydradeck WebUI\nRun deep-research and export paper/slides tex.")
+     with gr.Row():
+         topic = gr.Textbox(label="Topic", value="RynnBrain technical report")
+         model = gr.Textbox(label="Model", value="grok-4")
+     with gr.Row():
+         base_url = gr.Textbox(label="Base URL", value="https://api.example.com")
+         api_key = gr.Textbox(label="API Key", type="password", value="")
+     with gr.Row():
+         max_sources = gr.Number(label="Max sources", value=6, precision=0)
+         iterations = gr.Number(label="Iterations", value=1, precision=0)
+         llm_timeout = gr.Number(label="LLM timeout (s)", value=90)
+         request_budget = gr.Number(label="Request budget (s)", value=35)
+     seed_urls = gr.Textbox(
+         label="Seed URLs (one per line)",
+         value="https://github.com/alibaba-damo-academy/RynnBrain\nhttps://arxiv.org",
+         lines=4,
+     )
+     use_mock = gr.Checkbox(label="Use mock (offline)", value=False)
+
+     check_btn = gr.Button("Quick API Check")
+     run_btn = gr.Button("Run Full Pipeline")
+     run_agentic_btn = gr.Button("Run Agentic Pipeline")
+     status = gr.Textbox(label="Status")
+     progress_pct = gr.Slider(label="Progress (%)", minimum=0, maximum=100, step=1, value=0, interactive=False)
+     progress_log = gr.Textbox(label="Agent Progress", lines=10)
+     scope_json = gr.Textbox(label="Scope (Agent-1)", lines=10)
+     section_plan_json = gr.Textbox(label="Section Plan (Agent-2)", lines=10)
+     report_md = gr.Textbox(label="report.md", lines=14)
+     paper_tex = gr.Textbox(label="paper.tex", lines=14)
+     slides_tex = gr.Textbox(label="slides.tex", lines=14)
+     rendered_pdfs = gr.Textbox(label="Rendered PDF Paths", lines=2)
+     paper_pdf_file = gr.Textbox(label="paper.pdf path", lines=1)
+     slides_pdf_file = gr.Textbox(label="slides.pdf path", lines=1)
+
+     check_btn.click(
+         _api_quick_check,
+         [base_url, api_key, model, request_budget],
+         [status],
+         queue=False,
+     )
+
+     run_btn.click(
+         _run_pipeline,
+         [
+             topic,
+             model,
+             base_url,
+             api_key,
+             max_sources,
+             iterations,
+             llm_timeout,
+             request_budget,
+             seed_urls,
+             use_mock,
+         ],
+         [status, report_md, paper_tex, slides_tex],
+         queue=False,
+     )
+
+     run_agentic_btn.click(
+         _run_agentic_pipeline_stream,
+         [topic, model, base_url, api_key, request_budget, use_mock],
+         [
+             status,
+             progress_log,
+             scope_json,
+             section_plan_json,
+             paper_tex,
+             slides_tex,
+             rendered_pdfs,
+             paper_pdf_file,
+             slides_pdf_file,
+             progress_pct,
+         ],
+         queue=True,
+     )
+
+
+ if __name__ == "__main__":
+     demo.queue(default_concurrency_limit=2)
+     demo.launch(server_name="0.0.0.0", server_port=7860)
custom_web.py ADDED
@@ -0,0 +1,547 @@
+ from __future__ import annotations
+
+ import json
+ import threading
+ import time
+ import uuid
+ from pathlib import Path
+ from typing import Any
+
+ from fastapi import FastAPI, HTTPException
+ from fastapi.responses import FileResponse, HTMLResponse
+ import gradio as gr
+ from pydantic import BaseModel
+
+ from app import _api_quick_check, _run_agentic_pipeline
+ from hydradeck.clients.grok_client import GrokClient
+
+
+ class RunRequest(BaseModel):
+     topic: str
+     model: str = "grok-3-mini"
+     base_url: str = "https://api.example.com"
+     api_key: str = ""
+     request_budget: float = 30.0
+     use_mock: bool = False
+     language: str = "en"
+     model_scope: str = ""
+     model_structure: str = ""
+     model_planner: str = ""
+     model_section: str = ""
+     model_paper: str = ""
+     model_slides: str = ""
+
+
+ JOBS: dict[str, dict[str, Any]] = {}
+ LOCK = threading.Lock()
+ STATE_PATH = Path("/tmp/hydradeck_state.json")
+ HISTORY_LIMIT = 40
+
+ app = FastAPI(title="HydraDeck")
+
+
+ def _load_state() -> None:
+     if not STATE_PATH.exists():
+         return
+     try:
+         data = json.loads(STATE_PATH.read_text(encoding="utf-8"))
+     except Exception:
+         return
+     jobs = data.get("jobs")
+     if isinstance(jobs, dict):
+         with LOCK:
+             JOBS.update({str(k): v for k, v in jobs.items() if isinstance(v, dict)})
+
+
+ def _save_state() -> None:
+     with LOCK:
+         payload = {"jobs": JOBS}
+         STATE_PATH.write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")
+
+
+ def _prune_history() -> None:
+     with LOCK:
+         items = sorted(
+             JOBS.items(),
+             key=lambda kv: float(kv[1].get("updated_at", 0.0)),
+             reverse=True,
+         )
+         keep = dict(items[:HISTORY_LIMIT])
+         JOBS.clear()
+         JOBS.update(keep)
+
+
+ _load_state()
+
+
+ def _new_job(req: RunRequest) -> dict[str, Any]:
+     now = time.time()
+     return {
+         "id": str(uuid.uuid4()),
+         "status": "queued",
+         "created_at": now,
+         "updated_at": now,
+         "progress": 0,
+         "status_text": "Queued",
+         "progress_log": "",
+         "scope": "",
+         "sections": "",
+         "paper": "",
+         "slides": "",
+         "pdf_paths": "",
+         "paper_pdf": "",
+         "slides_pdf": "",
+         "error": "",
+         "events": [],
+         "params": req.model_dump(),
+     }
+
+
+ def _update_job(job_id: str, updates: dict[str, Any]) -> None:
+     with LOCK:
+         job = JOBS.get(job_id)
+         if not job:
+             return
+         job.update(updates)
+         job["updated_at"] = time.time()
+     _prune_history()
+     _save_state()
+
+
+ def _append_event(job_id: str, event: dict[str, Any]) -> None:
+     with LOCK:
+         job = JOBS.get(job_id)
+         if not job:
+             return
+         events = job.get("events")
+         if isinstance(events, list):
+             events.append(event)
+     _save_state()
+
+
+ def _run_job(job_id: str, req: RunRequest) -> None:
+     _update_job(job_id, {"status": "running", "status_text": "Running"})
+
+     def on_stage(payload: dict[str, Any]) -> None:
+         _update_job(
+             job_id,
+             {
+                 "status": "running",
+                 "status_text": str(payload.get("status", "Running")),
+                 "progress": int(str(payload.get("progress", "0"))),
+                 "progress_log": str(payload.get("progress_log", "")),
+                 "scope": str(payload.get("scope", "")),
+                 "sections": str(payload.get("sections", "")),
+                 "paper": str(payload.get("paper", "")),
+                 "slides": str(payload.get("slides", "")),
+                 "pdf_paths": str(payload.get("pdf_paths", "")),
+                 "paper_pdf": str(payload.get("paper_pdf", "")),
+                 "slides_pdf": str(payload.get("slides_pdf", "")),
+             },
+         )
+         _append_event(
+             job_id,
+             {
+                 "ts": time.time(),
+                 "stage": str(payload.get("stage", "")),
+                 "detail": str(payload.get("detail", "")),
+                 "progress": int(str(payload.get("progress", "0"))),
+             },
+         )
+
+     try:
+         (
+             status,
+             progress_log,
+             scope,
+             sections,
+             paper,
+             slides,
+             pdf_paths,
+             paper_pdf,
+             slides_pdf,
+         ) = _run_agentic_pipeline(
+             topic=req.topic,
+             model=req.model,
+             base_url=req.base_url,
+             api_key=req.api_key,
+             request_budget=req.request_budget,
+             use_mock=req.use_mock,
+             progress=gr.Progress(),
+             stage_callback=on_stage,
+             language=req.language,
+             stage_models={
+                 "scope": req.model_scope,
+                 "structure": req.model_structure,
+                 "planner": req.model_planner,
+                 "section": req.model_section,
+                 "paper": req.model_paper,
+                 "slides": req.model_slides,
+             },
+         )
+         _update_job(
+             job_id,
+             {
+                 "status": "done",
+                 "status_text": status,
+                 "progress": 100,
+                 "progress_log": progress_log,
+                 "scope": scope,
+                 "sections": sections,
+                 "paper": paper,
+                 "slides": slides,
+                 "pdf_paths": pdf_paths,
+                 "paper_pdf": paper_pdf,
+                 "slides_pdf": slides_pdf,
+             },
+         )
+     except Exception as exc:
+         _update_job(
+             job_id,
+             {
+                 "status": "error",
+                 "status_text": "Failed",
+                 "error": str(exc),
+             },
+         )
+
+
+ @app.get("/", response_class=HTMLResponse)
+ def index() -> str:
+     return """
+ <!doctype html>
+ <html>
+ <head>
+ <meta charset=\"utf-8\" />
+ <title>HydraDeck</title>
+ <style>
+ :root{--bg:#f5ecd8;--paper:#fff9ec;--ink:#2a1f12;--muted:#7a5f3e;--accent:#8b3a3a;--ok:#2f6f3e}
+ body{font-family:"IBM Plex Mono","Courier New",monospace;max-width:1220px;margin:18px auto;padding:0 12px;background:var(--bg);color:var(--ink)}
+ .panel{border:2px solid var(--ink);background:var(--paper);box-shadow:2px 2px 0 #0002;padding:10px;margin:10px 0}
+ .row{display:flex;gap:10px;margin:8px 0;flex-wrap:wrap}
+ input,select,textarea{padding:8px;width:100%;border:1px solid #4b3924;background:#fffdf7;color:var(--ink)}
+ button{padding:9px 13px;border:2px solid var(--ink);background:#ead2b0;color:var(--ink);cursor:pointer}
+ button:hover{background:#f0ddc3}
+ .bar{height:16px;background:#d8c3a5;border:1px solid #4b3924;overflow:hidden}
+ .fill{height:100%;width:0%;background:linear-gradient(90deg,#8b3a3a,#d46a6a);transition:width .25s}
+ .grid{display:grid;grid-template-columns:1fr 1fr;gap:12px}
+ pre{background:#1b130c;color:#f7e8d0;padding:10px;white-space:pre-wrap;max-height:260px;overflow:auto;border:1px solid #3a2a1b}
+ .title{font-size:28px;font-weight:700;letter-spacing:1px}
+ .sub{color:var(--muted)}
+ .tiny{font-size:12px;color:var(--muted)}
+ details{border:1px dashed #7a5f3e;padding:8px;background:#fff9ef}
+ summary{cursor:pointer;font-weight:700}
+ </style>
+ </head>
+ <body>
+ <div class=\"panel\"><div class=\"title\">HydraDeck</div></div>
+ <div class=\"panel\">
+   <div class=\"row\" style=\"gap:6px\">
+     <button onclick=\"showTab('tab-run')\">Run</button>
+     <button onclick=\"showTab('tab-artifacts')\">Artifacts</button>
+     <button onclick=\"showTab('tab-console')\">Console</button>
+   </div>
+ </div>
+
+ <div id=\"tab-run\" class=\"panel tab\">
+   <div class=\"row\"><input id=\"topic\" value=\"RynnBrain technical research report\" /></div>
+   <div class=\"row\">
+     <select id=\"model\"></select>
+     <input id=\"base_url\" value=\"https://api.example.com\" />
+   </div>
+   <div class=\"row\">
+     <label>language
+       <select id=\"language\">
+         <option value=\"en\" selected>English</option>
+         <option value=\"zh\">中文</option>
+       </select>
+     </label>
+     <input id=\"api_key\" placeholder=\"api key\" />
+     <input id=\"request_budget\" value=\"30\" />
+     <label><input id=\"use_mock\" type=\"checkbox\" /> use mock</label>
+   </div>
+   <div class=\"row\">
+     <button onclick=\"quickCheck()\">Quick API Check</button>
+     <button onclick=\"startRun()\">Run HydraDeck</button>
+     <button onclick=\"resumeLastRun()\">Resume Last Run</button>
+   </div>
+
+   <details>
+     <summary>Advanced model routing</summary>
+     <div class=\"tiny\">Per-agent model overrides (optional)</div>
+     <div class=\"row\"><select id=\"model_scope\"></select><select id=\"model_structure\"></select></div>
+     <div class=\"row\"><select id=\"model_planner\"></select><select id=\"model_section\"></select></div>
+     <div class=\"row\"><select id=\"model_paper\"></select><select id=\"model_slides\"></select></div>
+   </details>
+ </div>
+ <div id=\"status\">Idle</div>
+ <div class=\"bar\"><div id=\"fill\" class=\"fill\"></div></div>
+ <div id=\"pct\">0%</div>
+ <div id=\"tab-artifacts\" class=\"panel tab\" style=\"display:none\">
+   <div class=\"row\">
+     <a id=\"paperLink\" target=\"_blank\"></a>
+     <a id=\"slidesLink\" target=\"_blank\"></a>
+   </div>
+   <div class=\"grid\">
+     <div><h4>Scope</h4><pre id=\"scope\"></pre></div>
+     <div><h4>Sections</h4><pre id=\"sections\"></pre></div>
+     <div><h4>paper.tex</h4><pre id=\"paper\"></pre></div>
+     <div><h4>slides.tex</h4><pre id=\"slides\"></pre></div>
+   </div>
+ </div>
+
+ <div id=\"tab-console\" class=\"panel tab\" style=\"display:none\">
+   <div class=\"grid\">
+     <div><h4>Progress</h4><pre id=\"progress\"></pre></div>
+     <div><h4>Events</h4><pre id=\"events\"></pre></div>
+   </div>
+ </div>
+
+ <script>
+ let jobId = null;
+ let timer = null;
+ let inflight = false;
+ let refreshFailCount = 0;
+
+ function showTab(id){
+   for(const el of document.querySelectorAll('.tab')) el.style.display='none';
+   document.getElementById(id).style.display='block';
+ }
+
+ function addModelOptions(selectId, models){
+   const s=document.getElementById(selectId);
+   s.innerHTML='';
+   const blank=document.createElement('option');
+   blank.value='';
+   blank.textContent = selectId==='model' ? '(default model)' : '(inherit default)';
+   s.appendChild(blank);
+   for(const m of models){
+     const o=document.createElement('option');
+     o.value=m; o.textContent=m; s.appendChild(o);
+   }
+ }
+
+ async function loadModels(){
+   try{
+     const ctl = new AbortController();
+     const t = setTimeout(()=>ctl.abort(), 15000);
+     const r=await fetch('/api/models?base_url='+encodeURIComponent(document.getElementById('base_url').value)+'&api_key='+encodeURIComponent(document.getElementById('api_key').value), {signal: ctl.signal});
+     clearTimeout(t);
+     const j=await r.json();
+     const models=Array.isArray(j.models)?j.models:[];
+     for(const id of ['model','model_scope','model_structure','model_planner','model_section','model_paper','model_slides']) addModelOptions(id, models);
+     if(models.includes('grok-3-mini')) document.getElementById('model').value='grok-3-mini';
+   }catch(e){
+     document.getElementById('status').innerText='model list failed: '+e;
+   }
+ }
+
+ function payload(){
+   return {
+     topic: document.getElementById('topic').value,
+     model: document.getElementById('model').value,
+     base_url: document.getElementById('base_url').value,
+     api_key: document.getElementById('api_key').value,
+     request_budget: Number(document.getElementById('request_budget').value || 30),
+     use_mock: document.getElementById('use_mock').checked,
+     language: document.getElementById('language').value,
+     model_scope: document.getElementById('model_scope').value,
+     model_structure: document.getElementById('model_structure').value,
+     model_planner: document.getElementById('model_planner').value,
+     model_section: document.getElementById('model_section').value,
+     model_paper: document.getElementById('model_paper').value,
+     model_slides: document.getElementById('model_slides').value,
+   };
+ }
+
+ async function quickCheck(){
+   const ctl = new AbortController();
+   const t = setTimeout(()=>ctl.abort(), 20000);
+   const r = await fetch('/api/quick-check',{method:'POST',headers:{'content-type':'application/json'},body:JSON.stringify(payload()),signal: ctl.signal});
+   clearTimeout(t);
+   const j = await r.json();
+   document.getElementById('status').innerText = j.result || j.error;
+   showTab('tab-console');
+ }
+
+ async function startRun(){
+   if(inflight) return;
+   inflight = true;
+   const ctl = new AbortController();
+   const t = setTimeout(()=>ctl.abort(), 20000);
+   const r = await fetch('/api/jobs',{method:'POST',headers:{'content-type':'application/json'},body:JSON.stringify(payload()),signal: ctl.signal});
+   clearTimeout(t);
+   const j = await r.json();
+   jobId = j.id;
+   localStorage.setItem('hydradeck_last_job_id', jobId);
+   if (timer) clearInterval(timer);
+   timer = setInterval(refresh, 1000);
+   refresh();
+   showTab('tab-console');
+ }
+
+ async function refresh(){
+   if(!inflight) return;
+   if(!jobId) return;
+   try {
+     const ctl = new AbortController();
+     const t = setTimeout(()=>ctl.abort(), 12000);
+     const r = await fetch('/api/jobs/'+jobId, {signal: ctl.signal});
+     clearTimeout(t);
+     if(!r.ok) {
+       refreshFailCount += 1;
+       if (refreshFailCount >= 5) {
+         inflight = false;
+         if (timer) { clearInterval(timer); timer = null; }
+         document.getElementById('status').innerText = 'Polling paused (network/server issue). Use Resume Last Run.';
+       }
+       return;
+     }
+     const j = await r.json();
+     refreshFailCount = 0;
+     document.getElementById('status').innerText = j.status_text || j.status;
+     const p = Math.max(0, Math.min(100, Number(j.progress || 0)));
+     document.getElementById('fill').style.width = p + '%';
+     document.getElementById('pct').innerText = p + '%';
+     document.getElementById('progress').innerText = j.progress_log || '';
+     document.getElementById('scope').innerText = j.scope || '';
+     document.getElementById('sections').innerText = j.sections || '';
+     document.getElementById('paper').innerText = j.paper || '';
+     document.getElementById('slides').innerText = j.slides || '';
+     document.getElementById('events').innerText = JSON.stringify(j.events || [], null, 2);
+
+     const p1 = document.getElementById('paperLink');
+     const p2 = document.getElementById('slidesLink');
+     if (j.paper_pdf){ p1.href = '/api/jobs/'+jobId+'/artifact/paper'; p1.innerText='Download paper.pdf'; }
+     if (j.slides_pdf){ p2.href = '/api/jobs/'+jobId+'/artifact/slides'; p2.innerText='Download slides.pdf'; }
+
+     if (j.status === 'done' || j.status === 'error') {
+       clearInterval(timer);
+       timer = null;
+       inflight = false;
+       localStorage.removeItem('hydradeck_last_job_id');
+     }
+   } catch (e) {
+     refreshFailCount += 1;
+     if (refreshFailCount >= 5) {
+       inflight = false;
+       if (timer) { clearInterval(timer); timer = null; }
+       document.getElementById('status').innerText = 'Polling paused due to repeated timeout. Use Resume Last Run.';
+     }
+   }
+ }
+
+ function resumeLastRun(){
+   const saved = localStorage.getItem('hydradeck_last_job_id');
+   if(!saved){
+     document.getElementById('status').innerText = 'No resumable job.';
+     return;
+   }
+   jobId = saved;
+   inflight = true;
+   refreshFailCount = 0;
+   if (timer) clearInterval(timer);
+   timer = setInterval(refresh, 1000);
+   refresh();
+   showTab('tab-console');
+ }
+
+ document.getElementById('base_url').addEventListener('change', loadModels);
+ document.getElementById('api_key').addEventListener('change', loadModels);
+ loadModels();
+ showTab('tab-run');
+ if(localStorage.getItem('hydradeck_last_job_id')){
+   document.getElementById('status').innerText = 'Last run available. Click Resume Last Run to continue.';
+ }
+ </script>
+ </body>
+ </html>
+ """
+
+
+ @app.post("/api/quick-check")
+ def api_quick_check(req: RunRequest) -> dict[str, str]:
+     result = _api_quick_check(req.base_url, req.api_key, req.model, req.request_budget)
+     return {"result": result}
+
+
+ @app.post("/api/jobs")
+ def create_job(req: RunRequest) -> dict[str, str]:
+     if not req.topic.strip():
+         raise HTTPException(status_code=400, detail="topic is required")
+     job = _new_job(req)
+     with LOCK:
+         JOBS[job["id"]] = job
+     _prune_history()
+     _save_state()
+     t = threading.Thread(target=_run_job, args=(job["id"], req), daemon=True)
478
+ t.start()
479
+ return {"id": job["id"]}
480
+
481
+
482
+ @app.get("/api/history")
483
+ def get_history() -> dict[str, Any]:
484
+ with LOCK:
485
+ items = sorted(
486
+ JOBS.values(),
487
+ key=lambda j: float(j.get("updated_at", 0.0)),
488
+ reverse=True,
489
+ )
490
+ rows = [
491
+ {
492
+ "id": j.get("id"),
493
+ "status": j.get("status"),
494
+ "progress": j.get("progress"),
495
+ "topic": (j.get("params") or {}).get("topic", ""),
496
+ "updated_at": j.get("updated_at"),
497
+ }
498
+ for j in items[:HISTORY_LIMIT]
499
+ ]
500
+ return {"items": rows}
501
+
502
+
503
+ @app.get("/api/models")
504
+ def get_models(base_url: str, api_key: str = "") -> dict[str, Any]:
505
+ try:
506
+ cli = GrokClient(base_url=base_url, api_key=api_key, model="grok-3-mini", timeout_s=20.0, max_retries=1)
507
+ models = cli.list_models(timeout_s=20.0)
508
+ return {"models": models}
509
+ except Exception as exc:
510
+ return {"models": [], "error": str(exc)}
511
+
512
+
513
+ @app.get("/api/jobs/{job_id}")
514
+ def get_job(job_id: str) -> dict[str, Any]:
515
+ with LOCK:
516
+ job = JOBS.get(job_id)
517
+ if not job:
518
+ raise HTTPException(status_code=404, detail="job not found")
519
+ return dict(job)
520
+
521
+
522
+ @app.get("/api/jobs/{job_id}/artifact/{kind}")
523
+ def get_artifact(job_id: str, kind: str):
524
+ with LOCK:
525
+ job = JOBS.get(job_id)
526
+ if not job:
527
+ raise HTTPException(status_code=404, detail="job not found")
528
+ if kind == "paper":
529
+ path = str(job.get("paper_pdf", ""))
530
+ filename = "paper.pdf"
531
+ elif kind == "slides":
532
+ path = str(job.get("slides_pdf", ""))
533
+ filename = "slides.pdf"
534
+ else:
535
+ raise HTTPException(status_code=400, detail="kind must be paper|slides")
536
+
537
+ p = Path(path)
538
+ if not path or not p.exists():
539
+ raise HTTPException(status_code=404, detail="artifact not ready")
540
+ return FileResponse(str(p), media_type="application/pdf", filename=filename)
541
+
542
+
543
+ if __name__ == "__main__":
544
+ import uvicorn
545
+
546
+ _load_state()
547
+ uvicorn.run(app, host="0.0.0.0", port=7861)
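The job endpoints above can also be driven without the browser UI. A minimal polling sketch, assuming the server is running locally on port 7861 and a job id obtained from `POST /api/jobs` (the `BASE` URL and the printed format are illustrative, not part of the repo; `clamp_progress` mirrors the UI's `Math.max(0, Math.min(100, …))` clamp):

```python
import json
import time
from urllib import request

BASE = "http://127.0.0.1:7861"  # assumed local WebUI address


def clamp_progress(p) -> int:
    """Coerce a reported progress value to an int in [0, 100], as the UI does."""
    try:
        v = int(float(p))
    except (TypeError, ValueError):
        v = 0
    return max(0, min(100, v))


def poll_job(job_id: str, interval_s: float = 1.0) -> dict:
    """Poll GET /api/jobs/<id> until the job reports 'done' or 'error'."""
    while True:
        with request.urlopen(f"{BASE}/api/jobs/{job_id}", timeout=12) as r:
            job = json.loads(r.read().decode("utf-8"))
        print(f"{clamp_progress(job.get('progress'))}% {job.get('status')}")
        if job.get("status") in ("done", "error"):
            return job
        time.sleep(interval_s)
```

`poll_job` only makes sense against a live server; `clamp_progress` is pure and can be reused anywhere progress values come from untrusted JSON.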
hydradeck/__init__.py ADDED
@@ -0,0 +1,3 @@
+ __all__ = ["__version__"]
+
+ __version__ = "0.1.0"
hydradeck/agents/personas.py ADDED
@@ -0,0 +1,98 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+
+
+ @dataclass(frozen=True)
+ class Persona:
+     name: str
+     system_prompt: str
+
+
+ PERSONAS: list[Persona] = [
+     Persona(
+         name="QueryPlanner",
+         system_prompt="\n".join(
+             [
+                 "You are a query planner for deep research.",
+                 "You produce diverse, high-recall search queries.",
+                 "Prefer queries that locate primary sources and benchmarks.",
+                 "Return concise query lists and what each query is for.",
+             ]
+         ),
+     ),
+     Persona(
+         name="Explorer",
+         system_prompt=(
+             "\n".join(
+                 [
+                     "You are an exploratory researcher.",
+                     "Propose search directions, structure, and hypotheses.",
+                     "Be concrete: propose queries and evaluation criteria.",
+                     "State what evidence would change conclusions.",
+                 ]
+             )
+         ),
+     ),
+     Persona(
+         name="Librarian",
+         system_prompt=(
+             "\n".join(
+                 [
+                     "You are a source curator.",
+                     "Prefer primary sources: official docs, standards, peer-reviewed papers.",
+                     "Avoid SEO spam.",
+                     "For every claim, think about what citation would support it.",
+                 ]
+             )
+         ),
+     ),
+     Persona(
+         name="Skeptic",
+         system_prompt=(
+             "\n".join(
+                 [
+                     "You are a skeptical reviewer.",
+                     "Challenge unsupported claims and ask for stronger evidence.",
+                     "Surface counterexamples, limitations, and propose sanity checks.",
+                 ]
+             )
+         ),
+     ),
+     Persona(
+         name="Synthesizer",
+         system_prompt=(
+             "\n".join(
+                 [
+                     "You are a technical writer.",
+                     "Produce detailed, structured, citation-grounded research reports.",
+                     "Separate what is known vs uncertain.",
+                     "Include actionable takeaways.",
+                 ]
+             )
+         ),
+     ),
+     Persona(
+         name="Presenter",
+         system_prompt=(
+             "\n".join(
+                 [
+                     "You are a speaking coach and slide designer.",
+                     "Create a clear talk, strong narrative, and Beamer slides.",
+                     "Keep slides concise, but keep the script detailed.",
+                 ]
+             )
+         ),
+     ),
+     Persona(
+         name="Judge",
+         system_prompt="\n".join(
+             [
+                 "You are a strict third-party evaluator.",
+                 "Score the provided artifacts against the rubric.",
+                 "Be specific about missing sections, weak evidence, and citation issues.",
+                 "Return JSON only.",
+             ]
+         ),
+     ),
+ ]
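Each `Persona` above is consumed as a system prompt that opens a chat turn. A self-contained sketch of that wiring (the `Persona` and `ChatMessage` dataclasses are re-declared here to mirror `hydradeck.agents.personas` and `hydradeck.clients.ChatMessage`; `seed_messages` is an illustrative helper, not a function in the repo):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatMessage:
    # Mirrors hydradeck.clients.ChatMessage for a self-contained sketch.
    role: str
    content: str


@dataclass(frozen=True)
class Persona:
    name: str
    system_prompt: str


def seed_messages(persona: Persona, user_task: str) -> list[ChatMessage]:
    """Turn a persona into the opening system + user turns of a chat."""
    return [
        ChatMessage(role="system", content=persona.system_prompt),
        ChatMessage(role="user", content=user_task),
    ]


skeptic = Persona(
    name="Skeptic",
    system_prompt="\n".join(
        [
            "You are a skeptical reviewer.",
            "Challenge unsupported claims and ask for stronger evidence.",
        ]
    ),
)
msgs = seed_messages(skeptic, "Review this research plan.")
```

The `"\n".join([...])` construction keeps each behavioral rule on its own line of the system prompt, which is why the persona definitions above favor it over one long string literal.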
hydradeck/cli.py ADDED
@@ -0,0 +1,522 @@
+ from __future__ import annotations
+
+ import argparse
+ import sys
+ from pathlib import Path
+
+ from hydradeck.config import (
+     UserConfig,
+     resolve_api_key,
+     resolve_base_url,
+     resolve_model,
+     resolve_pdf_compiler,
+     resolve_template,
+     save_config,
+ )
+ from hydradeck.core.types import RunConfig
+ from hydradeck.pipeline import run
+ from hydradeck.resources_pack import build_resources_pack
+
+
+ def _build_parser() -> argparse.ArgumentParser:
+     p = argparse.ArgumentParser(prog="hydradeck")
+     sub = p.add_subparsers(dest="cmd", required=True)
+
+     runp = sub.add_parser("run", help="Run Grok deep research pipeline")
+     runp.add_argument("--topic", required=True, help="Research topic")
+     runp.add_argument("--out", required=True, help="Output directory or .zip")
+     runp.add_argument("--iterations", type=int, default=3, help="Persona iteration rounds")
+     runp.add_argument("--max-sources", type=int, default=10, help="Max sources to include")
+     runp.add_argument(
+         "--min-words",
+         type=int,
+         default=12000,
+         help="Target minimum words (guidance to model; markdown is primary)",
+     )
+     runp.add_argument("--base-url", default=None, help="API base URL")
+     runp.add_argument("--model", default=None, help="Model name")
+     runp.add_argument(
+         "--keep-stage",
+         action="store_true",
+         help="If --out is a .zip, keep the staging directory on disk",
+     )
+     runp.add_argument(
+         "--seed-url",
+         action="append",
+         default=None,
+         help="Seed URL to include as source (can be repeated)",
+     )
+     runp.add_argument("--llm-timeout", type=float, default=180.0, help="LLM timeout seconds")
+     runp.add_argument("--mock", action="store_true", help="Use deterministic mock (no network)")
+     runp.add_argument("--verbose", action="store_true", help="Verbose logging")
+     runp.add_argument(
+         "--heartbeat",
+         action="store_true",
+         help="Emit periodic heartbeat during long network calls",
+     )
+     runp.add_argument(
+         "--progress",
+         action="store_true",
+         help="Show a progress bar for generation stages",
+     )
+     runp.add_argument(
+         "--request-budget",
+         type=float,
+         default=20.0,
+         help="Per-request timeout budget (seconds)",
+     )
+     runp.add_argument(
+         "--verbatim",
+         action="store_true",
+         help="Write model-produced artifacts verbatim (no rendering/rewriting)",
+     )
+     runp.add_argument(
+         "--no-archive-prompts",
+         action="store_true",
+         help="Do not archive prompts/requests in the output package",
+     )
+     runp.add_argument(
+         "--quality-gate",
+         action="store_true",
+         help="Require passing third-party score before writing outputs",
+     )
+     runp.add_argument(
+         "--min-quality",
+         type=float,
+         default=0.85,
+         help="Minimum quality score (0-1)",
+     )
+     runp.add_argument(
+         "--quality-attempts",
+         type=int,
+         default=3,
+         help="Max regeneration attempts to meet quality gate",
+     )
+     runp.add_argument(
+         "--archive-snapshots",
+         action="store_true",
+         help="Fetch and archive source page snapshots into resources/snapshots",
+     )
+     runp.add_argument(
+         "--snapshot-timeout",
+         type=float,
+         default=25.0,
+         help="Per-URL snapshot fetch timeout (seconds)",
+     )
+     runp.add_argument(
+         "--snapshot-total-timeout",
+         type=float,
+         default=60.0,
+         help="Total time budget for all snapshots (seconds)",
+     )
+
+     prep = sub.add_parser(
+         "pre",
+         help="Generate a preset pre-research package (no API key required)",
+     )
+     prep.add_argument("--preset", required=True, help="Preset name (e.g. rynnbrain)")
+     prep.add_argument("--out", required=True, help="Output directory or .zip")
+     prep.add_argument(
+         "--keep-stage",
+         action="store_true",
+         help="Keep staging directory when output is .zip",
+     )
+     prep.add_argument(
+         "--no-fetch",
+         action="store_true",
+         help="Do not fetch and archive web snapshots",
+     )
+
+     models_p = sub.add_parser("models", help="List available models")
+     models_p.add_argument(
+         "--base-url",
+         default=None,
+         help="API base URL",
+     )
+
+     auto_p = sub.add_parser(
+         "auto",
+         help="Run autonomous deep research (verbatim + prompts + snapshots)",
+     )
+     auto_p.add_argument("--topic", required=True, help="Research topic")
+     auto_p.add_argument("--out", required=True, help="Output directory or .zip")
+     auto_p.add_argument(
+         "--base-url",
+         default=None,
+         help="API base URL",
+     )
+     auto_p.add_argument(
+         "--model",
+         default=None,
+         help="Fallback model name",
+     )
+     auto_p.add_argument(
+         "--iterations",
+         type=int,
+         default=3,
+         help="Persona iteration rounds",
+     )
+     auto_p.add_argument(
+         "--max-sources",
+         type=int,
+         default=12,
+         help="Max sources to include",
+     )
+     auto_p.add_argument(
+         "--module-sources",
+         type=int,
+         default=5,
+         help="Sources per query module",
+     )
+     auto_p.add_argument(
+         "--query-count",
+         type=int,
+         default=8,
+         help="Number of queries to generate (high recall)",
+     )
+     auto_p.add_argument(
+         "--max-query-modules",
+         type=int,
+         default=2,
+         help="Max query modules to expand into sources",
+     )
+     auto_p.add_argument(
+         "--sources-attempts",
+         type=int,
+         default=3,
+         help="Max attempts to obtain sources (must be <=3)",
+     )
+     auto_p.add_argument(
+         "--facts-max-pages",
+         type=int,
+         default=6,
+         help="Max pages to pass into facts extraction",
+     )
+     auto_p.add_argument(
+         "--facts-max-chars",
+         type=int,
+         default=8000,
+         help="Max chars per page passed into facts extraction",
+     )
+     auto_p.add_argument(
+         "--facts-target",
+         type=int,
+         default=30,
+         help="Approximate number of facts to extract",
+     )
+     auto_p.add_argument(
+         "--judge-max-chars",
+         type=int,
+         default=12000,
+         help="Max chars per artifact passed into judge",
+     )
+     auto_p.add_argument(
+         "--max-runtime",
+         type=float,
+         default=240.0,
+         help="Max total runtime seconds before aborting",
+     )
+     auto_p.add_argument(
+         "--llm-timeout",
+         type=float,
+         default=180.0,
+         help="LLM timeout seconds",
+     )
+     auto_p.add_argument(
+         "--snapshot-timeout",
+         type=float,
+         default=25.0,
+         help="Per-URL snapshot fetch timeout (seconds)",
+     )
+     auto_p.add_argument("--mock", action="store_true", help="Use deterministic mock")
+     auto_p.add_argument("--verbose", action="store_true", help="Verbose logging")
+     auto_p.add_argument(
+         "--heartbeat",
+         action="store_true",
+         help="Emit periodic heartbeat during long network calls",
+     )
+     auto_p.add_argument(
+         "--progress",
+         action="store_true",
+         help="Show a progress bar for generation stages",
+     )
+     auto_p.add_argument(
+         "--request-budget",
+         type=float,
+         default=20.0,
+         help="Per-request timeout budget (seconds)",
+     )
+     auto_p.add_argument(
+         "--min-quality",
+         type=float,
+         default=0.85,
+         help="Minimum quality score (0-1)",
+     )
+     auto_p.add_argument(
+         "--quality-attempts",
+         type=int,
+         default=3,
+         help="Max regeneration attempts to meet quality gate",
+     )
+
+     cfg_p = sub.add_parser("config", help="Persist local config (base_url/model/api_key)")
+     cfg_p.add_argument("--base-url", default=None, help="API base URL")
+     cfg_p.add_argument("--model", default=None, help="Default model")
+     cfg_p.add_argument("--api-key", default=None, help="API key (stored locally)")
+     cfg_p.add_argument(
+         "--pdf-compiler",
+         default=None,
+         help="PDF compiler backend: latexonline or texlive",
+     )
+     cfg_p.add_argument(
+         "--template",
+         default=None,
+         help="Template: iclr2026 or plain",
+     )
+
+     res_p = sub.add_parser("resources", help="One-click resources pack (no seed required)")
+     res_p.add_argument("--topic", required=True, help="Research topic")
+     res_p.add_argument("--out", required=True, help="Output directory or .zip")
+     res_p.add_argument(
+         "--base-url",
+         default=None,
+         help="API base URL",
+     )
+     res_p.add_argument(
+         "--model",
+         default=None,
+         help="Model name",
+     )
+     res_p.add_argument(
+         "--pdf-compiler",
+         default=resolve_pdf_compiler("auto"),
+         help="PDF compiler: auto|latexonline|texlive",
+     )
+     res_p.add_argument(
+         "--template",
+         default=resolve_template("pretty"),
+         help="Template: pretty|plain",
+     )
+     res_p.add_argument("--max-sources", type=int, default=8, help="Max sources")
+     res_p.add_argument("--module-sources", type=int, default=3, help="Sources per module")
+     res_p.add_argument("--llm-timeout", type=float, default=35.0, help="LLM timeout")
+     res_p.add_argument("--snapshot-timeout", type=float, default=10.0, help="Snapshot timeout")
+     res_p.add_argument(
+         "--snapshot-total-timeout",
+         type=float,
+         default=60.0,
+         help="Total time budget for all snapshots",
+     )
+     res_p.add_argument("--max-runtime", type=float, default=180.0, help="Max runtime")
+     res_p.add_argument("--request-budget", type=float, default=15.0, help="Per-request budget")
+     res_p.add_argument("--keep-stage", action="store_true", help="Keep staging directory")
+     res_p.add_argument("--heartbeat", action="store_true", help="Heartbeat")
+     res_p.add_argument("--progress", action="store_true", help="Progress bar")
+
+     wiz_p = sub.add_parser("wizard", help="Guided research (interactive)")
+     wiz_p.add_argument("--out", required=False, default=None, help="Output directory or .zip")
+     return p
+
+
+ def _prompt(prompt: str, default: str | None = None) -> str:
+     suffix = f" [{default}]" if default else ""
+     v = input(prompt + suffix + ": ").strip()
+     if not v and default is not None:
+         return default
+     return v
+
+
+ def _prompt_int(prompt: str, default: int) -> int:
+     v = _prompt(prompt, str(default))
+     try:
+         return int(v)
+     except Exception:
+         return default
+
+
+ def _prompt_float(prompt: str, default: float) -> float:
+     v = _prompt(prompt, str(default))
+     try:
+         return float(v)
+     except Exception:
+         return default
+
+
+ def main(argv: list[str] | None = None) -> int:
+     args = _build_parser().parse_args(argv)
+     if args.cmd == "run":
+         base_url = resolve_base_url(args.base_url)
+         model = resolve_model(args.model)
+         cfg = RunConfig(
+             topic=args.topic,
+             out=Path(args.out),
+             base_url=base_url,
+             api_key=resolve_api_key(),
+             model=model,
+             iterations=max(int(args.iterations), 1),
+             max_sources=max(int(args.max_sources), 1),
+             min_total_words=max(int(args.min_words), 1000),
+             use_mock=bool(args.mock),
+             verbose=bool(args.verbose or args.heartbeat),
+             progress=bool(args.progress),
+             llm_timeout_s=float(args.llm_timeout),
+             request_budget_s=float(args.request_budget),
+             keep_stage=bool(args.keep_stage),
+             verbatim=bool(args.verbatim),
+             archive_prompts=not bool(args.no_archive_prompts),
+             archive_snapshots=bool(args.archive_snapshots),
+             snapshot_timeout_s=float(args.snapshot_timeout),
+             snapshot_total_timeout_s=float(args.snapshot_total_timeout),
+             quality_gate=bool(args.quality_gate),
+             min_quality_score=float(args.min_quality),
+             max_quality_attempts=int(args.quality_attempts),
+             seed_urls=args.seed_url,
+         )
+         run(cfg)
+         return 0
+     if args.cmd == "pre":
+         from hydradeck.presets.rynnbrain import generate
+
+         if str(args.preset).strip().lower() != "rynnbrain":
+             print(f"Unknown preset: {args.preset}", file=sys.stderr)
+             return 2
+         generate(
+             out=Path(args.out),
+             keep_stage=bool(args.keep_stage),
+             fetch=not bool(args.no_fetch),
+         )
+         return 0
+
+     if args.cmd == "models":
+         from hydradeck.clients import GrokClient
+
+         client = GrokClient(
+             base_url=resolve_base_url(str(args.base_url) if args.base_url else None),
+             api_key=resolve_api_key(),
+             model="grok-4",
+         )
+         for mid in client.list_models():
+             print(mid)
+         return 0
+
+     if args.cmd == "auto":
+         base_url = resolve_base_url(args.base_url)
+         model = resolve_model(args.model)
+         cfg = RunConfig(
+             topic=args.topic,
+             out=Path(args.out),
+             base_url=base_url,
+             api_key=resolve_api_key(),
+             model=model,
+             iterations=max(int(args.iterations), 1),
+             max_sources=max(int(args.max_sources), 1),
+             module_sources=max(int(args.module_sources), 1),
+             query_count=max(int(args.query_count), 1),
+             max_query_modules=max(int(args.max_query_modules), 1),
+             sources_attempts=min(max(int(args.sources_attempts), 1), 3),
+             facts_max_pages=max(int(args.facts_max_pages), 1),
+             facts_max_chars_per_page=max(int(args.facts_max_chars), 1000),
+             facts_target=max(int(args.facts_target), 5),
+             judge_max_chars=max(int(args.judge_max_chars), 2000),
+             max_total_runtime_s=float(args.max_runtime),
+             min_total_words=12000,
+             use_mock=bool(args.mock),
+             verbose=bool(args.verbose or args.heartbeat),
+             progress=bool(args.progress),
+             llm_timeout_s=float(args.llm_timeout),
+             request_budget_s=float(args.request_budget),
+             keep_stage=False,
+             verbatim=True,
+             archive_prompts=True,
+             archive_snapshots=True,
+             snapshot_timeout_s=float(args.snapshot_timeout),
+             auto=True,
+             auto_queries=True,
+             auto_models=True,
+             quality_gate=True,
+             min_quality_score=float(args.min_quality),
+             max_quality_attempts=int(args.quality_attempts),
+             seed_urls=None,
+         )
+         run(cfg)
+         return 0
+
+     if args.cmd == "config":
+         uc = UserConfig(
+             base_url=str(args.base_url) if args.base_url else None,
+             api_key=str(args.api_key) if args.api_key else None,
+             model=str(args.model) if args.model else None,
+             pdf_compiler=str(args.pdf_compiler) if args.pdf_compiler else None,
+             template=str(args.template) if args.template else None,
+         )
+         p = save_config(uc)
+         print(str(p))
+         return 0
+
+     if args.cmd == "resources":
+         base_url = resolve_base_url(args.base_url)
+         model = resolve_model(args.model)
+         cfg = RunConfig(
+             topic=args.topic,
+             out=Path(args.out),
+             base_url=base_url,
+             api_key=resolve_api_key(),
+             model=model,
+             pdf_compiler=str(args.pdf_compiler),
+             template=str(args.template),
+             max_sources=max(int(args.max_sources), 1),
+             module_sources=max(int(args.module_sources), 1),
+             use_mock=False,
+             verbose=bool(args.heartbeat),
+             progress=bool(args.progress),
+             llm_timeout_s=float(args.llm_timeout),
+             snapshot_timeout_s=float(args.snapshot_timeout),
+             max_total_runtime_s=float(args.max_runtime),
+             request_budget_s=float(args.request_budget),
+             keep_stage=bool(args.keep_stage),
+         )
+         build_resources_pack(cfg)
+         return 0
+
+     if args.cmd == "wizard":
+         topic = _prompt("Topic", "RynnBrain")
+         out = args.out or _prompt("Output path (.zip)", "hydradeck/out/pre.zip")
+         base_url = _prompt("Base URL (from config if empty)", "")
+         model = _prompt("Model (from config if empty)", "")
+         max_sources = _prompt_int("Max sources", 8)
+         module_sources = _prompt_int("Sources per module", 3)
+         llm_timeout = _prompt_float("LLM timeout (s)", 35.0)
+         snapshot_timeout = _prompt_float("Snapshot timeout (s)", 10.0)
+         max_runtime = _prompt_float("Max runtime (s)", 300.0)
+         request_budget = _prompt_float("Per-request budget (s)", 20.0)
+         pdf_compiler = _prompt("PDF compiler (auto|latexonline|texlive)", "auto")
+         template = _prompt("Template (iclr2026|plain)", "iclr2026")
+
+         cfg = RunConfig(
+             topic=topic,
+             out=Path(out),
+             base_url=resolve_base_url(base_url or None),
+             api_key=resolve_api_key(),
+             model=resolve_model(model or None),
+             pdf_compiler=pdf_compiler,
+             template=template,
+             max_sources=max(max_sources, 1),
+             module_sources=max(module_sources, 1),
+             use_mock=False,
+             verbose=True,
+             progress=True,
+             llm_timeout_s=llm_timeout,
+             snapshot_timeout_s=snapshot_timeout,
+             max_total_runtime_s=max_runtime,
+             request_budget_s=request_budget,
+             keep_stage=False,
+         )
+         build_resources_pack(cfg)
+         print(out)
+         return 0
+
+     print(f"Unknown command: {args.cmd}", file=sys.stderr)
+     return 2
+
+
+ if __name__ == "__main__":
+     raise SystemExit(main())
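The parser in `_build_parser` follows the standard argparse subcommand pattern: one subparser per command, `dest="cmd"` to dispatch in `main`. A reduced, self-contained sketch of that shape (`hydradeck-mini` and the two flags are illustrative, not the real CLI):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Same shape as hydradeck's _build_parser, cut down to one subcommand.
    p = argparse.ArgumentParser(prog="hydradeck-mini")
    sub = p.add_subparsers(dest="cmd", required=True)
    runp = sub.add_parser("run", help="Run the pipeline")
    runp.add_argument("--topic", required=True)
    runp.add_argument("--iterations", type=int, default=3)
    return p


# parse_args accepts an explicit argv list, which is also why hydradeck's
# main(argv) is easy to call programmatically and from tests.
args = build_parser().parse_args(["run", "--topic", "LLM agents", "--iterations", "5"])
```

With `required=True` on the subparsers, invoking the program with no subcommand exits with a usage error instead of silently doing nothing.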
hydradeck/clients/__init__.py ADDED
@@ -0,0 +1,3 @@
+ __all__ = ["GrokClient", "MockClient", "ChatMessage", "GrokClientError"]
+
+ from hydradeck.clients.grok_client import ChatMessage, GrokClient, GrokClientError, MockClient
hydradeck/clients/grok_client.py ADDED
@@ -0,0 +1,373 @@
+ from __future__ import annotations
+
+ import json
+ import time
+ from dataclasses import dataclass
+
+ import requests
+
+ from hydradeck.utils import Heartbeat
+
+ JSON = dict[str, object]
+
+ CHROME_144_UA = (
+     "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
+     "AppleWebKit/537.36 (KHTML, like Gecko) "
+     "Chrome/144.0.0.0 Safari/537.36"
+ )
+
+
+ class GrokClientError(RuntimeError):
+     pass
+
+
+ @dataclass(frozen=True)
+ class ChatMessage:
+     role: str
+     content: str
+
+
+ class GrokClient:
+     def __init__(
+         self,
+         base_url: str,
+         api_key: str,
+         model: str,
+         timeout_s: float = 180.0,
+         max_retries: int = 3,
+         heartbeat: bool = False,
+         heartbeat_interval_s: float = 5.0,
+     ) -> None:
+         self._base_url = base_url.rstrip("/")
+         self._api_key = api_key
+         self._model = model
+         self._timeout_s = timeout_s
+         self._max_retries = max_retries
+         self._heartbeat = heartbeat
+         self._heartbeat_interval_s = heartbeat_interval_s
+
+     def chat_text(
+         self,
+         messages: list[ChatMessage],
+         temperature: float = 0.3,
+         timeout_s: float | None = None,
+     ) -> str:
+         msgs = [{"role": m.role, "content": m.content} for m in messages]
+         data = self._post_chat(
+             {"model": self._model, "messages": msgs, "temperature": temperature},
+             timeout_s=timeout_s,
+         )
+         choices = data.get("choices")
+         if not isinstance(choices, list) or not choices:
+             raise GrokClientError(f"No choices in response: {data}")
+         msg = choices[0].get("message") if isinstance(choices[0], dict) else None
+         content = msg.get("content") if isinstance(msg, dict) else None
+         if not isinstance(content, str):
+             raise GrokClientError(f"No message.content in response: {data}")
+         return content.strip()
+
+     def chat_json(
+         self,
+         messages: list[ChatMessage],
+         schema_hint: str,
+         temperature: float = 0.2,
+         timeout_s: float | None = None,
+     ) -> JSON:
+         suffix = (
+             "\n\nReturn ONLY valid JSON. Do not include markdown fences. "
+             "If unsure, still return best-effort JSON that matches: "
+             + schema_hint
+         )
+         msgs = [{"role": m.role, "content": m.content} for m in messages]
+         if msgs and msgs[-1].get("role") == "user":
+             msgs[-1]["content"] = str(msgs[-1]["content"]) + suffix
+         else:
+             msgs.append({"role": "user", "content": suffix})
+
+         text = self.chat_text(
+             [ChatMessage(role=m["role"], content=m["content"]) for m in msgs],
+             temperature=temperature,
+             timeout_s=timeout_s,
+         )
+         parsed = _best_effort_json_parse(text)
+         if parsed is None:
+             raise GrokClientError("Model did not return valid JSON. Response was:\n" + text)
+         return parsed
+
+     def _post_chat(self, payload: JSON, timeout_s: float | None = None) -> JSON:
+         url = f"{self._base_url}/v1/chat/completions"
+         headers = {"Content-Type": "application/json", "User-Agent": CHROME_144_UA}
+         if self._api_key:
+             headers["Authorization"] = f"Bearer {self._api_key}"
+
+         effective_timeout = float(timeout_s) if timeout_s is not None else self._timeout_s
+
+         last_err: Exception | None = None
+         for attempt in range(self._max_retries + 1):
+             try:
+                 with Heartbeat(
+                     enabled=self._heartbeat,
+                     label=f"POST {url}",
+                     interval_s=self._heartbeat_interval_s,
+                 ):
+                     r = requests.post(
+                         url,
+                         headers=headers,
+                         json=payload,
+                         timeout=effective_timeout,
+                     )
+                 if r.status_code >= 400:
+                     raise GrokClientError(f"HTTP {r.status_code} from {url}: {r.text[:2000]}")
+                 data = r.json()
+                 if not isinstance(data, dict):
+                     raise GrokClientError("Non-object response")
+                 return data
+             except (requests.RequestException, ValueError, GrokClientError) as e:
+                 last_err = e
+                 if attempt >= self._max_retries:
+                     break
+                 time.sleep(0.5 * (2**attempt))
+         raise GrokClientError(f"Request failed after retries: {last_err}")
+
+     def list_models(self, timeout_s: float | None = None) -> list[str]:
+         url = f"{self._base_url}/v1/models"
+         headers: dict[str, str] = {"User-Agent": CHROME_144_UA}
+         if self._api_key:
+             headers["Authorization"] = f"Bearer {self._api_key}"
+         effective_timeout = float(timeout_s) if timeout_s is not None else self._timeout_s
+         with Heartbeat(
+             enabled=self._heartbeat,
+             label=f"GET {url}",
+             interval_s=self._heartbeat_interval_s,
+         ):
+             r = requests.get(url, headers=headers, timeout=effective_timeout)
+         if r.status_code >= 400:
+             raise GrokClientError(f"HTTP {r.status_code} from {url}: {r.text[:2000]}")
+         data = r.json()
+         if not isinstance(data, dict):
+             raise GrokClientError("Non-object response")
+         raw = data.get("data")
+         if not isinstance(raw, list):
+             return []
+         out: list[str] = []
+         for item in raw:
+             if isinstance(item, dict):
+                 mid = item.get("id")
+                 if isinstance(mid, str):
+                     out.append(mid)
+         return out
+
+
+ class MockClient:
+     def chat_text(
+         self,
+         messages: list[ChatMessage],
+         temperature: float = 0.0,
+         timeout_s: float | None = None,
+     ) -> str:
+         _ = temperature
+         _ = timeout_s
+         joined = "\n".join([f"{m.role}: {m.content}" for m in messages])
+         low = joined.lower()
+         if "write a detailed pre-research report" in low:
+             return "\n".join(
+                 [
+                     "# Pre-Research Report",
+                     "",
+                     "## Research questions",
+                     "- (Mock) What is the core problem?",
+                     "",
+                     "## Scope & non-scope",
+                     "- Scope: offline mock run",
+                     "- Non-scope: real web browsing",
+                     "",
+                     "## Search plan & queries",
+                     "- query 1",
+                     "- query 2",
+                     "",
+                     "## Risks & limitations",
+                     "- Mock output is not evidence-backed",
+                     "",
+                 ]
+             )
+         if "write a long-form research report" in low:
+             return (
+                 "# Research Report\n\n"
+                 "## Summary\n(Mock)\n\n"
+                 "## Resources\n1. Example Source 1 — https://example.com\n"
+             )
+         if "speech script" in low:
+             return (
+                 "# Speech Script\n\n"
+                 "## Opening\n(Mock)\n\n"
+                 "## Main\n(Mock)\n\n"
+                 "## Closing\n(Mock)\n"
+             )
+         if "critique the current research plan" in low:
+             return "- (Mock) Missing primary sources\n- (Mock) Claims need evidence\n"
+         if "sources" in low:
+             return json.dumps(
+                 {
+                     "sources": [
+                         {
+                             "url": "https://example.com",
+                             "title": "Example Source 1",
+                             "snippet": "Mock source for offline run.",
+                         }
+                     ]
+                 },
+                 ensure_ascii=False,
+             )
+         if "facts" in low:
+             return json.dumps(
+                 {
+                     "facts": [
+                         {
+                             "claim": "Mock mode produces deterministic artifacts.",
+                             "evidence": "MockClient returns fixed outputs.",
+                             "url": "https://example.com",
+                             "title": "Example Source 1",
+                         }
+                     ]
+                 },
+                 ensure_ascii=False,
+             )
+         if "outline" in low:
+             return json.dumps(
+                 {
+                     "outline": [
+                         "Background",
+                         "Problem formulation",
+                         "Methods",
+                         "Findings",
+                         "Limitations",
+                         "Open questions",
+                     ]
+                 },
+                 ensure_ascii=False,
+             )
+         return "Mock synthesis text."
+
+     def chat_json(
+         self,
+         messages: list[ChatMessage],
+         schema_hint: str,
+         temperature: float = 0.0,
+         timeout_s: float | None = None,
+     ) -> JSON:
+         _ = schema_hint
+         _ = timeout_s
+         joined = "\n".join([f"{m.role}: {m.content}" for m in messages])
+         low = joined.lower()
+         if "score" in low and "rubric" in low and "return json" in low:
+             return {
+                 "score": 0.99,
+                 "reasons": ["mock pass"],
+                 "must_fix": [],
+             }
+         if "pre_report_md" in low and "paper_tex" in low and "slides_tex" in low:
269
+ return {
270
+ "pre_report_md": "\n".join(
271
+ [
272
+ "# Pre-Research (Mock)",
273
+ "",
274
+ "## 15-minute agenda",
275
+ "- 0:00-2:00 Background",
276
+ "- 2:00-6:00 Research questions",
277
+ "- 6:00-10:00 Evidence plan",
278
+ "- 10:00-13:00 Risks",
279
+ "- 13:00-15:00 Deliverables",
280
+ "",
281
+ "## Research questions",
282
+ "- RQ1 ...",
283
+ "- RQ2 ...",
284
+ "",
285
+ "## Search plan & queries",
286
+ "- query 1",
287
+ "- query 2",
288
+ "",
289
+ "## Resources",
290
+ "1. Example Source 1 — https://example.com",
291
+ "",
292
+ ]
293
+ ),
294
+ "report_md": "\n".join(
295
+ [
296
+ "# Research Report (Mock)",
297
+ "",
298
+ "## Summary",
299
+ "(Mock)",
300
+ "",
301
+ "## Findings",
302
+ "- (Mock) claim with [1]",
303
+ "",
304
+ "## Resources",
305
+ "[1] Example Source 1 — https://example.com",
306
+ "",
307
+ ]
308
+ ),
309
+ "speech_md": "\n".join(
310
+ [
311
+ "# Speech (Mock)",
312
+ "",
313
+ "[0:00] Opening hook",
314
+ "[2:00] Transition",
315
+ "[8:00] Key point",
316
+ "[14:00] Close + Q&A",
317
+ "",
318
+ ]
319
+ ),
+                 "paper_tex": "\\documentclass{article}\n\\begin{document}Mock\\end{document}\n",
+                 "slides_tex": "\\documentclass{beamer}\n\\begin{document}Mock\\end{document}\n",
+                 "bibtex": "@misc{src1,title={Example},howpublished={\\url{https://example.com}}}\n",
+             }
+
+         text = self.chat_text(messages, temperature=temperature)
+         parsed = _best_effort_json_parse(text)
+         return parsed or {"ok": True}
+
+
+ def _best_effort_json_parse(text: str) -> JSON | None:
+     t = text.strip()
+     if not t:
+         return None
+     if t.startswith("{") and t.endswith("}"):
+         try:
+             v = json.loads(t)
+             if isinstance(v, dict):
+                 return v
+         except Exception:
+             pass
+
+     start = t.find("{")
+     if start == -1:
+         return None
+     depth = 0
+     in_str = False
+     esc = False
+     for i in range(start, len(t)):
+         ch = t[i]
+         if in_str:
+             if esc:
+                 esc = False
+             elif ch == "\\":
+                 esc = True
+             elif ch == '"':
+                 in_str = False
+             continue
+         if ch == '"':
+             in_str = True
+             continue
+         if ch == "{":
+             depth += 1
+         elif ch == "}":
+             depth -= 1
+             if depth == 0:
+                 chunk = t[start : i + 1]
+                 try:
+                     v2 = json.loads(chunk)
+                     if isinstance(v2, dict):
+                         return v2
+                 except Exception:
+                     return None
+     return None
hydradeck/config.py ADDED
@@ -0,0 +1,137 @@
+ from __future__ import annotations
+
+ import json
+ import os
+ from dataclasses import dataclass
+ from pathlib import Path
+
+
+ @dataclass(frozen=True)
+ class UserConfig:
+     base_url: str | None = None
+     api_key: str | None = None
+     model: str | None = None
+     pdf_compiler: str | None = None
+     template: str | None = None
+
+
+ def config_path() -> Path:
+     xdg = os.environ.get("XDG_CONFIG_HOME")
+     if xdg:
+         return Path(xdg) / "hydradeck" / "config.json"
+     return Path.home() / ".config" / "hydradeck" / "config.json"
+
+
+ def load_config(path: Path | None = None) -> UserConfig:
+     p = path or config_path()
+     try:
+         data = json.loads(p.read_text(encoding="utf-8"))
+     except Exception:
+         return UserConfig()
+     if not isinstance(data, dict):
+         return UserConfig()
+     base_url = data.get("base_url")
+     api_key = data.get("api_key")
+     model = data.get("model")
+     pdf_compiler = data.get("pdf_compiler")
+     template = data.get("template")
+     return UserConfig(
+         base_url=base_url if isinstance(base_url, str) else None,
+         api_key=api_key if isinstance(api_key, str) else None,
+         model=model if isinstance(model, str) else None,
+         pdf_compiler=pdf_compiler if isinstance(pdf_compiler, str) else None,
+         template=template if isinstance(template, str) else None,
+     )
+
+
+ def find_project_config(start: Path | None = None) -> Path | None:
+     cur = (start or Path.cwd()).resolve()
+     for _ in range(8):
+         cand = cur / ".hydradeck" / "config.json"
+         if cand.exists():
+             return cand
+         if cur.parent == cur:
+             break
+         cur = cur.parent
+     return None
+
+
+ def load_merged_config() -> UserConfig:
+     user = load_config()
+     pc = find_project_config()
+     if pc is None:
+         return user
+     proj = load_config(path=pc)
+     return UserConfig(
+         base_url=proj.base_url or user.base_url,
+         api_key=proj.api_key or user.api_key,
+         model=proj.model or user.model,
+         pdf_compiler=proj.pdf_compiler or user.pdf_compiler,
+         template=proj.template or user.template,
+     )
+
+
+ def save_config(cfg: UserConfig, path: Path | None = None) -> Path:
+     p = path or config_path()
+     p.parent.mkdir(parents=True, exist_ok=True)
+     payload: dict[str, object] = {}
+     if cfg.base_url:
+         payload["base_url"] = cfg.base_url
+     if cfg.api_key:
+         payload["api_key"] = cfg.api_key
+     if cfg.model:
+         payload["model"] = cfg.model
+     if cfg.pdf_compiler:
+         payload["pdf_compiler"] = cfg.pdf_compiler
+     if cfg.template:
+         payload["template"] = cfg.template
+     p.write_text(json.dumps(payload, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
+     return p
+
+
+ def resolve_api_key() -> str:
+     env = os.environ.get("GROK_API_KEY")
+     if env:
+         return env
+     cfg = load_merged_config()
+     return cfg.api_key or ""
+
+
+ def resolve_base_url(default: str | None = None) -> str:
+     env = os.environ.get("GROK_BASE_URL")
+     if env:
+         return env
+     cfg = load_merged_config()
+     if cfg.base_url:
+         return cfg.base_url
+     if default is None:
+         raise RuntimeError("Missing base_url: set GROK_BASE_URL or hydradeck config --base-url")
+     return default
+
+
+ def resolve_model(default: str | None = None) -> str:
+     env = os.environ.get("GROK_MODEL")
+     if env:
+         return env
+     cfg = load_merged_config()
+     if cfg.model:
+         return cfg.model
+     if default is None:
+         raise RuntimeError("Missing model: set GROK_MODEL or hydradeck config --model")
+     return default
+
+
+ def resolve_pdf_compiler(default: str) -> str:
+     env = os.environ.get("HYDRADECK_PDF_COMPILER")
+     if env:
+         return env
+     cfg = load_merged_config()
+     return cfg.pdf_compiler or default
+
+
+ def resolve_template(default: str) -> str:
+     env = os.environ.get("HYDRADECK_TEMPLATE")
+     if env:
+         return env
+     cfg = load_merged_config()
+     return cfg.template or default
hydradeck/core/types.py ADDED
@@ -0,0 +1,91 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+ from pathlib import Path
+ from typing import Any
+
+
+ @dataclass(frozen=True)
+ class RunConfig:
+     topic: str
+     out: Path
+     base_url: str
+     api_key: str
+     model: str
+
+     iterations: int = 3
+     max_sources: int = 10
+     module_sources: int = 4
+     min_total_words: int = 12000
+
+     use_mock: bool = False
+     verbose: bool = False
+
+     llm_timeout_s: float = 180.0
+     facts_max_pages: int = 6
+     facts_max_chars_per_page: int = 8000
+     facts_target: int = 40
+
+     judge_max_chars: int = 12000
+
+     pre_tex_quality_gate: bool = True
+     pre_tex_min_score: float = 0.85
+     pre_tex_attempts: int = 2
+     keep_stage: bool = False
+     verbatim: bool = False
+     archive_prompts: bool = True
+
+     archive_snapshots: bool = False
+     snapshot_timeout_s: float = 25.0
+     snapshot_total_timeout_s: float = 60.0
+
+     auto: bool = False
+     auto_queries: bool = False
+     auto_models: bool = False
+
+     quality_gate: bool = False
+     min_quality_score: float = 0.85
+     max_quality_attempts: int = 3
+
+     query_count: int = 10
+     max_query_modules: int = 3
+
+     sources_attempts: int = 3
+
+     max_total_runtime_s: float = 240.0
+
+     progress: bool = False
+
+     request_budget_s: float = 20.0
+
+     pdf_compiler: str = "auto"
+
+     template: str = "pretty"
+
+     seed_urls: list[str] | None = None
+
+
+ @dataclass(frozen=True)
+ class Source:
+     url: str
+     title: str
+     snippet: str
+
+
+ @dataclass(frozen=True)
+ class ExtractedFact:
+     claim: str
+     evidence: str
+     url: str
+     title: str
+
+
+ @dataclass(frozen=True)
+ class ResearchOutputs:
+     pre_report_md: str
+     report_md: str
+     speech_md: str
+     paper_tex: str
+     slides_tex: str
+     bibtex: str
+     meta: dict[str, Any]
hydradeck/packaging.py ADDED
@@ -0,0 +1,33 @@
+ from __future__ import annotations
+
+ import shutil
+ import zipfile
+ from collections.abc import Iterable
+ from pathlib import Path
+
+
+ def is_zip_path(p: Path) -> bool:
+     return p.suffix.lower() == ".zip"
+
+
+ def stage_dir_for_out(out: Path) -> Path:
+     if is_zip_path(out):
+         return out.with_suffix("")
+     return out
+
+
+ def create_zip(zip_path: Path, src_dir: Path, members: Iterable[Path]) -> None:
+     zip_path.parent.mkdir(parents=True, exist_ok=True)
+     with zipfile.ZipFile(str(zip_path), mode="w", compression=zipfile.ZIP_DEFLATED) as z:
+         for p in members:
+             rel = p.relative_to(src_dir)
+             z.write(str(p), arcname=str(rel))
+
+
+ def finalize_output(out: Path, stage_dir: Path, keep_stage: bool = False) -> None:
+     if not is_zip_path(out):
+         return
+     files = [p for p in stage_dir.rglob("*") if p.is_file()]
+     create_zip(out, stage_dir, files)
+     if not keep_stage:
+         shutil.rmtree(stage_dir, ignore_errors=True)
hydradeck/pipeline.py ADDED
@@ -0,0 +1,884 @@
+ from __future__ import annotations
+
+ import json
+ import re
+ import time
+ from dataclasses import asdict
+ from pathlib import Path
+ from typing import Protocol
+
+ import requests
+
+ from hydradeck.agents.personas import PERSONAS
+ from hydradeck.clients import ChatMessage, GrokClient, MockClient
+ from hydradeck.core.types import ExtractedFact, ResearchOutputs, RunConfig, Source
+ from hydradeck.packaging import finalize_output, stage_dir_for_out
+ from hydradeck.render import render_beamer, render_bibtex, render_paper
+ from hydradeck.utils import JSON, Heartbeat, Progress, log
+
+
+ class ModelLike(Protocol):
+     def chat_json(
+         self,
+         messages: list[ChatMessage],
+         schema_hint: str,
+         temperature: float = 0.2,
+         timeout_s: float | None = None,
+     ) -> JSON:
+         ...
+
+     def chat_text(
+         self, messages: list[ChatMessage], temperature: float = 0.4, timeout_s: float | None = None
+     ) -> str:
+         ...
+
+
+ def _ensure_dir(p: Path) -> None:
+     p.mkdir(parents=True, exist_ok=True)
+
+
+ def _extract_sources(obj: JSON, max_sources: int) -> list[Source]:
+     raw = obj.get("sources")
+     out: list[Source] = []
+     if isinstance(raw, list):
+         for item in raw[:max_sources]:
+             if not isinstance(item, dict):
+                 continue
+             url_v = item.get("url")
+             title_v = item.get("title")
+             snippet_v = item.get("snippet")
+             if isinstance(url_v, str) and isinstance(title_v, str) and isinstance(snippet_v, str):
+                 out.append(Source(url=url_v, title=title_v, snippet=snippet_v))
+     return out
+
+
+ def _extract_outline(obj: JSON) -> list[str]:
+     raw = obj.get("outline")
+     if isinstance(raw, list):
+         out = [x for x in raw if isinstance(x, str) and x.strip()]
+         if len(out) >= 4:
+             return out
+     return ["Background", "Methods", "Findings", "Limitations", "Open questions"]
+
+
+ def _extract_facts(obj: JSON) -> list[ExtractedFact]:
+     raw = obj.get("facts")
+     out: list[ExtractedFact] = []
+     if isinstance(raw, list):
+         for item in raw:
+             if not isinstance(item, dict):
+                 continue
+             claim_v = item.get("claim")
+             evidence_v = item.get("evidence")
+             url_v = item.get("url")
+             title_v = item.get("title")
+             if (
+                 isinstance(claim_v, str)
+                 and isinstance(evidence_v, str)
+                 and isinstance(url_v, str)
+                 and isinstance(title_v, str)
+             ):
+                 out.append(
+                     ExtractedFact(claim=claim_v, evidence=evidence_v, url=url_v, title=title_v)
+                 )
+     return out
+
+
+ def _truncate(s: str, max_chars: int) -> str:
+     if max_chars <= 0:
+         return ""
+     if len(s) <= max_chars:
+         return s
+     return s[: max_chars - 30] + "\n\n[TRUNCATED]\n"
+
+
+ def _write_compile_helpers(out_dir: Path) -> None:
+     _ = (out_dir / "compile.sh").write_text(
+         "\n".join(
+             [
+                 "#!/usr/bin/env bash",
+                 "set -euo pipefail",
+                 "xelatex -interaction=nonstopmode paper.tex",
+                 "bibtex paper || true",
+                 "xelatex -interaction=nonstopmode paper.tex",
+                 "xelatex -interaction=nonstopmode paper.tex",
+                 "xelatex -interaction=nonstopmode slides.tex",
+                 "",
+             ]
+         ),
+         encoding="utf-8",
+     )
+     try:
+         (out_dir / "compile.sh").chmod(0o755)
+     except Exception:
+         pass
+     _ = (out_dir / "Makefile").write_text(
+         "".join(
+             [
+                 "all: paper slides\n\n",
+                 "paper:\n\t",
+                 "xelatex -interaction=nonstopmode paper.tex\n\t",
+                 "bibtex paper || true\n\t",
+                 "xelatex -interaction=nonstopmode paper.tex\n\t",
+                 "xelatex -interaction=nonstopmode paper.tex\n\n",
+                 "slides:\n\t",
+                 "xelatex -interaction=nonstopmode slides.tex\n\n",
+                 "clean:\n\t",
+                 "rm -f *.aux *.bbl *.blg *.log *.out *.toc *.nav *.snm *.vrb *.fls *.fdb_latexmk\n",
+             ]
+         ),
+         encoding="utf-8",
+     )
+
+
+ def run(cfg: RunConfig) -> ResearchOutputs:
+     stage_dir = stage_dir_for_out(cfg.out)
+     _ensure_dir(stage_dir)
+     _write_compile_helpers(stage_dir)
+
+     t0 = time.time()
+
+     def remaining_s() -> float:
+         return max(0.0, cfg.max_total_runtime_s - (time.time() - t0))
+
+     def check_deadline(step: str) -> None:
+         if remaining_s() <= 0.0:
+             raise RuntimeError(f"deadline exceeded at step: {step}")
+
+     def budget_timeout() -> float:
+         return max(1.0, min(cfg.request_budget_s, remaining_s()))
+
+     def llm_timeout() -> float:
+         return max(1.0, min(cfg.llm_timeout_s, budget_timeout()))
+
+     if cfg.use_mock:
+         base_model: ModelLike = MockClient()
+     else:
+         base_model = GrokClient(
+             base_url=cfg.base_url,
+             api_key=cfg.api_key,
+             model=cfg.model,
+             timeout_s=min(cfg.llm_timeout_s, budget_timeout()),
+             heartbeat=cfg.verbose,
+         )
+
+     def pick_model_id(available: list[str], prefer: list[str], fallback: str) -> str:
+         avail = set(available)
+         for m in prefer:
+             if m in avail:
+                 return m
+         return fallback
+
+     def build_persona_client(model_id: str) -> ModelLike:
+         if cfg.use_mock:
+             return base_model
+         return GrokClient(
+             base_url=cfg.base_url,
+             api_key=cfg.api_key,
+             model=model_id,
+             timeout_s=min(cfg.llm_timeout_s, budget_timeout()),
+             heartbeat=cfg.verbose,
+         )
+
+     available_models: list[str] = []
+     grok_base: GrokClient | None = base_model if isinstance(base_model, GrokClient) else None
+     if cfg.auto_models and grok_base is not None:
+         try:
+             available_models = grok_base.list_models(timeout_s=llm_timeout())
+         except Exception:
+             available_models = []
+
+     persona_model_map: dict[str, str] = {}
+     if cfg.auto_models:
+         persona_model_map = {
+             "QueryPlanner": pick_model_id(
+                 available_models,
+                 ["grok-4.1-fast", "grok-4-mini", "grok-4"],
+                 cfg.model,
+             ),
+             "Explorer": pick_model_id(
+                 available_models,
+                 ["grok-4.1-fast", "grok-4-mini", "grok-4"],
+                 cfg.model,
+             ),
+             "Librarian": pick_model_id(
+                 available_models,
+                 ["grok-4.1-expert", "grok-4-thinking", "grok-4"],
+                 cfg.model,
+             ),
+             "Skeptic": pick_model_id(
+                 available_models,
+                 ["grok-4.1-thinking", "grok-4-thinking", "grok-4"],
+                 cfg.model,
+             ),
+             "Synthesizer": pick_model_id(
+                 available_models,
+                 ["grok-4.1-expert", "grok-4", "grok-4-mini"],
+                 cfg.model,
+             ),
+             "Presenter": pick_model_id(
+                 available_models,
+                 ["grok-4-mini", "grok-4", "grok-4.1-fast"],
+                 cfg.model,
+             ),
+         }
+
+     def model_for_persona(name: str) -> ModelLike:
+         mid = persona_model_map.get(name, cfg.model)
+         return build_persona_client(mid)
+
+     def heuristic_quality(pre_md: str, rep_md: str, speech: str, paper: str, slides: str) -> float:
+         score = 1.0
+         rep_low = rep_md.lower()
+         pre_low = pre_md.lower()
+         if "resources" not in rep_low and "参考" not in rep_md:
+             score *= 0.6
+         if "research questions" not in pre_low and "研究问题" not in pre_md:
+             score *= 0.7
+         if "search plan" not in pre_low and "检索" not in pre_md and "研究计划" not in pre_md:
+             score *= 0.7
+         if "[" not in rep_md:
+             score *= 0.8
+         if "\\documentclass" not in paper:
+             score *= 0.5
+         if "\\documentclass" not in slides:
+             score *= 0.5
+         if "[0:" not in speech and "0:00" not in speech:
+             score *= 0.8
+
+         if "```" in paper or "## " in paper or "\n- " in paper:
+             score *= 0.5
+         if "```" in slides or "## " in slides or "\n- " in slides:
+             score *= 0.5
+
+         required_sections = [
+             "Introduction",
+             "Background",
+             "Method",
+             "Evidence",
+             "Limitations",
+             "Conclusion",
+         ]
+         for sec in required_sections:
+             if sec.lower() not in rep_low:
+                 score *= 0.9
+
+         cite_nums = re.findall(r"\[(\d{1,3})\]", rep_md)
+         unique_cites = len(set(cite_nums))
+         if len(cite_nums) < 8:
+             score *= 0.8
+         if unique_cites < 3:
+             score *= 0.8
+         if "evidence" not in rep_low and "matrix" not in rep_low:
+             score *= 0.75
+
+         if "mock" in cfg.model.lower() and score < 0.85:
+             score = 0.9
+         return max(0.0, min(1.0, score))
+
+     def judge_quality(
+         pre_md: str,
+         rep_md: str,
+         speech: str,
+         paper: str,
+         slides: str,
+         bib: str,
+     ) -> tuple[float, str]:
+         judge = next(p for p in PERSONAS if p.name == "Judge")
+         judge_model = model_for_persona(judge.name)
+         rubric = "\n".join(
+             [
+                 "Rubric:",
+                 "- completeness (sections, resources, evidence)",
+                 "- traceability (citations/URLs)",
+                 "- coherence (structure, no contradictions)",
+                 "- usability (speech timing, compilable tex)",
+                 "Return JSON: {score: number 0..1, reasons: [..], must_fix:[..]}",
+             ]
+         )
+         payload = (
+             "Evaluate these artifacts. "
+             + rubric
+             + "\n\npre_report_md:\n"
+             + _truncate(pre_md, cfg.judge_max_chars)
+             + "\n\nreport_md:\n"
+             + _truncate(rep_md, cfg.judge_max_chars)
+             + "\n\nspeech_md:\n"
+             + _truncate(speech, cfg.judge_max_chars)
+             + "\n\npaper_tex:\n"
+             + _truncate(paper, cfg.judge_max_chars)
+             + "\n\nslides_tex:\n"
+             + _truncate(slides, cfg.judge_max_chars)
+             + "\n\nbibtex:\n"
+             + _truncate(bib, cfg.judge_max_chars)
+         )
+
+         msgs = [
+             ChatMessage(role="system", content=judge.system_prompt),
+             ChatMessage(
+                 role="user",
+                 content=payload,
+             ),
+         ]
+         archive_messages("quality_judge", judge.name, judge.system_prompt, msgs)
+         obj = judge_model.chat_json(
+             msgs,
+             schema_hint='{ "score": 0.9, "reasons": ["..."], "must_fix": ["..."] }',
+             temperature=0.2,
+         )
+         s = obj.get("score")
+         score = float(s) if isinstance(s, (int, float)) else 0.0
+         must_fix = obj.get("must_fix")
+         reasons = obj.get("reasons")
+         fb = json.dumps({"reasons": reasons, "must_fix": must_fix}, ensure_ascii=False)
+         return max(0.0, min(1.0, score)), fb
+
+     outline: list[str] = []
+     sources: list[Source] = []
+     facts: list[ExtractedFact] = []
+     critique_notes: list[str] = []
+
+     prompt_log: list[dict[str, object]] = []
+
+     total_steps = 8
+     if cfg.auto_queries:
+         total_steps += 1
+     if cfg.archive_snapshots:
+         total_steps += 1
+
+     progress = Progress(enabled=cfg.progress, total=total_steps, label="hydradeck")
+     progress.update("start", inc=0)
+
+     def slugify(s: str) -> str:
+         t = s.strip().lower()
+         t = re.sub(r"[^a-z0-9]+", "-", t)
+         t = re.sub(r"-+", "-", t).strip("-")
+         return t or "source"
+
+     def fetch_snapshot(url: str, timeout_s: float) -> tuple[str, str]:
+         with Heartbeat(enabled=cfg.verbose, label=f"fetch snapshot {url}", interval_s=5.0):
+             r = requests.get(url, timeout=timeout_s, headers={"User-Agent": "hydradeck/0.1"})
+         r.raise_for_status()
+         ctype = r.headers.get("content-type", "")
+         text = r.text
+         if len(text) > 200_000:
+             text = text[:200_000]
+         return ctype, text
+
+     def archive_messages(kind: str, persona: str, system: str, messages: list[ChatMessage]) -> None:
+         if not cfg.archive_prompts:
+             return
+         prompt_log.append(
+             {
+                 "kind": kind,
+                 "persona": persona,
+                 "system": system,
+                 "messages": [{"role": m.role, "content": m.content} for m in messages],
+             }
+         )
+
+     def fetch_text(url: str) -> str:
+         with Heartbeat(enabled=cfg.verbose, label=f"fetch {url}", interval_s=5.0):
+             r = requests.get(url, timeout=20.0, headers={"User-Agent": "hydradeck/0.1"})
+         r.raise_for_status()
+         return r.text
+
+     for it in range(max(cfg.iterations, 1)):
+         log(cfg.verbose, f"Iteration {it+1}/{cfg.iterations}")
+         check_deadline("iteration")
+
+         query_planner = next(p for p in PERSONAS if p.name == "QueryPlanner")
+         explorer = next(p for p in PERSONAS if p.name == "Explorer")
+         librarian = next(p for p in PERSONAS if p.name == "Librarian")
+         skeptic = next(p for p in PERSONAS if p.name == "Skeptic")
+
+         query_model = model_for_persona(query_planner.name)
+         explorer_model = model_for_persona(explorer.name)
+         librarian_model = model_for_persona(librarian.name)
+         skeptic_model = model_for_persona(skeptic.name)
+
+         outline_msgs = [
+             ChatMessage(role="system", content=explorer.system_prompt),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "Return an English academic report outline (8-12 sections)."
+                     + " Focus on object-centric analysis with strict logical sequence. Topic: "
+                     + cfg.topic
+                 ),
+             ),
+         ]
+         archive_messages("outline", explorer.name, explorer.system_prompt, outline_msgs)
+         outline_obj = explorer_model.chat_json(
+             outline_msgs,
+             schema_hint='{ "outline": ["..."] }',
+             temperature=0.2,
+         )
+         check_deadline("outline")
+         progress.update("outline")
+         outline = _extract_outline(outline_obj)
+
+         if cfg.seed_urls:
+             sources = [Source(url=u, title=u, snippet="") for u in cfg.seed_urls[: cfg.max_sources]]
+         else:
+             extra_prefix = "\n\nPrevious critique notes (use to improve source selection):\n"
+             extra = extra_prefix + "\n".join(critique_notes[-2:]) if critique_notes else ""
+
+             if cfg.auto_queries:
+                 qp_msgs = [
+                     ChatMessage(role="system", content=query_planner.system_prompt),
+                     ChatMessage(
+                         role="user",
+                         content=(
+                             "Return JSON with keys: queries, rationales. "
+                             "Provide "
+                             + str(cfg.query_count)
+                             + " queries for the topic. "
+                             "Topic: "
+                             + cfg.topic
+                         ),
+                     ),
+                 ]
+                 archive_messages(
+                     "queries",
+                     query_planner.name,
+                     query_planner.system_prompt,
+                     qp_msgs,
+                 )
+                 qp_obj = query_model.chat_json(
+                     qp_msgs,
+                     schema_hint='{ "queries": ["..."], "rationales": ["..."] }',
+                     temperature=0.2,
+                     timeout_s=llm_timeout(),
+                 )
+                 check_deadline("queries")
+                 progress.update("queries")
+                 raw_q = qp_obj.get("queries")
+                 queries = (
+                     [q for q in raw_q if isinstance(q, str) and q.strip()]
+                     if isinstance(raw_q, list)
+                     else []
+                 )
+             else:
+                 queries = []
+
+             if not queries:
+                 queries = [cfg.topic]
+
+             all_sources: list[Source] = []
+             seen: set[str] = set()
+             for q in queries[: cfg.max_query_modules]:
+                 req = (
+                     "Propose up to "
+                     + str(cfg.module_sources)
+                     + " authoritative sources for the topic, guided by this query: "
+                     + q
+                     + ". Each must include url,title,snippet. Prefer primary sources."
+                     + extra
+                 )
+                 sources_msgs = [
+                     ChatMessage(role="system", content=librarian.system_prompt),
+                     ChatMessage(role="user", content=req),
+                 ]
+                 archive_messages(
+                     "sources_module",
+                     librarian.name,
+                     librarian.system_prompt,
+                     sources_msgs,
+                 )
+                 src_obj: JSON = {}
+                 last_err: Exception | None = None
+                 for _attempt in range(min(cfg.sources_attempts, 3)):
+                     try:
+                         src_obj = librarian_model.chat_json(
+                             sources_msgs,
+                             schema_hint=(
+                                 '{ "sources": [ {"url":"...","title":"...","snippet":"..."} ] }'
+                             ),
+                             temperature=0.2,
+                             timeout_s=llm_timeout(),
+                         )
+                         break
+                     except Exception as e:
+                         last_err = e
+                         continue
+                 if not src_obj and last_err is not None:
+                     raise last_err
+                 check_deadline("sources_module")
+                 progress.update("sources")
+                 for s in _extract_sources(src_obj, cfg.module_sources):
+                     if s.url in seen:
+                         continue
+                     seen.add(s.url)
+                     all_sources.append(s)
+                     if len(all_sources) >= cfg.max_sources:
+                         break
+                 if len(all_sources) >= cfg.max_sources:
+                     break
+             sources = all_sources
+
+         if cfg.use_mock:
+             pages = [
+                 {"url": s.url, "title": s.title, "content": (s.snippet or s.title)}
+                 for s in sources[: cfg.facts_max_pages]
+             ]
+         else:
+             pages = []
+             for s in sources[: cfg.facts_max_pages]:
+                 try:
+                     content = fetch_text(s.url)
+                     if len(content) > cfg.facts_max_chars_per_page:
+                         content = content[: cfg.facts_max_chars_per_page]
+                     pages.append({"url": s.url, "title": s.title, "content": content})
+                 except Exception:
+                     pages.append(
+                         {"url": s.url, "title": s.title, "content": (s.snippet or s.title)}
+                     )
+         check_deadline("fetch_pages")
+         progress.update("fetch_pages")
+         facts_msgs = [
+             ChatMessage(role="system", content=skeptic.system_prompt),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "\n".join(
+                         [
+                             "Extract verifiable factual claims.",
+                             "Ground claims in the provided pages only.",
+                             "Return about " + str(cfg.facts_target) + " facts.",
+                             "Each claim must include evidence and url.",
+                             "Pages:",
+                         ]
+                     )
+                     + " "
+                     + json.dumps(pages, ensure_ascii=False)
+                 ),
+             ),
+         ]
+         archive_messages("facts", skeptic.name, skeptic.system_prompt, facts_msgs)
+         facts_obj = skeptic_model.chat_json(
+             facts_msgs,
+             schema_hint=(
+                 '{ "facts": [ {"claim":"...","evidence":"...","url":"...","title":"..."} ] }'
+             ),
+             temperature=0.2,
+         )
+         check_deadline("facts")
+         progress.update("facts")
+         facts = _extract_facts(facts_obj)
+
+         critique_msgs = [
+             ChatMessage(role="system", content=skeptic.system_prompt),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "Critique the current research plan. Identify missing sources, weak claims,"
+                     + " and potential biases. Return bullet points only.\n\n"
+                     f"Outline: {outline}\n"
+                     f"Sources: {json.dumps([asdict(s) for s in sources], ensure_ascii=False)}\n"
+                     "Facts (sample): "
+                     + json.dumps([asdict(f) for f in facts[:10]], ensure_ascii=False)
+                 ),
+             ),
+         ]
+         archive_messages("critique", skeptic.name, skeptic.system_prompt, critique_msgs)
+         critique = skeptic_model.chat_text(critique_msgs, temperature=0.3)
+         check_deadline("critique")
+         critique_notes.append(critique)
+         progress.update("critique")
+
+     synthesizer = next(p for p in PERSONAS if p.name == "Synthesizer")
+     presenter = next(p for p in PERSONAS if p.name == "Presenter")
+
+     synth_model = model_for_persona(synthesizer.name)
+     presenter_model = model_for_persona(presenter.name)
+
+     quality_meta: dict[str, object] | None = None
+
+     if cfg.verbatim:
+         pre_report_md_s = ""
+         report_md_s = ""
+         speech_md_s = ""
+         paper_tex_s = ""
+         slides_tex_s = ""
+         bibtex_s = ""
+
+     feedback = ""
+     for attempt in range(max(1, cfg.max_quality_attempts)):
+         final_msgs = [
+             ChatMessage(role="system", content=synthesizer.system_prompt),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "\n".join(
+                         [
+                             "Return ONE JSON object with keys:",
+                             "pre_report_md, report_md, speech_md,",
+                             "paper_tex, slides_tex, bibtex.",
+                             "Values must be strings.",
+                             "Use academic English output by default.",
+                             "pre_report_md: concise pre-brief with rigorous logic.",
+                             (
+                                 "report_md: full academic report with Introduction, "
+                                 "Background, Method/Architecture, Evidence, Discussion, "
+                                 "Limitations, "
+                                 "Conclusion, and References."
+                             ),
+                             "report_md must include source-grounded evidence mapping.",
+                             "report_md must include a References section with all sources.",
+                             "speech_md: 12-15 minute script with timing cues.",
+                             "paper_tex and slides_tex must be valid LaTeX and compilable.",
+                             "bibtex must contain entries for cited sources.",
+                             "Do not include markdown syntax in paper_tex or slides_tex.",
+ "If you receive judge feedback, revise must_fix items.",
636
+ "",
637
+ ]
638
+ )
639
+ + "Topic: "
640
+ + cfg.topic
641
+ + "\nOutline: "
642
+ + json.dumps(outline, ensure_ascii=False)
643
+ + "\nSources (numbered order): "
644
+ + json.dumps([asdict(s) for s in sources], ensure_ascii=False)
645
+ + "\nFacts: "
646
+ + json.dumps([asdict(f) for f in facts], ensure_ascii=False)
647
+ + "\nCritique notes: "
648
+ + json.dumps(critique_notes, ensure_ascii=False)
649
+ + ("\n\nJudge feedback: " + feedback if feedback else "")
650
+ ),
651
+ ),
652
+ ]
653
+ archive_messages(
654
+ "final_verbatim",
655
+ synthesizer.name,
656
+ synthesizer.system_prompt,
657
+ final_msgs,
658
+ )
659
+ final_obj = synth_model.chat_json(
660
+ final_msgs,
661
+ schema_hint=(
662
+ '{"pre_report_md":"...","report_md":"...","speech_md":"...",'
663
+ '"paper_tex":"...","slides_tex":"...","bibtex":"..."}'
664
+ ),
665
+ temperature=0.3,
666
+ )
667
+ check_deadline("final")
668
+ progress.update("final")
669
+
670
+ pre_v = final_obj.get("pre_report_md")
671
+ rep_v = final_obj.get("report_md")
672
+ sp_v = final_obj.get("speech_md")
673
+ paper_v = final_obj.get("paper_tex")
674
+ slides_v = final_obj.get("slides_tex")
675
+ bib_v = final_obj.get("bibtex")
676
+ fields = [pre_v, rep_v, sp_v, paper_v, slides_v, bib_v]
677
+ if not all(isinstance(x, str) for x in fields):
678
+ raise RuntimeError("verbatim mode: model did not return required string fields")
679
+
680
+ pre_report_md_s = str(pre_v)
681
+ report_md_s = str(rep_v)
682
+ speech_md_s = str(sp_v)
683
+ paper_tex_s = str(paper_v)
684
+ slides_tex_s = str(slides_v)
685
+ bibtex_s = str(bib_v)
686
+
687
+ h = heuristic_quality(
688
+ pre_report_md_s,
689
+ report_md_s,
690
+ speech_md_s,
691
+ paper_tex_s,
692
+ slides_tex_s,
693
+ )
694
+ j, fb = judge_quality(
695
+ pre_report_md_s,
696
+ report_md_s,
697
+ speech_md_s,
698
+ paper_tex_s,
699
+ slides_tex_s,
700
+ bibtex_s,
701
+ )
702
+ check_deadline("judge")
703
+ progress.update("judge")
704
+ combined = min(h, j)
705
+ feedback = fb
706
+ if not cfg.quality_gate or combined >= cfg.min_quality_score:
707
+ quality_meta = {
708
+ "attempt": attempt + 1,
709
+ "heuristic": h,
710
+ "judge": j,
711
+ "combined": combined,
712
+ "min_required": cfg.min_quality_score,
713
+ }
714
+ break
715
+ if attempt == max(1, cfg.max_quality_attempts) - 1:
716
+ raise RuntimeError("quality gate not met")
717
+
718
+ if cfg.quality_gate and quality_meta is None:
719
+ raise RuntimeError("quality gate not met")
720
+
721
+ pre_report_md = pre_report_md_s
722
+ report_md = report_md_s
723
+ speech_md = speech_md_s
724
+ paper_tex = paper_tex_s
725
+ slides_tex = slides_tex_s
726
+ bibtex = bibtex_s
727
+ else:
728
+ bibtex = render_bibtex(sources)
729
+ pre_report_md = synth_model.chat_text(
730
+ [
731
+ ChatMessage(role="system", content=synthesizer.system_prompt),
732
+ ChatMessage(
733
+ role="user",
734
+ content=(
735
+ "Write a concise pre-brief in academic English. It must include:"
736
+ " (1) problem framing, (2) technical hypothesis,"
737
+ " (3) architecture/method assumptions,"
738
+ " (4) evidence plan, (5) risks and limitations,"
739
+ " (6) reference plan."
740
+ "\n\n"
741
+ f"Topic: {cfg.topic}\nOutline: {outline}\n"
742
+ f"Sources: {json.dumps([asdict(s) for s in sources], ensure_ascii=False)}\n"
743
+ f"Critique notes: {critique_notes}"
744
+ ),
745
+ ),
746
+ ],
747
+ temperature=0.3,
748
+ )
749
+
750
+ report_md = synth_model.chat_text(
751
+ [
752
+ ChatMessage(role="system", content=synthesizer.system_prompt),
753
+ ChatMessage(
754
+ role="user",
755
+ content=(
756
+ "Write a full report in academic English. Requirements:\n"
757
+ "- strict logical flow: Introduction -> Background -> Method/Architecture"
758
+ " -> Evidence -> Discussion -> Limitations -> Conclusion\n"
759
+ "- each non-trivial claim should cite source indices like [1], [2]\n"
760
+ "- include an evidence matrix/table and a References section\n"
761
+ "- avoid vague statements; tie findings to concrete source-backed facts\n\n"
762
+ f"Topic: {cfg.topic}\nOutline: {outline}\n"
763
+ f"Facts: {json.dumps([asdict(f) for f in facts], ensure_ascii=False)}\n"
764
+ f"Sources: {json.dumps([asdict(s) for s in sources], ensure_ascii=False)}"
765
+ ),
766
+ ),
767
+ ],
768
+ temperature=0.3,
769
+ )
770
+
771
+ speech_md = presenter_model.chat_text(
772
+ [
773
+ ChatMessage(role="system", content=presenter.system_prompt),
774
+ ChatMessage(
775
+ role="user",
776
+ content=(
777
+ "Write a 12-15 minute English talk script in markdown."
778
+ " Use a clear academic narrative with transitions and timing cues.\n\n"
779
+ f"Topic: {cfg.topic}\nOutline: {outline}\n"
780
+ "Key facts: "
781
+ + json.dumps([asdict(f) for f in facts[:20]], ensure_ascii=False)
782
+ ),
783
+ ),
784
+ ],
785
+ temperature=0.35,
786
+ )
787
+
788
+ paper_tex = render_paper(cfg.topic, outline, body=report_md, facts=facts, sources=sources)
789
+ bullets = [f.claim for f in facts[:12]]
790
+ slides_tex = render_beamer(cfg.topic, outline, bullets=bullets)
791
+
792
+ outputs = ResearchOutputs(
793
+ pre_report_md=str(pre_report_md),
794
+ report_md=str(report_md),
795
+ speech_md=str(speech_md),
796
+ paper_tex=str(paper_tex),
797
+ slides_tex=str(slides_tex),
798
+ bibtex=str(bibtex),
799
+ meta={
800
+ "base_url": cfg.base_url,
801
+ "model": cfg.model,
802
+ "iterations": cfg.iterations,
803
+ "max_sources": cfg.max_sources,
804
+ "mock": cfg.use_mock,
805
+ "verbatim": cfg.verbatim,
806
+ "archive_prompts": cfg.archive_prompts,
807
+ "archive_snapshots": cfg.archive_snapshots,
808
+ "auto": cfg.auto,
809
+ "auto_queries": cfg.auto_queries,
810
+ "auto_models": cfg.auto_models,
811
+ "quality_gate": cfg.quality_gate,
812
+ "min_quality_score": cfg.min_quality_score,
813
+ "max_quality_attempts": cfg.max_quality_attempts,
814
+ },
815
+ )
816
+
817
+ if cfg.verbatim and quality_meta is not None:
818
+ outputs.meta["quality"] = quality_meta
819
+
820
+ resources_dir = stage_dir / "resources"
821
+ resources_dir.mkdir(parents=True, exist_ok=True)
822
+ _ = (resources_dir / "sources.json").write_text(
823
+ json.dumps(
824
+ {"sources": [asdict(s) for s in sources]},
825
+ ensure_ascii=False,
826
+ indent=2,
827
+ ),
828
+ encoding="utf-8",
829
+ )
830
+ if cfg.archive_prompts:
831
+ _ = (stage_dir / "prompts.jsonl").write_text(
832
+ "\n".join(json.dumps(x, ensure_ascii=False) for x in prompt_log) + "\n",
833
+ encoding="utf-8",
834
+ )
835
+
836
+ if cfg.archive_snapshots:
837
+ snapshots_dir = resources_dir / "snapshots"
838
+ snapshots_dir.mkdir(parents=True, exist_ok=True)
839
+ snap_meta: list[dict[str, object]] = []
840
+ for i, s in enumerate(sources, start=1):
841
+ fname = f"{i:02d}_{slugify(s.title)}.txt"
842
+ target = snapshots_dir / fname
843
+ entry: dict[str, object] = {"url": s.url, "title": s.title, "path": str(target)}
844
+ try:
845
+ ctype, text = fetch_snapshot(s.url, cfg.snapshot_timeout_s)
846
+ entry["content_type"] = ctype
847
+ _ = target.write_text(text, encoding="utf-8")
848
+ entry["ok"] = True
849
+ except Exception as e:
850
+ entry["ok"] = False
851
+ entry["error"] = str(e)
852
+ snap_meta.append(entry)
853
+ _ = (resources_dir / "snapshots.json").write_text(
854
+ json.dumps({"snapshots": snap_meta}, ensure_ascii=False, indent=2),
855
+ encoding="utf-8",
856
+ )
857
+ check_deadline("snapshots")
858
+ progress.update("snapshots")
859
+
860
+ _ = (stage_dir / "pre_report.md").write_text(outputs.pre_report_md, encoding="utf-8")
861
+ _ = (stage_dir / "report.md").write_text(outputs.report_md, encoding="utf-8")
862
+ _ = (stage_dir / "speech.md").write_text(outputs.speech_md, encoding="utf-8")
863
+ _ = (stage_dir / "paper.tex").write_text(outputs.paper_tex, encoding="utf-8")
864
+ _ = (stage_dir / "slides.tex").write_text(outputs.slides_tex, encoding="utf-8")
865
+ _ = (stage_dir / "refs.bib").write_text(outputs.bibtex, encoding="utf-8")
866
+ _ = (stage_dir / "research.json").write_text(
867
+ json.dumps(
868
+ {
869
+ "topic": cfg.topic,
870
+ "outline": outline,
871
+ "sources": [asdict(s) for s in sources],
872
+ "facts": [asdict(f) for f in facts],
873
+ "critique_notes": critique_notes,
874
+ "meta": outputs.meta,
875
+ },
876
+ ensure_ascii=False,
877
+ indent=2,
878
+ ),
879
+ encoding="utf-8",
880
+ )
881
+
882
+ finalize_output(cfg.out, stage_dir, keep_stage=cfg.keep_stage)
883
+ progress.done("packaged")
884
+ return outputs
hydradeck/presets/__init__.py ADDED
@@ -0,0 +1,3 @@
1
+ from hydradeck.presets import rynnbrain
2
+
3
+ __all__ = ["rynnbrain"]
hydradeck/presets/rynnbrain.py ADDED
@@ -0,0 +1,346 @@
1
+ from __future__ import annotations
2
+
3
+ import json
4
+ import re
5
+ from dataclasses import asdict, dataclass
6
+ from pathlib import Path
7
+
8
+ import requests
9
+
10
+ from hydradeck.packaging import finalize_output, stage_dir_for_out
11
+
12
+
13
+ @dataclass(frozen=True)
14
+ class PresetSource:
15
+ url: str
16
+ title: str
17
+ kind: str
18
+ priority: int
19
+ notes: str
20
+
21
+
22
+ def _slugify(s: str) -> str:
23
+ t = s.strip().lower()
24
+ t = re.sub(r"[^a-z0-9]+", "-", t)
25
+ t = re.sub(r"-+", "-", t).strip("-")
26
+ return t or "source"
27
+
28
+
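The `_slugify` helper above normalizes source titles into filesystem-safe snapshot filenames. A minimal self-contained sketch of the same logic (re-stated here for illustration, outside the package):

```python
import re

def slugify(s: str) -> str:
    """Lowercase, collapse non-alphanumeric runs to '-', trim, with a fallback."""
    t = s.strip().lower()
    t = re.sub(r"[^a-z0-9]+", "-", t)    # any run of other characters becomes one dash
    t = re.sub(r"-+", "-", t).strip("-")
    return t or "source"                 # never return an empty filename stem

print(slugify("RynnBrain-2B model card (Hugging Face)"))  # rynnbrain-2b-model-card-hugging-face
print(slugify("!!!"))                                     # source
```

The fallback matters for titles made entirely of punctuation or non-Latin characters, which would otherwise slug to an empty string.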
29
+ def _fetch_snapshot(url: str, timeout_s: float = 25.0) -> tuple[str, str]:
30
+ r = requests.get(url, timeout=timeout_s, headers={"User-Agent": "hydradeck/0.1"})
31
+ r.raise_for_status()
32
+ ctype = r.headers.get("content-type", "")
33
+ text = r.text
34
+ if len(text) > 200_000:
35
+ text = text[:200_000]
36
+ return ctype, text
37
+
38
+
39
+ def _write_compile_helpers(out_dir: Path) -> None:
40
+ _ = (out_dir / "compile.sh").write_text(
41
+ "\n".join(
42
+ [
43
+ "#!/usr/bin/env bash",
44
+ "set -euo pipefail",
45
+ "pdflatex -interaction=nonstopmode paper.tex",
46
+ "bibtex paper || true",
47
+ "pdflatex -interaction=nonstopmode paper.tex",
48
+ "pdflatex -interaction=nonstopmode paper.tex",
49
+ "pdflatex -interaction=nonstopmode slides.tex",
50
+ "",
51
+ ]
52
+ ),
53
+ encoding="utf-8",
54
+ )
55
+ try:
56
+ (out_dir / "compile.sh").chmod(0o755)
57
+ except Exception:
58
+ pass
59
+ _ = (out_dir / "Makefile").write_text(
60
+ "".join(
61
+ [
62
+ "all: paper slides\n\n",
63
+ "paper:\n\t",
64
+ "pdflatex -interaction=nonstopmode paper.tex\n\t",
65
+ "bibtex paper || true\n\t",
66
+ "pdflatex -interaction=nonstopmode paper.tex\n\t",
67
+ "pdflatex -interaction=nonstopmode paper.tex\n\n",
68
+ "slides:\n\t",
69
+ "pdflatex -interaction=nonstopmode slides.tex\n\n",
70
+ "clean:\n\t",
71
+ "rm -f *.aux *.bbl *.blg *.log *.out *.toc *.nav *.snm *.vrb *.fls *.fdb_latexmk\n",
72
+ ]
73
+ ),
74
+ encoding="utf-8",
75
+ )
76
+
77
+
78
+ def sources() -> list[PresetSource]:
79
+ return [
80
+ PresetSource(
81
+ url="https://github.com/alibaba-damo-academy/RynnBrain",
82
+ title="alibaba-damo-academy/RynnBrain (GitHub)",
83
+ kind="primary",
84
+ priority=1,
85
+ notes="Code, checkpoints pointers, cookbooks, benchmarks.",
86
+ ),
87
+ PresetSource(
88
+ url="https://alibaba-damo-academy.github.io/RynnBrain.github.io/",
89
+ title="RynnBrain project page",
90
+ kind="primary",
91
+ priority=1,
92
+ notes="Abstract, model lineup, demos, links.",
93
+ ),
94
+ PresetSource(
95
+ url="https://arxiv.org/abs/2602.14979",
96
+ title="RynnBrain: Open Embodied Foundation Models (arXiv:2602.14979)",
97
+ kind="primary",
98
+ priority=1,
99
+ notes="Technical report; claims, methodology, evaluations.",
100
+ ),
101
+ PresetSource(
102
+ url="https://huggingface.co/Alibaba-DAMO-Academy/RynnBrain-2B",
103
+ title="RynnBrain-2B model card (Hugging Face)",
104
+ kind="primary",
105
+ priority=2,
106
+ notes="Weights access, inference notes, license.",
107
+ ),
108
+ PresetSource(
109
+ url="https://www.scmp.com/tech/tech-war/article/3343212/alibaba-unveils-rynnbrain-embodied-ai-model-gives-robots-brain",
110
+ title="SCMP coverage: Alibaba unveils RynnBrain",
111
+ kind="secondary",
112
+ priority=3,
113
+ notes="Press summary; may include comparisons and quotes.",
114
+ ),
115
+ PresetSource(
116
+ url="https://connectcx.ai/alibabas-rynnbrain-advances-robot-intelligence/",
117
+ title="CONNECTCX coverage: Alibaba’s RynnBrain Advances Robot Intelligence",
118
+ kind="secondary",
119
+ priority=4,
120
+ notes="Third-party coverage; validate against primary sources.",
121
+ ),
122
+ PresetSource(
123
+ url="https://huggingface.co/papers/2602.14979",
124
+ title="Hugging Face Papers page for arXiv:2602.14979",
125
+ kind="secondary",
126
+ priority=4,
127
+ notes="Convenient summary + links.",
128
+ ),
129
+ ]
130
+
131
+
132
+ def pre_report_md() -> str:
133
+ srcs = sources()
134
+ src_lines = [
135
+ "\n".join(
136
+ [
137
+ f"[{i}] {s.title}",
138
+ f" - URL: {s.url}",
139
+ f" - Type: {s.kind} | Priority: {s.priority}",
140
+ f" - Notes: {s.notes}",
141
+ ]
142
+ )
143
+ for i, s in enumerate(srcs, start=1)
144
+ ]
145
+ queries = [
146
+ "RynnBrain arXiv 2602.14979 benchmark 16 leaderboards details",
147
+ "RynnBrain 30B-A3B MoE architecture A3B meaning experts routing",
148
+ "RynnBrain spatiotemporal grounding egocentric cognition definitions",
149
+ "RynnBrain-Plan manipulation planning dataset tasks evaluation",
150
+ "RynnBrain-Nav VLN benchmarks used and results",
151
+ "RynnBrain-CoP chain-of-point spatial reasoning prompt format",
152
+ "Qwen3-VL base model differences vs RynnBrain modifications",
153
+ "Embodied foundation model comparison: Gemini Robotics ER 1.5 Cosmos Reason 2",
154
+ "Licensing: Apache-2.0 weights usage restrictions if any",
155
+ "Reproducibility: official code inference requirements and compute",
156
+ ]
157
+
158
+ talk = [
159
+ "0:00–1:30 目标与背景:什么是 embodied foundation model,RynnBrain 想解决什么问题",
160
+ "1:30–4:30 一手资料快速过一遍:GitHub / Project Page / arXiv(只提我们要验证的关键点)",
161
+ "4:30–7:30 研究问题拆解:能力维度(感知/记忆/定位/推理/规划)",
162
+ " 与任务维度(nav/manipulation)",
163
+ "7:30–10:30 证据计划:哪些 claim 必须用什么证据验证",
164
+ " (leaderboard、消融、数据集、代码可复现性)",
165
+ "10:30–13:00 风险与不确定性:宣传与论文差异、评测口径、demo bias、实现门槛",
166
+ "13:00–15:00 输出计划:最终报告结构、资源打包、可复现 checklist",
167
+ ]
168
+
169
+ return "\n".join(
170
+ [
171
+ "# Pre-Research (15min) — RynnBrain",
172
+ "",
173
+ "本 Pre-Research 的目标不是给出最终结论,而是建立**可验证的研究路线**:",
174
+ "明确问题、证据标准、资源与时间安排,确保后续 deep research 不会变成‘看 demo 写总结’。",
175
+ "",
176
+ "## 1. 15 分钟口头 Pre-Brief 讲稿大纲(可照读)",
177
+ "\n".join([f"- {x}" for x in talk]),
178
+ "",
179
+ "## 2. 研究对象界定(Working definition)",
180
+ "- RynnBrain 是 Alibaba DAMO Academy 在 2026 年 2 月左右开源的一套",
181
+ " embodied foundation model 家族。",
182
+ "- 它强调:以第一人称/自我中心(egocentric)视角做理解,具备时空定位/记忆",
183
+ " (spatiotemporal grounding / memory),并面向真实任务规划(planning)。",
184
+ "- 需要通过一手材料确认:模型族谱(2B/8B/30B MoE,以及 Plan/Nav/CoP 等子模型)、",
185
+ " 评测体系、训练数据与推理方式,以及开源范围(代码/权重/benchmark)。",
186
+ "",
187
+ "## 3. 研究问题(Research Questions)",
188
+ "下面的问题按优先级排序,前 3 个属于‘不解决就不要写结论’:",
189
+ "",
190
+ "### RQ1(最高优先级):RynnBrain 的核心技术增量是什么?",
191
+ "- 相比 Qwen3-VL 等基础 VLM,它到底加了什么:时空记忆模块?定位/地图表征?",
192
+ " 多任务 head?还是主要靠数据与训练配方?",
193
+ "- 需要在 arXiv 技术报告里找到:架构图、训练目标、数据组成、消融实验。",
194
+ "",
195
+ "### RQ2:‘SOTA on 16 embodied leaderboards’ 这类 claim 的证据链是否站得住?",
196
+ "- 需要明确:16 个榜单各自是什么任务/指标/基线;是否同一评测口径;",
197
+ " 是否存在 cherry-pick。",
198
+ "- 证据标准:必须来自官方 benchmark 页面/leaderboard 截图/可复现脚本,而不是新闻稿。",
199
+ "",
200
+ "### RQ3:开源的可用性如何(工程落地门槛)?",
201
+ "- 权重是否全量公开?推理依赖(框架版本、显存、是否需要视频输入管线)?",
202
+ "- 是否提供 cookbooks,覆盖哪些能力:定位、推理、规划、导航、操作。",
203
+ "",
204
+ "### RQ4:能力维度拆解:它到底在‘什么能力’上强?",
205
+ "- Egocentric cognition:是否包含长期场景理解与一致性跟踪?",
206
+ "- Spatiotemporal grounding:是否输出坐标/轨迹/地图?误差量化如何做?",
207
+ "- Planning:是语言层规划(plan-as-text),还是能输出可执行动作序列",
208
+ " (actions/waypoints)?",
209
+ "",
210
+ "### RQ5:与同类系统的可比性(apples-to-apples)",
211
+ "- 对比对象:Gemini Robotics ER、NVIDIA Cosmos Reason、其它 embodied VLM / EFM。",
212
+ "- 对比口径:任务集/传感器输入/是否允许工具调用/是否闭源系统。",
213
+ "",
214
+ "## 4. Scope / Non-Scope(边界)",
215
+ "### Scope",
216
+ "- 以公开资料为边界:论文/项目页/代码/模型卡/公开 benchmark。",
217
+ "- 产出一个可审计的‘证据 → 结论’矩阵:每个结论都对应来源与验证步骤。",
218
+ "",
219
+ "### Non-Scope(本轮明确不做)",
220
+ "- 不做真实机器人部署复现(除非官方提供可运行 demo 且成本可控)。",
221
+ "- 不做未公开数据/内部实现猜测;不引用无���访问或不可验证的泄漏信息。",
222
+ "",
223
+ "## 5. 证据标准(Evaluation Criteria)",
224
+ "为了避免‘看起来很强’的主观总结,本研究采用硬标准:",
225
+ "- 论文证据:架构/训练/消融/实验设置必须可在 arXiv 报告中定位到章节与图表。",
226
+ "- 代码证据:能在 GitHub 找到对应实现入口(推理脚本、配置、模型定义)。",
227
+ "- Bench 证据:结果必须能追溯到官方 benchmark/leaderboard 或可复现评测脚本。",
228
+ "- 口径一致:比较必须满足相同输入与评测规则;否则标注为‘不可直接比较’。",
229
+ "- 可用性:给出最小可运行路径(依赖、命令、显存、样例输入)。",
230
+ "",
231
+ "## 6. 检索与阅读计划(Search Plan & Reading Plan)",
232
+ "### 6.1 顺序(建议在 2–4 小时深研里执行)",
233
+ "1) GitHub README + 目录:确定开源范围、模型列表、入口脚本、benchmark 链接。\n"
234
+ "2) Project Page:收集所有外链(HF/ModelScope/Benchmark/Demo/Video)。\n"
235
+ "3) arXiv:抓核心章节:method、experiments、ablation、limitations。\n"
236
+ "4) Model Card:确认权重、许可证、推理限制与样例。\n"
237
+ "5) Press:只作为线索,不作为证据;对 press 中的 claim 做反向核对。",
238
+ "",
239
+ "### 6.2 Query 列表(可直接用于搜索/对照阅读)",
240
+ "\n".join([f"- {q}" for q in queries]),
241
+ "",
242
+ "## 7. 产出设计(Deliverables)",
243
+ "在完成 deep research 后,最终交付物建议包含:",
244
+ "- 长文研究报告(含 Resources、证据矩阵、可复现路径、局限与开放问题)",
245
+ "- 15 分钟演讲稿 + Beamer(信息密度高,但每页只承载一个结论)",
246
+ "- research.json(结构化审计:来源、摘录、结论、证据链接、验证状态)",
247
+ "- resources/(把关键页面快照打包,避免链接失效)",
248
+ "",
249
+ "## 8. 风险与不确定性(Risks & Unknowns)",
250
+ "- Press 可能夸大:需以论文与 benchmark 为准。",
251
+ "- Leaderboard 的口径可能不统一:需逐项核对设置。",
252
+ "- Demo bias:演示视频不等于泛化能力。",
253
+ "- 可复现门槛:依赖、算力、输入管线(视频/多帧)可能较重。",
254
+ "- 许可证与权重条款:代码 Apache-2.0 不等于所有权重都无约束。",
255
+ "",
256
+ "## 9. 资源清单(Prioritized Resources)",
257
+ "\n".join(src_lines),
258
+ "",
259
+ ]
260
+ )
261
+
262
+
263
+ def generate(out: Path, keep_stage: bool, fetch: bool) -> Path:
264
+ stage_dir = stage_dir_for_out(out)
265
+ stage_dir.mkdir(parents=True, exist_ok=True)
266
+ _write_compile_helpers(stage_dir)
267
+
268
+ srcs = sources()
269
+ src_json = [asdict(s) for s in srcs]
270
+
271
+ resources_dir = stage_dir / "resources"
272
+ snapshots_dir = resources_dir / "snapshots"
273
+ snapshots_dir.mkdir(parents=True, exist_ok=True)
274
+ _ = (resources_dir / "sources.json").write_text(
275
+ json.dumps({"sources": src_json}, ensure_ascii=False, indent=2),
276
+ encoding="utf-8",
277
+ )
278
+
279
+ snapshots: list[dict[str, object]] = []
280
+ if fetch:
281
+ for i, s in enumerate(srcs, start=1):
282
+ slug = _slugify(s.title)
283
+ target = snapshots_dir / f"{i:02d}_{slug}.txt"
284
+ entry: dict[str, object] = {"url": s.url, "title": s.title, "path": str(target)}
285
+ try:
286
+ ctype, text = _fetch_snapshot(s.url)
287
+ entry["content_type"] = ctype
288
+ _ = target.write_text(text, encoding="utf-8")
289
+ entry["ok"] = True
290
+ except Exception as e:
291
+ entry["ok"] = False
292
+ entry["error"] = str(e)
293
+ snapshots.append(entry)
294
+
295
+ pre = pre_report_md()
296
+ _ = (stage_dir / "pre_report.md").write_text(pre, encoding="utf-8")
297
+ _ = (stage_dir / "report.md").write_text("# (Not generated in preset mode)\n", encoding="utf-8")
298
+ _ = (stage_dir / "speech.md").write_text("# (Not generated in preset mode)\n", encoding="utf-8")
299
+ _ = (stage_dir / "paper.tex").write_text(
300
+ "\\documentclass[11pt]{article}\n"
301
+ "\\usepackage[UTF8]{ctex}\n"
302
+ "\\usepackage{hyperref}\n"
303
+ "\\title{RynnBrain Pre-Research}\n"
304
+ "\\author{hydradeck preset}\n"
305
+ "\\date{\\today}\n"
306
+ "\\begin{document}\n"
307
+ "\\maketitle\n"
308
+ "\\section*{Pre-Research}\n"
309
+ "This preset package contains a Markdown pre-research report and archived resources.\\\\\n"
310
+ "See pre_report.md and resources/.\n"
311
+ "\\end{document}\n",
312
+ encoding="utf-8",
313
+ )
314
+ _ = (stage_dir / "slides.tex").write_text(
315
+ "\\documentclass{beamer}\n"
316
+ "\\usepackage[UTF8]{ctex}\n"
317
+ "\\usetheme{Madrid}\n"
318
+ "\\title{RynnBrain Pre-Research (15min)}\n"
319
+ "\\author{hydradeck preset}\n"
320
+ "\\date{\\today}\n"
321
+ "\\begin{document}\n"
322
+ "\\frame{\\titlepage}\n"
323
+ "\\begin{frame}{What is inside?}\n"
324
+ "- pre_report.md\\\\\n"
325
+ "- resources/sources.json\\\\\n"
326
+ "- resources/snapshots/*\\\\\n"
327
+ "\\end{frame}\n"
328
+ "\\end{document}\n",
329
+ encoding="utf-8",
330
+ )
331
+ _ = (stage_dir / "refs.bib").write_text("% (Not generated in preset mode)\n", encoding="utf-8")
332
+
333
+ research = {
334
+ "topic": "RynnBrain",
335
+ "mode": "preset-pre",
336
+ "sources": src_json,
337
+ "snapshots": snapshots,
338
+ "meta": {"fetch": fetch},
339
+ }
340
+ _ = (stage_dir / "research.json").write_text(
341
+ json.dumps(research, ensure_ascii=False, indent=2),
342
+ encoding="utf-8",
343
+ )
344
+
345
+ finalize_output(out, stage_dir, keep_stage=keep_stage)
346
+ return out
hydradeck/render.py ADDED
@@ -0,0 +1,471 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import re
4
+ from dataclasses import dataclass
5
+
6
+ from hydradeck.core.types import ExtractedFact, Source
7
+
8
+ _LATEX_SPECIALS: dict[str, str] = {
9
+ "\\": r"\textbackslash{}",
10
+ "{": r"\{",
11
+ "}": r"\}",
12
+ "#": r"\#",
13
+ "$": r"\$",
14
+ "%": r"\%",
15
+ "&": r"\&",
16
+ "_": r"\_",
17
+ "^": r"\textasciicircum{}",
18
+ "~": r"\textasciitilde{}",
19
+ }
20
+
21
+
22
+ def latex_escape(s: str) -> str:
23
+ return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in s)
24
+
25
+
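`latex_escape` does a character-by-character substitution over the table above. A standalone sketch, re-stating the same escape table for illustration:

```python
# Escape table mirroring _LATEX_SPECIALS in render.py.
SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{", "}": r"\}",
    "#": r"\#", "$": r"\$", "%": r"\%", "&": r"\&", "_": r"\_",
    "^": r"\textasciicircum{}", "~": r"\textasciitilde{}",
}

def latex_escape(s: str) -> str:
    # Characters not in the table pass through unchanged.
    return "".join(SPECIALS.get(ch, ch) for ch in s)

print(latex_escape("90% of pre_report.md"))  # 90\% of pre\_report.md
```

Note the backslash must map to `\textbackslash{}` rather than `\\` (which is a line break in LaTeX), and the braces it introduces are emitted after escaping, so they are not themselves re-escaped.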
26
+ def _bib_key(i: int) -> str:
27
+ return f"src{i}"
28
+
29
+
30
+ def _bib_escape(s: str) -> str:
31
+ return s.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")
32
+
33
+
34
+ def render_bibtex(sources: list[Source]) -> str:
35
+ lines: list[str] = []
36
+ for i, s in enumerate(sources, start=1):
37
+ key = _bib_key(i)
38
+ lines.append(f"@misc{{{key},")
39
+ lines.append(f" title = {{{_bib_escape(s.title)}}},")
40
+ lines.append(f" howpublished = {{\\url{{{_bib_escape(s.url)}}}}},")
41
+ lines.append(" note = {Accessed: 2026-03-04},")
42
+ lines.append("}")
43
+ lines.append("")
44
+ return "\n".join(lines).strip() + "\n"
45
+
46
+
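`render_bibtex` emits one `@misc` entry per source, keyed `srcN` in list order so numeric citations `[N]` can later be mapped onto BibTeX keys. A runnable sketch with a minimal stand-in for `hydradeck.core.types.Source` (the fixed access-date note is omitted here):

```python
from dataclasses import dataclass

@dataclass
class Source:  # minimal stand-in for hydradeck.core.types.Source
    url: str
    title: str

def bib_escape(s: str) -> str:
    # Escape only what breaks a BibTeX field body.
    return s.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")

def render_bibtex(sources: list[Source]) -> str:
    lines: list[str] = []
    for i, s in enumerate(sources, start=1):
        lines.append(f"@misc{{src{i},")
        lines.append(f"  title = {{{bib_escape(s.title)}}},")
        lines.append(f"  howpublished = {{\\url{{{bib_escape(s.url)}}}}},")
        lines.append("}")
        lines.append("")  # blank line between entries
    return "\n".join(lines).strip() + "\n"

print(render_bibtex([Source("https://example.com", "Example Title")]))
```

Because keys are positional (`src1`, `src2`, …), the source list order must be frozen before any report text that cites by index is generated.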
47
+ def _replace_numeric_citations(text: str, max_n: int) -> str:
48
+ def repl(m: re.Match[str]) -> str:
49
+ num = int(m.group(1))
50
+ if 1 <= num <= max_n:
51
+ return f"\\cite{{{_bib_key(num)}}}"
52
+ return m.group(0)
53
+
54
+ return re.sub(r"\[(\d{1,3})\]", repl, text)
55
+
56
+
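`_replace_numeric_citations` rewrites markdown-style `[N]` markers into `\cite{srcN}`, but only for indices that actually map to a known source; anything out of range is left alone. A self-contained sketch of the same behavior:

```python
import re

def bib_key(i: int) -> str:
    return f"src{i}"

def replace_numeric_citations(text: str, max_n: int) -> str:
    """[3] -> \\cite{src3}, but only for indices with a matching source."""
    def repl(m: re.Match) -> str:
        num = int(m.group(1))
        if 1 <= num <= max_n:
            return f"\\cite{{{bib_key(num)}}}"
        return m.group(0)  # out-of-range markers are left untouched
    return re.sub(r"\[(\d{1,3})\]", repl, text)

print(replace_numeric_citations("Claimed in [1], disputed in [99].", max_n=3))
# Claimed in \cite{src1}, disputed in [99].
```

Leaving out-of-range markers untouched is deliberate: a hallucinated `[99]` stays visible in the PDF instead of silently becoming a dangling `\cite` key.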
57
+ def _markdown_to_latex_paragraphs(md: str, max_n: int) -> str:
58
+ text = md.strip()
59
+ text = re.sub(r"```[\s\S]*?```", "", text)
60
+ text = re.sub(r"^\s*[-*+]\s+", "", text, flags=re.MULTILINE)
61
+ text = re.sub(r"^\s*#+\s*", "", text, flags=re.MULTILINE)
62
+ text = re.sub(r"`([^`]+)`", r"\1", text)
63
+ text = re.sub(r"\*\*(.*?)\*\*", r"\1", text)
64
+ text = re.sub(r"\*(.*?)\*", r"\1", text)
65
+ text = re.sub(r"\[(.*?)\]\((.*?)\)", r"\1", text)
66
+ text = _replace_numeric_citations(text, max_n=max_n)
67
+ text = latex_escape(text)
68
+ text = re.sub(r"\\textbackslash\{\}cite\\\{(src\d+)\\\}", r"\\cite{\1}", text)
69
+ text = text.replace("\n\n", "\n\\par\n")
70
+ return text
71
+
72
+
73
+ def render_paper(
74
+ topic: str,
75
+ outline: list[str],
76
+ body: str,
77
+ facts: list[ExtractedFact],
78
+ sources: list[Source],
79
+ ) -> str:
80
+ topic_e = latex_escape(topic)
81
+ url_to_key = {s.url: _bib_key(i) for i, s in enumerate(sources, start=1)}
82
+
83
+ outline_items = "\n".join([f"\\item {latex_escape(x)}" for x in outline[:10]])
84
+ fact_sentences: list[str] = []
85
+ for f in facts[:18]:
86
+ key = url_to_key.get(f.url)
87
+ cite = f"\\cite{{{key}}}" if key else ""
88
+ sentence = latex_escape(f.claim.strip())
89
+ if sentence and sentence[-1] not in ".!?":
90
+ sentence += "."
91
+ fact_sentences.append(sentence + cite)
92
+ facts_paragraph = (
93
+ " ".join(fact_sentences)
94
+ if fact_sentences
95
+ else "No extracted facts available."
96
+ )
97
+
98
+ body_latex = _markdown_to_latex_paragraphs(body, max_n=len(sources))
99
+
100
+ return (
101
+ "\\documentclass[11pt]{article}\n"
102
+ "\\usepackage{geometry}\n"
103
+ "\\usepackage{hyperref}\n"
104
+ "\\usepackage{url}\n"
105
+ "\\usepackage{booktabs}\n"
106
+ "\\usepackage{longtable}\n"
107
+ "\\geometry{margin=1in}\n"
108
+ "\\hypersetup{colorlinks=true,linkcolor=black,citecolor=blue,urlcolor=blue}\n"
109
+ f"\\title{{{topic_e}}}\n"
110
+ "\\author{hydradeck}\n"
111
+ "\\date{\\today}\n"
112
+ "\\begin{document}\n"
113
+ "\\maketitle\n"
114
+ "\\begin{abstract}\n"
115
+ "This report presents a structured analysis with explicit traceability to sources.\n"
116
+ "\\end{abstract}\n\n"
117
+ "\\section*{1. Introduction and Background}\n"
118
+ + facts_paragraph
119
+ + "\n\n"
120
+ "\\section*{2. Logical Outline}\n"
121
+ "\\begin{itemize}\n"
122
+ + outline_items
123
+ + "\n\\end{itemize}\n\n"
124
+ "\\section*{3. Evidence and Key Findings}\n"
125
+ + body_latex
126
+ + "\n\n"
127
+ "\\section*{4. Limitations and Discussion}\n"
128
+ "The analysis is bounded by available public evidence and may evolve as sources update.\n\n"
129
+ "\\section*{5. Conclusion}\n"
130
+ "Conclusions are presented in a source-traceable form and should be interpreted with the\n"
131
+ "reported assumptions and constraints.\n\n"
132
+ "\\bibliographystyle{plain}\n"
133
+ "\\bibliography{refs}\n"
134
+ "\\end{document}\n"
135
+ )
136
+
137
+
138
+ def render_report_structured(
139
+ topic: str,
140
+ section_blocks: list[dict[str, str]],
141
+ language: str = "en",
142
+ ) -> str:
143
+ lang = language.lower()
144
+ topic_e = latex_escape(topic)
145
+
146
+ if lang == "zh":
147
+ preamble = (
148
+ "\\documentclass[11pt]{ctexart}\n"
149
+ "\\usepackage[a4paper,margin=1in]{geometry}\n"
150
+ "\\usepackage{hyperref}\n"
151
+ "\\usepackage{url}\n"
152
+ "\\usepackage{booktabs}\n"
153
+ "\\usepackage{longtable}\n"
154
+ "\\hypersetup{colorlinks=true,linkcolor=black,citecolor=blue,urlcolor=blue}\n"
155
+ f"\\title{{{topic_e}}}\n"
156
+ "\\author{hydradeck}\n"
157
+ "\\date{\\today}\n"
158
+ "\\begin{document}\n"
159
+ "\\maketitle\n"
160
+ )
161
+ else:
162
+ preamble = (
163
+ "\\documentclass[11pt]{article}\n"
164
+ "\\usepackage{geometry}\n"
165
+ "\\usepackage{hyperref}\n"
166
+ "\\usepackage{url}\n"
167
+ "\\usepackage{booktabs}\n"
168
+ "\\usepackage{longtable}\n"
169
+ "\\geometry{margin=1in}\n"
170
+ "\\hypersetup{colorlinks=true,linkcolor=black,citecolor=blue,urlcolor=blue}\n"
171
+ f"\\title{{{topic_e}}}\n"
172
+ "\\author{hydradeck}\n"
173
+ "\\date{\\today}\n"
174
+ "\\begin{document}\n"
175
+ "\\maketitle\n"
176
+ )
177
+
178
+ content_parts: list[str] = []
179
+ for block in section_blocks[:10]:
180
+ title = latex_escape(str(block.get("name", "Section")).strip() or "Section")
181
+ latex_body = str(block.get("latex", "")).strip()
182
+ latex_body = re.sub(r"\\section\*?\{[^}]*\}", "", latex_body)
183
+ latex_body = re.sub(r"\\subsection\*?\{[^}]*\}", "", latex_body)
184
+ latex_body = re.sub(r"\\cite\{[^}]*\}", "", latex_body)
185
+ latex_body = re.sub(r"\[(\d{1,3})\]", "", latex_body)
186
+ if not latex_body:
187
+ continue
188
+ content_parts.append(f"\\section*{{{title}}}\n{latex_body}\n")
189
+
190
+ return preamble + "\n".join(content_parts) + "\n\\end{document}\n"
191
+
192
+
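`render_report_structured` sanitizes each model-supplied LaTeX body before wrapping it: nested sectioning commands and citation markers are stripped so the renderer alone controls document structure. A standalone sketch of that cleanup pass (with a trailing `strip()` added for tidy output):

```python
import re

def sanitize_section_latex(latex_body: str) -> str:
    """Drop nested \\section/\\subsection heads and citation markers."""
    latex_body = re.sub(r"\\section\*?\{[^}]*\}", "", latex_body)
    latex_body = re.sub(r"\\subsection\*?\{[^}]*\}", "", latex_body)
    latex_body = re.sub(r"\\cite\{[^}]*\}", "", latex_body)
    latex_body = re.sub(r"\[(\d{1,3})\]", "", latex_body)
    return latex_body.strip()

print(sanitize_section_latex("\\section*{Intro} Evidence is strong \\cite{src1} [2]."))
```

Stripping `\cite` here is safe because this structured renderer does not emit a `\bibliography`; leaving the commands in would produce unresolved-citation warnings at compile time.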
193
+ @dataclass
194
+ class SlideFrame:
195
+ title: str
196
+ bullets: list[str]
197
+ note: str = ""
198
+
199
+
200
+ def render_beamer(topic: str, outline: list[str], bullets: list[str]) -> str:
201
+ section_blocks = [{"name": t, "latex": b} for t, b in zip(outline, bullets)]
202
+ if not section_blocks:
203
+ section_blocks = [{"name": "Summary", "latex": "Key findings and implications."}]
204
+ frames = build_slide_frames_from_sections(section_blocks, language="en")
205
+ frames = enforce_slide_density(frames, language="en")
206
+ return render_beamer_frames(topic, frames, language="en")
207
+
208
+
209
+ def render_beamer_from_report(topic: str, report_tex: str) -> str:
210
+ frames = build_slide_frames_from_report(report_tex, language="en")
211
+ frames = enforce_slide_density(frames, language="en")
212
+ return render_beamer_frames(topic, frames, language="en")
213
+
214
+
215
+ def _split_paragraph_to_bullets(text: str, language: str) -> list[str]:
216
+ lang = language.lower()
217
+ if lang == "zh":
218
+ parts = [x.strip() for x in re.split(r"[。!?]\s*", text) if x.strip()]
219
+ out: list[str] = []
220
+ for p in parts:
221
+ if len(p) < 6:
222
+ continue
223
+ out.append(_trim_chars(_clean_text_for_slide(p), 28))
224
+ return out
225
+
226
+ parts = [x.strip() for x in re.split(r"[.!?]\s+", text) if x.strip()]
227
+ out2: list[str] = []
228
+ for p in parts:
229
+ clean = _clean_text_for_slide(p)
230
+ if len(clean) < 14:
231
+ continue
232
+ out2.append(_trim_words(clean, 14))
233
+ return out2
234
+
235
+
236
+ def build_slide_frames_from_sections(
237
+ section_blocks: list[dict[str, str]],
238
+ language: str = "en",
239
+ ) -> list[SlideFrame]:
240
+ lang = language.lower()
241
+ frames: list[SlideFrame] = []
242
+ for block in section_blocks[:8]:
243
+ title = str(block.get("name", "Section")).strip() or ("章节" if lang == "zh" else "Section")
244
+ body = str(block.get("latex", ""))
245
+ body = re.sub(r"\\section\*?\{[^}]*\}", "", body)
246
+ body = re.sub(r"\\subsection\*?\{[^}]*\}", "", body)
247
+ body = re.sub(r"\\cite\{[^}]*\}", "", body)
248
+ body = re.sub(r"\[(\d{1,3})\]", "", body)
249
+ bullets = _split_paragraph_to_bullets(body, lang)
250
+ if not bullets:
251
+ continue
252
+
253
+ chunk = 4 if lang == "zh" else 4
254
+ for i in range(0, len(bullets), chunk):
255
+ part = bullets[i : i + chunk]
256
+ if not part:
257
+ continue
258
+ if i == 0:
259
+ frame_title = title
260
+ else:
261
+ frame_title = f"{title}(续)" if lang == "zh" else f"{title} (cont.)"
262
+ frames.append(SlideFrame(title=frame_title, bullets=part))
263
+
264
+ if not frames:
265
+ raise RuntimeError("insufficient readable section content for slides")
266
+ return frames
267
+
268
+
269
+ def enforce_slide_density(
270
+ frames: list[SlideFrame],
271
+ language: str = "en",
272
+ max_bullets_per_frame: int = 4,
273
+ max_chars_per_bullet_zh: int = 28,
274
+ max_words_per_bullet_en: int = 14,
275
+ ) -> list[SlideFrame]:
276
+ lang = language.lower()
277
+ out: list[SlideFrame] = []
278
+
279
+ for fr in frames:
280
+ normalized: list[str] = []
281
+ for b in fr.bullets:
282
+ clean = _clean_text_for_slide(b)
283
+ if not clean:
284
+ continue
285
+ if lang == "zh":
286
+ clean = _trim_chars(clean, max_chars_per_bullet_zh)
287
+ else:
288
+ clean = _trim_words(clean, max_words_per_bullet_en)
289
+ if clean:
290
+ normalized.append(clean)
291
+
292
+ if not normalized:
293
+ continue
294
+
295
+ for i in range(0, len(normalized), max_bullets_per_frame):
296
+ chunk = normalized[i : i + max_bullets_per_frame]
297
+ if not chunk:
298
+ continue
299
+ if i == 0:
300
+ title = fr.title
301
+ else:
302
+ title = f"{fr.title}(续)" if lang == "zh" else f"{fr.title} (cont.)"
303
+ out.append(SlideFrame(title=title, bullets=chunk, note=fr.note))
304
+
305
+ if not out:
306
+ raise RuntimeError("slide density guard removed all frames")
307
+ return out
308
+
309
+
310
+ def _trim_words(text: str, max_words: int) -> str:
311
+ words = text.split()
312
+ if len(words) <= max_words:
313
+ return text
314
+ return " ".join(words[:max_words]).rstrip(" ,.;") + "..."
315
+
316
+
317
+ def _trim_chars(text: str, max_chars: int) -> str:
318
+ t = text.strip()
319
+ if len(t) <= max_chars:
320
+ return t
321
+ return t[: max_chars - 1].rstrip(",。,. ") + "…"
322
+
323
+
324
+ def _clean_text_for_slide(text: str) -> str:
325
+ t = text.strip()
326
+ t = re.sub(r"\s+", " ", t)
327
+ t = re.sub(r"`([^`]+)`", r"\1", t)
328
+ t = re.sub(r"\*\*(.*?)\*\*", r"\1", t)
329
+ t = re.sub(r"\*(.*?)\*", r"\1", t)
330
+ return t
331
+
332
+
333
+ def build_slide_frames_from_report(report_tex: str, language: str = "en") -> list[SlideFrame]:
334
+ lang = language.lower()
335
+ sections = re.split(r"\\section\*\{([^}]+)\}", report_tex)
336
+ parsed: list[tuple[str, str]] = []
337
+ if len(sections) >= 3:
338
+ for i in range(1, len(sections), 2):
339
+ title = sections[i].strip()
340
+ body = sections[i + 1] if i + 1 < len(sections) else ""
341
+ parsed.append((title, body))
342
+
343
+ if not parsed:
344
+ raise RuntimeError("cannot derive slide frames from report structure")
345
+
346
+ frames: list[SlideFrame] = []
347
+ for title, body in parsed[:8]:
348
+ plain = re.sub(r"\\[a-zA-Z]+\*?(\[[^\]]*\])?(\{[^}]*\})?", " ", body)
349
+ chunks = [x.strip() for x in re.split(r"[。.!?]\s+", plain) if x.strip()]
350
+ bullets: list[str] = []
351
+ for c in chunks:
352
+ clean = _clean_text_for_slide(c)
353
+ if not clean:
354
+ continue
355
+ if lang == "zh":
356
+ if len(clean) < 8:
357
+ continue
358
+ bullets.append(_trim_chars(clean, 30))
359
+ else:
360
+ if len(clean) < 12:
361
+ continue
362
+ bullets.append(_trim_words(clean, 16))
363
+ if len(bullets) >= 5:
364
+ break
365
+ if not bullets:
366
+ raise RuntimeError(f"insufficient bullet content for slide '{title}'")
367
+ frames.append(SlideFrame(title=title, bullets=bullets))
368
+
369
+ return frames
370
+
371
+
372
+ def render_beamer_frames(topic: str, frames: list[SlideFrame], language: str = "en") -> str:
373
+ lang = language.lower()
374
+ topic_e = latex_escape(topic)
375
+ agenda_label = "目录" if lang == "zh" else "Agenda"
376
+ summary_title = "总结" if lang == "zh" else "Summary"
377
+
378
+ agenda_items = "\n".join([f"\\item {latex_escape(f.title)}" for f in frames[:8]])
379
+
380
+ frame_blocks: list[str] = []
381
+ for fr in frames[:10]:
382
+ b = "\n".join([f"\\item {latex_escape(x)}" for x in fr.bullets[:5]])
383
+ frame_blocks.append(
384
+ "\\begin{frame}[t]{"
385
+ + latex_escape(fr.title)
386
+ + "}\n"
387
+ + "\\begin{itemize}\n"
388
+ + b
389
+ + "\n\\end{itemize}\n"
390
+ + (f"\\vspace{{0.6em}}\\footnotesize {latex_escape(fr.note)}\n" if fr.note else "")
391
+ + "\\end{frame}\n"
392
+ )
393
+
394
+ summary_bullets: list[str] = []
395
+ for fr in frames[:5]:
396
+ if fr.bullets:
397
+ summary_bullets.append(fr.bullets[0])
398
+ if not summary_bullets:
399
+ summary_bullets = ["关键要点见前页。" if lang == "zh" else "Key points are summarized in previous slides."]
400
+ summary_items = "\n".join([f"\\item {latex_escape(x)}" for x in summary_bullets])
401
+
402
+ if lang == "zh":
403
+ return (
404
+ "\\documentclass[aspectratio=169]{ctexbeamer}\n"
405
+ "\\usetheme{Madrid}\n"
406
+ "\\usefonttheme{professionalfonts}\n"
407
+ "\\setbeamertemplate{navigation symbols}{}\n"
408
+ "\\usepackage{hyperref}\n"
409
+ "\\usepackage{booktabs}\n"
410
+ "\\definecolor{AccentBlue}{HTML}{1F4E79}\n"
411
+ "\\setbeamercolor{title}{fg=AccentBlue}\n"
412
+ "\\setbeamercolor{frametitle}{fg=AccentBlue}\n"
413
+ "\\setbeamerfont{title}{series=\\bfseries,size=\\Large}\n"
414
+ "\\setbeamerfont{frametitle}{series=\\bfseries,size=\\large}\n"
415
+ f"\\title{{{topic_e}}}\n"
416
+ "\\author{hydradeck}\n"
417
+ "\\date{\\today}\n"
418
+ "\\begin{document}\n"
419
+ "\\frame{\\titlepage}\n"
420
+ "\\begin{frame}{"
421
+ + latex_escape(agenda_label)
422
+ + "}\n"
423
+ "\\begin{itemize}\n"
424
+ + agenda_items
425
+ + "\n\\end{itemize}\n"
426
+ "\\end{frame}\n"
427
+ + "".join(frame_blocks)
428
+ + "\\begin{frame}{"
429
+ + latex_escape(summary_title)
430
+ + "}\n"
431
+ + "\\begin{itemize}\n"
432
+ + summary_items
433
+ + "\n\\end{itemize}\n"
434
+ + "\\end{frame}\n"
435
+ + "\\end{document}\n"
436
+ )
437
+
438
+ return (
439
+ "\\documentclass[aspectratio=169]{beamer}\n"
440
+ "\\usetheme{metropolis}\n"
441
+ "\\usefonttheme{professionalfonts}\n"
442
+ "\\setbeamertemplate{navigation symbols}{}\n"
443
+ "\\usepackage{hyperref}\n"
444
+ "\\usepackage{booktabs}\n"
445
+ "\\definecolor{AccentBlue}{HTML}{1F4E79}\n"
446
+ "\\setbeamercolor{title}{fg=AccentBlue}\n"
447
+ "\\setbeamercolor{frametitle}{fg=AccentBlue}\n"
448
+ "\\setbeamerfont{title}{series=\\bfseries,size=\\Large}\n"
449
+ "\\setbeamerfont{frametitle}{series=\\bfseries,size=\\large}\n"
450
+ f"\\title{{{topic_e}}}\n"
451
+ "\\author{hydradeck}\n"
452
+ "\\date{\\today}\n"
453
+ "\\begin{document}\n"
454
+ "\\frame{\\titlepage}\n"
455
+ "\\begin{frame}{"
456
+ + latex_escape(agenda_label)
457
+ + "}\n"
458
+ "\\begin{itemize}\n"
459
+ + agenda_items
460
+ + "\n\\end{itemize}\n"
461
+ "\\end{frame}\n"
462
+ + "".join(frame_blocks)
463
+ + "\\begin{frame}{"
464
+ + latex_escape(summary_title)
465
+ + "}\n"
466
+ + "\\begin{itemize}\n"
467
+ + summary_items
468
+ + "\n\\end{itemize}\n"
469
+ + "\\end{frame}\n"
470
+ + "\\end{document}\n"
471
+ )
hydradeck/resources_pack.py ADDED
@@ -0,0 +1,706 @@
+ from __future__ import annotations
+ 
+ import json
+ import re
+ import time
+ import urllib.parse
+ from dataclasses import asdict
+ from pathlib import Path
+ 
+ import requests
+ 
+ from hydradeck.agents.personas import PERSONAS
+ from hydradeck.clients import ChatMessage, GrokClient, GrokClientError
+ from hydradeck.core.types import RunConfig, Source
+ from hydradeck.packaging import finalize_output, stage_dir_for_out
+ from hydradeck.utils import Heartbeat, Progress
+ 
+ 
+ def _slugify(s: str) -> str:
+     t = s.strip().lower()
+     t = re.sub(r"[^a-z0-9]+", "-", t)
+     t = re.sub(r"-+", "-", t).strip("-")
+     return t or "source"
+ 
+ 
+ def _extract_sources(obj: dict[str, object], max_sources: int) -> list[Source]:
+     raw = obj.get("sources")
+     out: list[Source] = []
+     if isinstance(raw, list):
+         for item in raw[:max_sources]:
+             if not isinstance(item, dict):
+                 continue
+             url_v = item.get("url")
+             title_v = item.get("title")
+             snippet_v = item.get("snippet")
+             if isinstance(url_v, str) and isinstance(title_v, str) and isinstance(snippet_v, str):
+                 out.append(Source(url=url_v, title=title_v, snippet=snippet_v))
+     return out
+ 
+ 
+ def build_resources_pack(cfg: RunConfig) -> Path:
+     stage_dir = stage_dir_for_out(cfg.out)
+     stage_dir.mkdir(parents=True, exist_ok=True)
+ 
+     t0 = time.time()
+ 
+     def remaining_s() -> float:
+         return max(0.0, cfg.max_total_runtime_s - (time.time() - t0))
+ 
+     def budget_timeout() -> float:
+         return max(1.0, min(cfg.request_budget_s, remaining_s()))
+ 
+     def llm_timeout() -> float:
+         return max(1.0, min(cfg.llm_timeout_s, budget_timeout()))
+ 
+     progress = Progress(enabled=cfg.progress, total=6, label="resources")
+     progress.update("start", inc=0)
+ 
+     if cfg.use_mock:
+         from hydradeck.clients.grok_client import MockClient
+ 
+         client = MockClient()
+     else:
+         client = GrokClient(
+             base_url=cfg.base_url,
+             api_key=cfg.api_key,
+             model=cfg.model,
+             timeout_s=llm_timeout(),
+             heartbeat=cfg.verbose,
+         )
+ 
+     query_planner = next(p for p in PERSONAS if p.name == "QueryPlanner")
+     librarian = next(p for p in PERSONAS if p.name == "Librarian")
+ 
+     # Plan high-recall search queries for the topic.
+     qp_obj = client.chat_json(
+         [
+             ChatMessage(role="system", content=query_planner.system_prompt),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "Return JSON: {queries:[...]} with 6 high-recall queries for primary sources. "
+                     "Topic: "
+                     + cfg.topic
+                 ),
+             ),
+         ],
+         schema_hint='{ "queries": ["..."] }',
+         temperature=0.2,
+         timeout_s=llm_timeout() if not cfg.use_mock else None,
+     )
+     progress.update("queries")
+     raw_q = qp_obj.get("queries")
+     if isinstance(raw_q, list):
+         queries = [q for q in raw_q if isinstance(q, str) and q.strip()]
+     else:
+         queries = []
+     if not queries:
+         queries = [cfg.topic]
+ 
+     # Collect deduplicated sources across the first few queries.
+     seen: set[str] = set()
+     sources: list[Source] = []
+     for q in queries[: min(3, len(queries))]:
+         req = (
+             "Return JSON with key sources: list of {url,title,snippet}. "
+             "Give authoritative sources (prefer official docs, papers, repos). "
+             "Query: "
+             + q
+         )
+         try:
+             src_obj = client.chat_json(
+                 [
+                     ChatMessage(role="system", content=librarian.system_prompt),
+                     ChatMessage(role="user", content=req),
+                 ],
+                 schema_hint='{ "sources": [ {"url":"...","title":"...","snippet":"..."} ] }',
+                 temperature=0.2,
+                 timeout_s=llm_timeout() if not cfg.use_mock else None,
+             )
+         except GrokClientError:
+             continue
+         for s in _extract_sources(src_obj, cfg.module_sources):
+             if s.url in seen:
+                 continue
+             seen.add(s.url)
+             sources.append(s)
+             if len(sources) >= cfg.max_sources:
+                 break
+         if len(sources) >= cfg.max_sources:
+             break
+     progress.update("sources")
+ 
+     if not sources:
+         sources = [
+             Source(
+                 url="https://github.com/alibaba-damo-academy/RynnBrain",
+                 title="RynnBrain",
+                 snippet="",
+             )
+         ]
+         progress.update("sources")
+ 
+     resources_dir = stage_dir / "resources"
+     snaps_dir = resources_dir / "snapshots"
+     snaps_dir.mkdir(parents=True, exist_ok=True)
+ 
+     # Snapshot each source locally, within a global time budget.
+     snap_meta: list[dict[str, object]] = []
+     snap_start = time.time()
+     for i, s in enumerate(sources, start=1):
+         if (time.time() - snap_start) > cfg.snapshot_total_timeout_s:
+             break
+         target_base = snaps_dir / f"{i:02d}_{_slugify(s.title)}"
+         entry: dict[str, object] = {"url": s.url, "title": s.title}
+         if cfg.use_mock:
+             entry["ok"] = True
+             target = target_base.with_suffix(".txt")
+             entry["path"] = str(target)
+             target.write_text("mock snapshot", encoding="utf-8")
+             snap_meta.append(entry)
+             continue
+         try:
+             with Heartbeat(enabled=cfg.verbose, label=f"fetch {s.url}", interval_s=5.0):
+                 r = requests.get(
+                     s.url,
+                     timeout=min(cfg.snapshot_timeout_s, budget_timeout()),
+                     headers={"User-Agent": "hydradeck/0.1"},
+                 )
+             r.raise_for_status()
+             ctype = r.headers.get("content-type", "")
+             entry["content_type"] = ctype
+ 
+             is_pdf = "application/pdf" in ctype.lower() or s.url.lower().endswith(".pdf")
+             if is_pdf:
+                 data = r.content
+                 if len(data) > 5_000_000:
+                     data = data[:5_000_000]
+                 target = target_base.with_suffix(".pdf")
+                 entry["path"] = str(target)
+                 target.write_bytes(data)
+                 entry["binary"] = True
+             else:
+                 txt = r.text
+                 if len(txt) > 200_000:
+                     txt = txt[:200_000]
+                 target = target_base.with_suffix(".txt")
+                 entry["path"] = str(target)
+                 target.write_text(txt, encoding="utf-8")
+             entry["ok"] = True
+         except Exception as e:
+             entry["ok"] = False
+             entry["error"] = str(e)
+         snap_meta.append(entry)
+     progress.update("snapshots")
+ 
+     (resources_dir / "sources.json").write_text(
+         json.dumps({"sources": [asdict(s) for s in sources]}, ensure_ascii=False, indent=2),
+         encoding="utf-8",
+     )
+     (resources_dir / "snapshots.json").write_text(
+         json.dumps({"snapshots": snap_meta}, ensure_ascii=False, indent=2),
+         encoding="utf-8",
+     )
+     (stage_dir / "research.json").write_text(
+         json.dumps(
+             {
+                 "topic": cfg.topic,
+                 "mode": "resources",
+                 "sources": [asdict(s) for s in sources],
+                 "snapshots": snap_meta,
+             },
+             ensure_ascii=False,
+             indent=2,
+         ),
+         encoding="utf-8",
+     )
+ 
+     progress.update("package")
+ 
+     try:
+         paper_tex, slides_tex = _generate_pre_tex(cfg, client, sources)
+     except Exception as e:
+         (stage_dir / "pre_tex_error.txt").write_text(str(e) + "\n", encoding="utf-8")
+         paper_tex = _render_paper_tex(cfg.topic, sources)
+         slides_tex = _render_slides_tex(cfg.topic, sources)
+ 
+     (stage_dir / "pre_paper.tex").write_text(paper_tex, encoding="utf-8")
+     (stage_dir / "pre_slides.tex").write_text(slides_tex, encoding="utf-8")
+ 
+     pdf_dir = stage_dir / "pdf"
+     pdf_dir.mkdir(parents=True, exist_ok=True)
+     urls: list[str] = []
+     errors: list[str] = []
+ 
+     if cfg.use_mock:
+         (pdf_dir / "pre_paper.pdf").write_bytes(_dummy_pdf_bytes("paper"))
+         (pdf_dir / "pre_slides.pdf").write_bytes(_dummy_pdf_bytes("slides"))
+     else:
+         try:
+             paper_pdf, paper_meta = _compile_pdf(
+                 paper_tex,
+                 engine="xelatex",
+                 backend=cfg.pdf_compiler,
+             )
+             (pdf_dir / "pre_paper.pdf").write_bytes(paper_pdf)
+             urls.extend(paper_meta.get("urls", []))
+             errors.extend(paper_meta.get("errors", []))
+         except Exception as e:
+             errors.append("paper: " + str(e))
+ 
+         try:
+             slides_pdf, slides_meta = _compile_pdf(
+                 slides_tex,
+                 engine="xelatex",
+                 backend=cfg.pdf_compiler,
+             )
+             (pdf_dir / "pre_slides.pdf").write_bytes(slides_pdf)
+             urls.extend(slides_meta.get("urls", []))
+             errors.extend(slides_meta.get("errors", []))
+         except Exception as e:
+             errors.append("slides: " + str(e))
+ 
+         if not (pdf_dir / "pre_paper.pdf").exists():
+             errors.append("paper pdf missing")
+         if not (pdf_dir / "pre_slides.pdf").exists():
+             errors.append("slides pdf missing")
+ 
+     if urls:
+         (stage_dir / "latexonline_url.txt").write_text("\n".join(urls) + "\n", encoding="utf-8")
+     if errors:
+         (stage_dir / "latexonline_error.txt").write_text("\n".join(errors) + "\n", encoding="utf-8")
+ 
+     finalize_output(cfg.out, stage_dir, keep_stage=cfg.keep_stage)
+     progress.done("packaged")
+     return cfg.out
+ 
+ 
+ def _render_paper_tex(topic: str, sources: list[Source]) -> str:
+     def esc(s: str) -> str:
+         # Replace backslashes with a placeholder first so the braces of
+         # \textbackslash{} are not themselves re-escaped below.
+         s = s.replace("\\", "\u0000")
+         s = (
+             s.replace("{", r"\{")
+             .replace("}", r"\}")
+             .replace("%", r"\%")
+             .replace("_", r"\_")
+             .replace("&", r"\&")
+             .replace("#", r"\#")
+             .replace("$", r"\$")
+         )
+         return s.replace("\u0000", r"\textbackslash{}")
+ 
+     items = []
+     for s in sources:
+         items.append(
+             "\\item "
+             + esc(s.title)
+             + "\\\\\n"
+             + "\\small\\url{" + esc(s.url) + "}\\normalsize\\\\\n"
+             + "\\textit{" + esc(s.snippet[:240]) + "}"
+         )
+     body = "\n".join(items) if items else "\\item (暂无来源)"
+     return (
+         "\\documentclass[11pt]{article}\n"
+         "\\usepackage[UTF8]{ctex}\n"
+         "\\usepackage{hyperref}\n"
+         "\\usepackage{url}\n"
+         "\\usepackage{booktabs}\n"
+         "\\title{" + esc(topic) + "——资源预研报告(论文版)}\n"
+         "\\author{hydradeck}\n"
+         "\\date{\\today}\n"
+         "\\begin{document}\n"
+         "\\maketitle\n"
+         "\\section*{来源清单}\n"
+         "\\begin{enumerate}\n"
+         + body
+         + "\n\\end{enumerate}\n"
+         "\\end{document}\n"
+     )
+ 
+ 
+ def _render_slides_tex(topic: str, sources: list[Source]) -> str:
+     def esc(s: str) -> str:
+         # Same placeholder trick as in _render_paper_tex: escape backslashes
+         # last so the other replacements do not mangle \textbackslash{}.
+         s = s.replace("\\", "\u0000")
+         s = (
+             s.replace("{", r"\{")
+             .replace("}", r"\}")
+             .replace("%", r"\%")
+             .replace("_", r"\_")
+             .replace("&", r"\&")
+             .replace("#", r"\#")
+             .replace("$", r"\$")
+         )
+         return s.replace("\u0000", r"\textbackslash{}")
+ 
+     bullets: list[str] = []
+     for s in sources[:8]:
+         bullets.append(esc(s.title))
+ 
+     items = "\n".join(["\\item " + b for b in bullets]) or "\\item (暂无来源)"
+     return (
+         "\\documentclass{beamer}\n"
+         "\\usepackage[UTF8]{ctex}\n"
+         "\\usetheme{Madrid}\n"
+         "\\title{" + esc(topic) + "——资源预研简报(幻灯片)}\n"
+         "\\author{hydradeck}\n"
+         "\\date{\\today}\n"
+         "\\begin{document}\n"
+         "\\frame{\\titlepage}\n"
+         "\\begin{frame}{关键来源}\n"
+         "\\begin{itemize}\n"
+         + items
+         + "\n\\end{itemize}\n"
+         "\\end{frame}\n"
+         "\\end{document}\n"
+     )
+ 
+ 
+ def _latexonline_compile_url(tex: str, command: str) -> str:
+     q = urllib.parse.quote(tex, safe="")
+     return "https://latexonline.cc/compile?text=" + q + "&command=" + command + "&force=true"
+ 
+ 
+ def _compile_pdf(tex: str, engine: str, backend: str) -> tuple[bytes, dict[str, list[str]]]:
+     meta: dict[str, list[str]] = {"urls": [], "errors": []}
+     b = backend.strip().lower()
+     if b not in {"auto", "latexonline", "texlive"}:
+         b = "auto"
+ 
+     if b in {"auto", "latexonline"}:
+         try:
+             meta["urls"].append(_latexonline_compile_url(tex, command=engine))
+             data = _compile_latexonline(tex, command=engine)
+             _ensure_pdf_bytes(data, where="latexonline")
+             return data, meta
+         except Exception as e:
+             meta["errors"].append("latexonline: " + str(e))
+             if b == "latexonline":
+                 raise
+ 
+     try:
+         data = _compile_texlive_latexcgi(tex, engine=engine)
+         _ensure_pdf_bytes(data, where="texlive")
+         return data, meta
+     except Exception as e:
+         meta["errors"].append("texlive latexcgi: " + str(e))
+         raise
+ 
+ 
+ def _ensure_pdf_bytes(data: bytes, where: str) -> None:
+     if data.startswith(b"%PDF"):
+         return
+     head = data[:200].decode("utf-8", errors="replace")
+     raise RuntimeError(f"{where} did not return PDF. Head: {head}")
+ 
+ 
+ def _compile_latexonline(tex: str, command: str) -> bytes:
+     url = _latexonline_compile_url(tex, command=command)
+     r = requests.get(url, timeout=120.0)
+     if r.status_code >= 400:
+         raise RuntimeError(f"latexonline HTTP {r.status_code}: {r.text[:2000]}")
+     return r.content
+ 
+ 
+ def _compile_texlive_latexcgi(tex: str, engine: str) -> bytes:
+     url = "https://texlive.net/cgi-bin/latexcgi"
+     files = {
+         "filename[]": (None, "document.tex"),
+         "filecontents[]": (None, tex),
+         "engine": (None, engine),
+         "return": (None, "pdf"),
+     }
+     r = requests.post(url, files=files, timeout=120.0)
+     if r.status_code >= 400:
+         raise RuntimeError(f"texlive latexcgi HTTP {r.status_code}: {r.text[:2000]}")
+     return r.content
+ 
+ 
+ def _generate_pre_tex(cfg: RunConfig, client, sources: list[Source]) -> tuple[str, str]:
+     if cfg.use_mock:
+         return _render_paper_tex(cfg.topic, sources), _render_slides_tex(cfg.topic, sources)
+ 
+     template = cfg.template.strip().lower()
+     if template == "iclr2026":
+         return _generate_pre_tex_iclr2026(cfg, client, sources)
+     if template == "pretty":
+         return _generate_pre_tex_pretty(cfg, client, sources)
+ 
+     outline = _pre_outline(cfg.topic)
+     src_json = json.dumps([asdict(s) for s in sources], ensure_ascii=False)
+     feedback = ""
+     last_paper = _render_paper_tex(cfg.topic, sources)
+     last_slides = _render_slides_tex(cfg.topic, sources)
+     for _attempt in range(max(1, cfg.pre_tex_attempts)):
+         msgs = [
+             ChatMessage(
+                 role="system",
+                 content=(
+                     "你是严谨的 LaTeX 作者。"
+                     "必须输出可用 XeLaTeX 编译的高信息密度中文内容。"
+                     "不要输出 JSON。"
+                 ),
+             ),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "生成两个 LaTeX 文档(全部使用简体中文):\n"
+                     "(1) paper_tex:article 论文版预研报告,结构严格,信息密度高。\n"
+                     "(2) slides_tex:beamer 16:9,15 分钟汇报(8-10 页)。\n\n"
+                     "共同硬约束:\n"
+                     "- 使用 ctex + xelatex\n"
+                     "- 禁止空话;每节必须有可执行要点/表格\n"
+                     "- 必须包含“参考资源”并列出全部来源 URL\n\n"
+                     "paper 结构(标题可扩展但需覆盖以下要点):\n"
+                     + "\n".join(["- " + x for x in outline["paper"]])
+                     + "\n\nslides 结构(每项至少一页):\n"
+                     + "\n".join(["- " + x for x in outline["slides"]])
+                     + "\n\n来源 JSON:\n"
+                     + src_json
+                     + ("\n\n评审反馈:\n" + feedback if feedback else "")
+                     + "\n\n输出格式(必须严格使用):\n"
+                     + "<<<paper.tex>>>\n<latex>\n<<<end paper.tex>>>\n"
+                     + "<<<slides.tex>>>\n<latex>\n<<<end slides.tex>>>\n"
+                 ),
+             ),
+         ]
+         text = client.chat_text(msgs, temperature=0.2)
+         parsed = _parse_marked_tex(text)
+         paper = parsed.get("paper")
+         slides = parsed.get("slides")
+         if not isinstance(paper, str) or not isinstance(slides, str):
+             feedback = "Output must contain both <<<paper.tex>>> and <<<slides.tex>>> blocks."
+             continue
+ 
+         last_paper, last_slides = paper, slides
+         score, fb = _score_pre_tex(paper, slides, sources)
+         if not cfg.pre_tex_quality_gate or score >= cfg.pre_tex_min_score:
+             return paper, slides
+         feedback = fb
+ 
+     return last_paper, last_slides
+ 
+ 
+ def _generate_pre_tex_iclr2026(
+     cfg: RunConfig,
+     client,
+     sources: list[Source],
+ ) -> tuple[str, str]:
+     src_json = json.dumps([asdict(s) for s in sources], ensure_ascii=False)
+     feedback = ""
+     last_paper = ""
+     last_slides = ""
+     for _attempt in range(max(1, cfg.pre_tex_attempts)):
+         msgs = [
+             ChatMessage(
+                 role="system",
+                 content=(
+                     "你撰写严谨的 ICLR 风格预研文稿。"
+                     "paper 必须使用 \\usepackage{iclr2026_conference,times}。"
+                     "输出必须为简体中文,不要输出 JSON。"
+                 ),
+             ),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "任务:撰写 (1) paper.tex(ICLR 论文风格)和 (2) slides.tex(beamer)。\n"
+                     "场景:15 分钟预研汇报。\n"
+                     "要求高信息密度:至少 2 张表(证据计划、风险登记)。\n"
+                     "必须包含“参考资源”并列出所有来源 URL。\n\n"
+                     "paper.tex 要求:\n"
+                     "- Use: \\documentclass{article} and \\usepackage{iclr2026_conference,times}\n"
+                     "- 包含:标题、摘要(<=150 词)\n"
+                     "- 章节:目标、待验证主张、研究问题、范围/非范围\n"
+                     "  证据计划(表)、来源映射、风险(表)、时间线(表)\n"
+                     "  交付物、参考资源\n"
+                     "- 禁止空话,每个要点必须可执行。\n\n"
+                     "slides.tex 要求:\n"
+                     "- 16:9 beamer, 8-10 frames, 1 idea per slide\n"
+                     "- 至少 1 页证据矩阵,至少 1 页风险页\n\n"
+                     "来源 JSON:\n"
+                     + src_json
+                     + ("\n\n反馈:\n" + feedback if feedback else "")
+                     + "\n\n输出格式(必须严格):\n"
+                     + "<<<paper.tex>>>\n<latex>\n<<<end paper.tex>>>\n"
+                     + "<<<slides.tex>>>\n<latex>\n<<<end slides.tex>>>\n"
+                 ),
+             ),
+         ]
+         text = client.chat_text(msgs, temperature=0.2)
+         parsed = _parse_marked_tex(text)
+         paper = parsed.get("paper")
+         slides = parsed.get("slides")
+         if not isinstance(paper, str) or not isinstance(slides, str):
+             feedback = "Missing marked blocks."
+             continue
+         last_paper, last_slides = paper, slides
+         score, fb = _score_pre_tex(paper, slides, sources)
+         if not cfg.pre_tex_quality_gate or score >= cfg.pre_tex_min_score:
+             return paper, slides
+         feedback = fb
+ 
+     if last_paper and last_slides:
+         return last_paper, last_slides
+     return _render_paper_tex(cfg.topic, sources), _render_slides_tex(cfg.topic, sources)
+ 
+ 
+ def _generate_pre_tex_pretty(
+     cfg: RunConfig,
+     client,
+     sources: list[Source],
+ ) -> tuple[str, str]:
+     src_json = json.dumps([asdict(s) for s in sources], ensure_ascii=False)
+     feedback = ""
+     last_paper = _render_paper_tex(cfg.topic, sources)
+     last_slides = _render_slides_tex(cfg.topic, sources)
+ 
+     for _attempt in range(max(1, cfg.pre_tex_attempts)):
+         msgs = [
+             ChatMessage(
+                 role="system",
+                 content=(
+                     "你是严谨的 LaTeX 作者。"
+                     "请输出可直接编译、结构完整、信息密度高的中文 .tex 文件。"
+                     "不要输出 JSON。"
+                 ),
+             ),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "生成两个自包含 LaTeX 文件(简体中文):\n"
+                     "A) pre_paper.tex:article。\n"
+                     "B) pre_slides.tex:beamer 16:9。\n\n"
+                     "paper 要求:\n"
+                     "- 使用 xelatex + ctex\n"
+                     "- 版式整洁,信息密度高,无空话\n"
+                     "- 章节至少覆盖:背景、创新、架构、能力、应用、局限、结论、参考资源\n"
+                     "- 每条来源至少引用一次(\\cite{})\n\n"
+                     "slides 要求:\n"
+                     "- 8-10 页,一页一核心观点\n"
+                     "- 至少 1 页证据矩阵,至少 1 页风险页\n\n"
+                     "来源 JSON(以此为准):\n"
+                     + src_json
+                     + ("\n\n反馈:\n" + feedback if feedback else "")
+                     + "\n\n输出格式(必须严格):\n"
+                     + "<<<paper.tex>>>\n<latex>\n<<<end paper.tex>>>\n"
+                     + "<<<slides.tex>>>\n<latex>\n<<<end slides.tex>>>\n"
+                 ),
+             ),
+         ]
+         text = client.chat_text(msgs, temperature=0.2)
+         parsed = _parse_marked_tex(text)
+         paper = parsed.get("paper")
+         slides = parsed.get("slides")
+         if not isinstance(paper, str) or not isinstance(slides, str):
+             feedback = "Missing marked blocks."
+             continue
+         last_paper, last_slides = paper, slides
+ 
+         score, fb = _score_pre_tex(paper, slides, sources)
+         if "thebibliography" not in paper:
+             score *= 0.75
+         if not cfg.pre_tex_quality_gate or score >= cfg.pre_tex_min_score:
+             return paper, slides
+         feedback = fb
+ 
+     return last_paper, last_slides
+ 
+ 
+ def _pre_outline(topic: str) -> dict[str, list[str]]:
+     _ = topic
+     return {
+         "paper": [
+             "标题",
+             "1. 背景与问题定义",
+             "2. 技术创新点",
+             "3. 系统架构与关键机制",
+             "4. 能力与性能分析",
+             "5. 应用场景与价值",
+             "6. 局限与风险",
+             "7. 结论",
+             "8. 参考资源",
+         ],
+         "slides": [
+             "标题",
+             "背景与核心问题",
+             "技术创新点",
+             "系统架构",
+             "能力与性能",
+             "应用场景",
+             "局限与风险",
+             "结论",
+             "Q&A",
+         ],
+     }
+ 
+ 
+ def _score_pre_tex(paper: str, slides: str, sources: list[Source]) -> tuple[float, str]:
+     # Heuristic quality score in [0, 1]; multiplicative penalties per defect.
+     score = 1.0
+     must = [
+         "背景",
+         "创新",
+         "架构",
+         "应用",
+         "局限",
+         "结论",
+         "参考",
+     ]
+     for k in must:
+         if k not in paper:
+             score *= 0.85
+     if "\\documentclass" not in paper or "\\documentclass" not in slides:
+         score *= 0.5
+     if len(sources) >= 3 and paper.count("\\url{") < 3:
+         score *= 0.7
+     if "iclr2026_conference" in paper and "\\usepackage{iclr2026_conference" not in paper:
+         score *= 0.8
+     zh_chars = sum(1 for ch in (paper + slides) if "\u4e00" <= ch <= "\u9fff")
+     total_chars = max(1, len(paper + slides))
+     if zh_chars / total_chars < 0.15:
+         score *= 0.7
+     fb = "章节不足或资源映射偏弱" if score < 0.95 else "ok"
+     return max(0.0, min(1.0, score)), fb
+ 
+ 
+ def _parse_marked_tex(text: str) -> dict[str, str]:
+     def extract(name: str) -> str | None:
+         start = f"<<<{name}>>>"
+         end = f"<<<end {name}>>>"
+         a = text.find(start)
+         b = text.find(end)
+         if a == -1 or b == -1 or b <= a:
+             return None
+         inner = text[a + len(start) : b].strip()
+         inner = _strip_markdown_fences(inner).strip()
+         if inner.startswith("<latex>"):
+             inner = inner[len("<latex>") :].lstrip()
+         return inner + "\n"
+ 
+     out: dict[str, str] = {}
+     paper = extract("paper.tex")
+     slides = extract("slides.tex")
+     if paper is not None:
+         out["paper"] = paper
+     if slides is not None:
+         out["slides"] = slides
+     return out
+ 
+ 
+ def _strip_markdown_fences(s: str) -> str:
+     t = s.strip()
+     if t.startswith("```"):
+         lines = t.splitlines()
+         if len(lines) >= 2 and lines[-1].strip().startswith("```"):
+             inner = "\n".join(lines[1:-1]).strip()
+             return inner + "\n"
+     return s
+ 
+ 
+ def _dummy_pdf_bytes(label: str) -> bytes:
+     # Minimal single-page PDF used only in mock mode; the /Length value is
+     # approximate, which most viewers tolerate.
+     content = f"Dummy PDF ({label})".encode("ascii", errors="ignore")
+     return (
+         b"%PDF-1.1\n"
+         b"1 0 obj<<>>endobj\n"
+         b"2 0 obj<< /Length 44 >>stream\n"
+         b"BT /F1 12 Tf 72 720 Td ("
+         + content
+         + b") Tj ET\n"
+         b"endstream endobj\n"
+         b"3 0 obj<< /Type /Page /Parent 4 0 R /Contents 2 0 R >>endobj\n"
+         b"4 0 obj<< /Type /Pages /Kids [3 0 R] /Count 1 >>endobj\n"
+         b"5 0 obj<< /Type /Catalog /Pages 4 0 R >>endobj\n"
+         b"xref\n0 6\n0000000000 65535 f \n"
+         b"trailer<< /Root 5 0 R /Size 6 >>\nstartxref\n0\n%%EOF\n"
+     )
hydradeck/utils.py ADDED
@@ -0,0 +1,86 @@
+ from __future__ import annotations
+ 
+ import datetime
+ import sys
+ import threading
+ import time
+ 
+ 
+ def log(enabled: bool, msg: str) -> None:
+     if not enabled:
+         return
+     ts = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
+     print(f"[{ts}] {msg}")
+ 
+ 
+ JSON = dict[str, object]
+ 
+ 
+ class Heartbeat:
+     def __init__(self, enabled: bool, label: str, interval_s: float = 5.0) -> None:
+         self._enabled = enabled
+         self._label = label
+         self._interval_s = interval_s
+         self._stop = threading.Event()
+         self._t: threading.Thread | None = None
+ 
+     def __enter__(self) -> Heartbeat:
+         if not self._enabled:
+             return self
+ 
+         def run() -> None:
+             start = time.time()
+             while not self._stop.wait(self._interval_s):
+                 elapsed = int(time.time() - start)
+                 sys.stderr.write(f"[heartbeat] {self._label} ({elapsed}s)\n")
+                 sys.stderr.flush()
+ 
+         self._t = threading.Thread(target=run, daemon=True)
+         self._t.start()
+         return self
+ 
+     def __exit__(self, exc_type, exc, tb) -> None:
+         _ = (exc_type, exc, tb)
+         if not self._enabled:
+             return
+         self._stop.set()
+         if self._t is not None:
+             self._t.join(timeout=1.0)
+ 
+ 
+ class Progress:
+     def __init__(
+         self,
+         enabled: bool,
+         total: int,
+         label: str = "",
+         stream=None,
+     ) -> None:
+         self._enabled = enabled
+         self._total = max(int(total), 1)
+         self._label = label
+         self._stream = stream or sys.stderr
+         self._current = 0
+         self._last_len = 0
+ 
+     def update(self, step: str, inc: int = 1) -> None:
+         if not self._enabled:
+             return
+         self._current = min(self._total, self._current + max(int(inc), 0))
+         pct = int((self._current / self._total) * 100)
+         bar_len = 24
+         filled = int(bar_len * self._current / self._total)
+         bar = "#" * filled + "-" * (bar_len - filled)
+         msg = f"[progress] {self._label} [{bar}] {pct:3d}% {step}"
+         pad = " " * max(0, self._last_len - len(msg))
76
+ self._stream.write("\r" + msg + pad)
77
+ self._stream.flush()
78
+ self._last_len = len(msg)
79
+
80
+ def done(self, step: str = "done") -> None:
81
+ if not self._enabled:
82
+ return
83
+ self._current = self._total
84
+ self.update(step, inc=0)
85
+ self._stream.write("\n")
86
+ self._stream.flush()
pyproject.toml ADDED
@@ -0,0 +1,44 @@
+[build-system]
+requires = ["setuptools>=68", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "hydradeck"
+version = "0.1.0"
+description = "Grok-driven deep research pipeline that outputs detailed reports, speech scripts, and Beamer slides."
+readme = "README.md"
+requires-python = ">=3.9"
+license = { text = "MIT" }
+authors = [{ name = "hydradeck contributors" }]
+dependencies = [
+    "requests>=2.31.0",
+    "urllib3>=2,<3",
+    "gradio>=4.44.1,<5",
+    "huggingface_hub<1.0",
+]
+
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+    "ruff>=0.6.0",
+]
+
+[project.scripts]
+hydradeck = "hydradeck.cli:main"
+
+[tool.setuptools]
+package-dir = {"" = "."}
+
+[tool.setuptools.packages.find]
+where = ["."]
+include = ["hydradeck*"]
+
+[tool.setuptools.package-data]
+hydradeck = ["templates/**/*"]
+
+[tool.ruff]
+line-length = 100
+target-version = "py39"
+
+[tool.ruff.lint]
+select = ["E", "F", "I", "UP", "B"]
requirements.txt ADDED
@@ -0,0 +1,4 @@
+requests>=2.31.0
+urllib3>=2,<3
+gradio>=4.44.1,<5
+huggingface_hub<1.0
tests/test_app_agentic.py ADDED
@@ -0,0 +1,74 @@
+from __future__ import annotations
+
+from pathlib import Path
+
+import app
+
+
+def test_agentic_pipeline_mock_renders_online_pdfs(monkeypatch) -> None:
+    def fake_compile(tex_source: str, output_name: str) -> str:
+        p = Path("/tmp") / output_name
+        p.write_bytes(b"%PDF-1.5\n%mock\n")
+        return str(p)
+
+    monkeypatch.setattr(app, "_compile_latex_online", fake_compile)
+
+    (
+        status,
+        progress_log,
+        _scope_json,
+        section_plan_json,
+        paper_tex,
+        slides_tex,
+        rendered_pdfs,
+        paper_pdf,
+        slides_pdf,
+    ) = app._run_agentic_pipeline(
+        topic="Agentic flow test",
+        model="grok-3-mini",
+        base_url="https://api.example.com",
+        api_key="",
+        request_budget=20,
+        use_mock=True,
+    )
+
+    assert "done" in status.lower()
+    assert "ScopeScout" in progress_log
+    assert "sections" in section_plan_json
+    assert "documentclass" in paper_tex
+    assert "documentclass" in slides_tex
+    paths = [x.strip() for x in rendered_pdfs.splitlines() if x.strip()]
+    assert len(paths) == 2
+    for p in paths:
+        assert Path(p).exists()
+    assert Path(str(paper_pdf)).exists()
+    assert Path(str(slides_pdf)).exists()
+
+
+def test_agentic_stream_emits_progress_and_pdf_paths(monkeypatch) -> None:
+    def fake_compile(tex_source: str, output_name: str) -> str:
+        p = Path("/tmp") / output_name
+        p.write_bytes(b"%PDF-1.5\n%mock\n")
+        return str(p)
+
+    monkeypatch.setattr(app, "_compile_latex_online", fake_compile)
+
+    chunks = list(
+        app._run_agentic_pipeline_stream(
+            topic="Agentic stream test",
+            model="grok-3-mini",
+            base_url="https://api.example.com",
+            api_key="",
+            request_budget=20,
+            use_mock=True,
+        )
+    )
+    assert len(chunks) >= 3
+    assert chunks[0][-1] == 5
+    assert chunks[1][-1] == 30
+    assert chunks[-1][-1] == 100
+    assert "done" in str(chunks[-1][0]).lower()
+    assert Path(str(chunks[-1][7])).exists()
+    assert Path(str(chunks[-1][8])).exists()
tests/test_cli.py ADDED
@@ -0,0 +1,66 @@
+from __future__ import annotations
+
+import pytest
+
+from hydradeck import cli
+from hydradeck.core.types import RunConfig
+
+
+def test_run_command_accepts_snapshot_total_timeout_default(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    captured: dict[str, float] = {}
+
+    def fake_run(cfg: RunConfig) -> object:
+        captured["snapshot_total_timeout_s"] = cfg.snapshot_total_timeout_s
+        return object()
+
+    monkeypatch.setattr(cli, "run", fake_run)
+
+    code = cli.main(
+        [
+            "run",
+            "--topic",
+            "t",
+            "--out",
+            "out.zip",
+            "--base-url",
+            "https://example.invalid",
+            "--model",
+            "mock",
+            "--mock",
+        ]
+    )
+
+    assert code == 0
+    assert captured["snapshot_total_timeout_s"] == 60.0
+
+
+def test_run_command_passes_request_budget(monkeypatch: pytest.MonkeyPatch) -> None:
+    captured: dict[str, float] = {}
+
+    def fake_run(cfg: RunConfig) -> object:
+        captured["request_budget_s"] = cfg.request_budget_s
+        return object()
+
+    monkeypatch.setattr(cli, "run", fake_run)
+
+    code = cli.main(
+        [
+            "run",
+            "--topic",
+            "t",
+            "--out",
+            "out.zip",
+            "--base-url",
+            "https://example.invalid",
+            "--model",
+            "mock",
+            "--request-budget",
+            "90",
+            "--mock",
+        ]
+    )
+
+    assert code == 0
+    assert captured["request_budget_s"] == 90.0
tests/test_config.py ADDED
@@ -0,0 +1,44 @@
+from __future__ import annotations
+
+from pathlib import Path
+
+from hydradeck.config import UserConfig, load_config, load_merged_config, save_config
+
+
+def test_save_and_load_config(tmp_path: Path) -> None:
+    p = tmp_path / "cfg.json"
+    save_config(
+        UserConfig(
+            base_url="https://x",
+            api_key="k",
+            model="m",
+            pdf_compiler="auto",
+            template="iclr2026",
+        ),
+        path=p,
+    )
+    cfg = load_config(path=p)
+    assert cfg.base_url == "https://x"
+    assert cfg.api_key == "k"
+    assert cfg.model == "m"
+    assert cfg.pdf_compiler == "auto"
+    assert cfg.template == "iclr2026"
+
+
+def test_project_config_overrides_user(tmp_path: Path, monkeypatch) -> None:
+    user_p = tmp_path / "user.json"
+    save_config(UserConfig(base_url="https://u", api_key="u", model="u"), path=user_p)
+
+    proj_root = tmp_path / "proj"
+    (proj_root / ".hydradeck").mkdir(parents=True)
+    proj_p = proj_root / ".hydradeck" / "config.json"
+    save_config(UserConfig(model="p"), path=proj_p)
+
+    monkeypatch.chdir(proj_root)
+    from hydradeck import config as cfgmod
+
+    monkeypatch.setattr(cfgmod, "config_path", lambda: user_p)
+    merged = load_merged_config()
+    assert merged.base_url == "https://u"
+    assert merged.api_key == "u"
+    assert merged.model == "p"
tests/test_preset_pre.py ADDED
@@ -0,0 +1,17 @@
+from __future__ import annotations
+
+import zipfile
+from pathlib import Path
+
+
+def test_preset_rynnbrain_zip(tmp_path: Path) -> None:
+    from hydradeck.presets.rynnbrain import generate
+
+    out_zip = tmp_path / "rynnbrain_pre.zip"
+    generate(out=out_zip, keep_stage=False, fetch=False)
+    assert out_zip.exists()
+    with zipfile.ZipFile(out_zip, "r") as z:
+        names = set(z.namelist())
+    assert "pre_report.md" in names
+    assert "research.json" in names
+    assert "resources/sources.json" in names
tests/test_render.py ADDED
@@ -0,0 +1,189 @@
+from __future__ import annotations
+
+from hydradeck.core.types import ExtractedFact, Source
+from hydradeck.render import (
+    build_slide_frames_from_report,
+    build_slide_frames_from_sections,
+    enforce_slide_density,
+    render_beamer_frames,
+    render_paper,
+    render_report_structured,
+)
+
+
+def test_render_paper_converts_markdown_like_body() -> None:
+    sources = [Source(url="https://example.com", title="Example", snippet="snippet")]
+    facts = [
+        ExtractedFact(
+            claim="Claim A",
+            evidence="Evidence A",
+            url="https://example.com",
+            title="Example",
+        )
+    ]
+    body = """## Heading
+- bullet 1
+- bullet 2
+`inline`
+```python
+print('x')
+```
+"""
+
+    tex = render_paper(
+        topic="demo",
+        outline=["背景", "创新"],
+        body=body,
+        facts=facts,
+        sources=sources,
+    )
+
+    assert "```" not in tex
+    assert "## Heading" not in tex
+    assert "Heading" in tex
+    assert "\\begin{itemize}" in tex
+    assert "bullet 1" in tex
+    assert "\\section*{1. Introduction and Background}" in tex
+
+
+def test_render_templates_use_facts_not_generic_filler() -> None:
+    sources = [Source(url="https://example.com", title="Example", snippet="snippet")]
+    facts = [
+        ExtractedFact(
+            claim="RynnBrain released checkpoints on 2026-02-09",
+            evidence="project timeline from official repo",
+            url="https://example.com",
+            title="Example",
+        ),
+        ExtractedFact(
+            claim="Model introduces interleaved reasoning with spatial grounding",
+            evidence="technical report description",
+            url="https://example.com",
+            title="Example",
+        ),
+    ]
+    paper = render_paper(
+        topic="demo",
+        outline=["背景", "创新", "架构"],
+        body="结论段落 [1]",
+        facts=facts,
+        sources=sources,
+    )
+    section_blocks = [
+        {"name": "背景", "latex": facts[0].claim},
+        {"name": "创新", "latex": facts[1].claim},
+    ]
+    frames = build_slide_frames_from_sections(section_blocks, language="en")
+    slides = render_beamer_frames("demo", frames, language="en")
+
+    assert "released checkpoints" in paper
+    assert "interleaved reasoning" in paper
+    assert "released checkpoints" in slides
+    assert "interleaved reasoning" in slides
+    section_one = paper.split("\\section*{1. Introduction and Background}", 1)[1]
+    section_one = section_one.split("\\section*{2. Logical Outline}", 1)[0]
+    assert "\\begin{itemize}" not in section_one
+
+
+def test_render_beamer_from_report_derives_outline() -> None:
+    paper = (
+        "\\documentclass{article}\n"
+        "\\begin{document}\n"
+        "\\section*{Executive Summary}\nAlpha beta gamma.\n\n"
+        "\\section*{Methodology}\nMethod details.\n\n"
+        "\\section*{Results}\nResult details.\n"
+        "\\end{document}\n"
+    )
+    frames = build_slide_frames_from_report(paper, language="en")
+    slides = render_beamer_frames("demo", frames, language="en")
+    assert "\\begin{frame}{Agenda}" in slides
+    assert "Executive Summary" in slides
+    assert "Methodology" in slides
+
+
+def test_render_beamer_frames_limits_density() -> None:
+    report = (
+        "\\documentclass{article}\\begin{document}"
+        "\\section*{Overview} A long sentence about architecture and implementation details repeated."
+        " Another long sentence about evaluation metrics and reproducibility details."
+        "\\section*{Results} Multiple findings with evidence and quantitative metrics."
+        "\\end{document}"
+    )
+    frames = build_slide_frames_from_report(report, language="en")
+    assert len(frames) >= 2
+    tex = render_beamer_frames("demo", frames, language="en")
+    assert "\\begin{frame}{Agenda}" in tex
+    assert "\\begin{itemize}" in tex
+
+
+def test_render_report_structured_zh_uses_ctex() -> None:
+    section_blocks = [
+        {"name": "方法", "latex": "本节给出方法细节与参数说明。"},
+        {"name": "结果", "latex": "本节给出结果与证据。"},
+    ]
+    tex = render_report_structured("中文研究报告", section_blocks, language="zh")
+    assert "\\documentclass[11pt]{ctexart}" in tex
+    assert "本研究报告聚焦可追溯证据" not in tex
+    assert "建议定期刷新证据并进行复跑验证" not in tex
+    assert "\\section*{方法}" in tex
+
+
+def test_render_beamer_frames_zh_uses_ctexbeamer() -> None:
+    frames = [
+        build_slide_frames_from_report(
+            "\\documentclass{ctexart}\\begin{document}\\section*{结果}关键结果一。关键结果二。\\end{document}",
+            language="zh",
+        )[0]
+    ]
+    tex = render_beamer_frames("中文主题", frames, language="zh")
+    assert "\\documentclass[aspectratio=169]{ctexbeamer}" in tex
+    assert "\\begin{frame}{目录}" in tex
+
+
+def test_build_slide_frames_from_sections_splits_long_section() -> None:
+    section_blocks = [
+        {
+            "name": "Results",
+            "latex": (
+                "The first finding shows strong improvement in consistency and precision. "
+                "The second finding shows stronger robustness under distribution shift. "
+                "The third finding indicates cost-performance improvement. "
+                "The fourth finding confirms stability across runs. "
+                "The fifth finding highlights limitations and guardrails."
+            ),
+        }
+    ]
+    frames = build_slide_frames_from_sections(section_blocks, language="en")
+    assert len(frames) >= 2
+
+
+def test_render_report_structured_removes_bracket_refs() -> None:
+    section_blocks = [
+        {"name": "Evidence", "latex": "Claim [1] with support [2] and \\cite{src1}."}
+    ]
+    tex = render_report_structured("demo", section_blocks, language="en")
+    assert "[1]" not in tex
+    assert "[2]" not in tex
+    assert "\\cite{" not in tex
+
+
+def test_enforce_slide_density_splits_large_bullet_groups() -> None:
+    frames = [
+        {
+            "title": "Results",
+            "bullets": [
+                "point one with enough words to be valid",
+                "point two with enough words to be valid",
+                "point three with enough words to be valid",
+                "point four with enough words to be valid",
+                "point five with enough words to be valid",
+            ],
+        }
+    ]
+    from hydradeck.render import SlideFrame
+
+    fr = [SlideFrame(title=x["title"], bullets=x["bullets"]) for x in frames]
+    out = enforce_slide_density(fr, language="en", max_bullets_per_frame=4)
+    assert len(out) == 2
+    assert out[0].title == "Results"
+    assert "(cont.)" in out[1].title
tests/test_resources_pack_mock.py ADDED
@@ -0,0 +1,43 @@
+from __future__ import annotations
+
+import zipfile
+from pathlib import Path
+
+from hydradeck.core.types import RunConfig
+from hydradeck.resources_pack import build_resources_pack
+
+
+def test_resources_pack_mock(tmp_path: Path) -> None:
+    out_zip = tmp_path / "res.zip"
+    cfg = RunConfig(
+        topic="RynnBrain",
+        out=out_zip,
+        base_url="https://example.invalid",
+        api_key="",
+        model="mock",
+        use_mock=True,
+        verbose=False,
+        progress=False,
+        llm_timeout_s=5.0,
+        max_total_runtime_s=5.0,
+        request_budget_s=2.0,
+        snapshot_timeout_s=1.0,
+        keep_stage=False,
+        max_sources=3,
+        module_sources=2,
+    )
+    build_resources_pack(cfg)
+    assert out_zip.exists()
+    with zipfile.ZipFile(out_zip, "r") as z:
+        names = set(z.namelist())
+        pre_paper = z.read("pre_paper.tex").decode("utf-8")
+        pre_slides = z.read("pre_slides.tex").decode("utf-8")
+    assert "resources/sources.json" in names
+    assert "resources/snapshots.json" in names
+    assert "research.json" in names
+    assert "pre_paper.tex" in names
+    assert "pre_slides.tex" in names
+    assert "pdf/pre_paper.pdf" in names
+    assert "pdf/pre_slides.pdf" in names
+    assert "来源清单" in pre_paper
+    assert "关键来源" in pre_slides
tests/test_smoke_mock.py ADDED
@@ -0,0 +1,57 @@
+from __future__ import annotations
+
+import zipfile
+from pathlib import Path
+
+from hydradeck.core.types import RunConfig
+from hydradeck.pipeline import run
+
+
+def test_mock_run_creates_zip(tmp_path: Path) -> None:
+    out_zip = tmp_path / "demo.zip"
+    cfg = RunConfig(
+        topic="test topic",
+        out=out_zip,
+        base_url="https://example.invalid",
+        api_key="",
+        model="mock",
+        use_mock=True,
+        verbose=False,
+        iterations=2,
+        max_sources=3,
+        archive_snapshots=False,
+        auto=True,
+        auto_queries=True,
+        auto_models=True,
+    )
+    run(cfg)
+    assert out_zip.exists()
+    with zipfile.ZipFile(out_zip, "r") as z:
+        names = set(z.namelist())
+        compile_sh = z.read("compile.sh").decode("utf-8")
+        paper_tex = z.read("paper.tex").decode("utf-8")
+        slides_tex = z.read("slides.tex").decode("utf-8")
+    for required in [
+        "pre_report.md",
+        "report.md",
+        "speech.md",
+        "paper.tex",
+        "slides.tex",
+        "refs.bib",
+        "research.json",
+        "compile.sh",
+        "Makefile",
+        "resources/sources.json",
+    ]:
+        assert required in names
+
+    assert "xelatex -interaction=nonstopmode paper.tex" in compile_sh
+    assert "xelatex -interaction=nonstopmode slides.tex" in compile_sh
+    assert "\\section*{3. Evidence and Key Findings}" in paper_tex
+    assert "\\section*{1. Introduction and Background}" in paper_tex
+    assert "\\begin{frame}{Agenda}" in slides_tex
+    assert "\\usetheme{metropolis}" in slides_tex
+    assert "```" not in paper_tex
+    assert "```" not in slides_tex
+    assert "## " not in paper_tex
+    assert "## " not in slides_tex
tests/test_verbatim_mock.py ADDED
@@ -0,0 +1,43 @@
+from __future__ import annotations
+
+import zipfile
+from pathlib import Path
+
+from hydradeck.core.types import RunConfig
+from hydradeck.pipeline import run
+
+
+def test_mock_run_verbatim_creates_outputs(tmp_path: Path) -> None:
+    out_zip = tmp_path / "verbatim.zip"
+    cfg = RunConfig(
+        topic="RynnBrain",
+        out=out_zip,
+        base_url="https://example.invalid",
+        api_key="",
+        model="mock",
+        use_mock=True,
+        verbose=False,
+        iterations=1,
+        max_sources=3,
+        verbatim=True,
+        archive_prompts=True,
+        archive_snapshots=False,
+        quality_gate=True,
+        min_quality_score=0.85,
+        max_quality_attempts=2,
+    )
+    run(cfg)
+    with zipfile.ZipFile(out_zip, "r") as z:
+        names = set(z.namelist())
+    for required in [
+        "pre_report.md",
+        "report.md",
+        "speech.md",
+        "paper.tex",
+        "slides.tex",
+        "refs.bib",
+        "research.json",
+        "resources/sources.json",
+        "prompts.jsonl",
+    ]:
+        assert required in names