OpeneR Sisyphus committed
Commit 778278c · 0 parents

HydraDeck open-source clean snapshot

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

.gitignore ADDED
@@ -0,0 +1,8 @@
+ __pycache__/
+ *.pyc
+ .pytest_cache/
+ .ruff_cache/
+ .DS_Store
+ build/
+ *.egg-info/
+ out/
README.md ADDED
@@ -0,0 +1,77 @@
+ # hydradeck
+
+ A reproducible, auditable Grok Deep Research pipeline (multi-persona, iterative) that outputs:
+
+ - `pre_report.md`: pre-research report covering question decomposition, methods, search strategy, risks, and boundaries
+ - `report.md`: full research report (with a complete-sources list and traceable citations)
+ - `speech.md`: speaker script (readable verbatim, with transitions and timing cues)
+ - `pre_paper.tex`: LaTeX paper draft of the pre-brief (article class)
+ - `pre_slides.tex`: Beamer slides for the pre-brief
+ - `refs.bib`: BibTeX references
+ - `research.json`: structured intermediate artifacts (for reproducibility and auditing)
+
+ > Security note: never commit an API key to the repository. Use the `GROK_API_KEY` environment variable.
+ > If you have already pasted a key into a chat, **rotate/revoke** that key immediately.
+
+ ## Installation
+
+ ```bash
+ cd hydradeck
+ python3 -m pip install -e .
+ python3 -m pip install -e ".[dev]"
+ ```
+
+ ## Quick start
+
+ ### 1) Mock (offline) end-to-end run
+
+ ```bash
+ mkdir -p out
+ hydradeck run --topic "LLM agents for deep research" --out out/demo.zip --mock
+ ```
+
+ ### 2) Using Grok2API / an OpenAI-compatible gateway
+
+ `api.example.com` is built on Grok2API and exposes the OpenAI-compatible `/v1/chat/completions` and `/v1/models` endpoints.
+
+ ```bash
+ export GROK_BASE_URL="https://api.example.com"
+ export GROK_API_KEY="<YOUR_KEY>"
+ export GROK_MODEL="grok-4"
+
+ mkdir -p out
+ hydradeck run --topic "<your research topic>" --out out/topic.zip \
+     --iterations 3 \
+     --max-sources 10
+ ```
+
+ ## Output layout
+
+ The output is a directory or a zip archive, depending on whether `--out` ends in `.zip`. It includes `compile.sh` and a `Makefile` for compiling the LaTeX sources.
+
+ ## WebUI (HydraDeck)
+
+ ### Running locally
+
+ ```bash
+ cd hydradeck
+ python3 custom_web.py
+ ```
+
+ Default listen address: `http://127.0.0.1:7861`
+
+ ### Environment variables (optional)
+
+ ```bash
+ export GROK_BASE_URL="https://api.example.com"
+ export GROK_API_KEY="<YOUR_KEY>"
+ export GROK_MODEL="grok-4"
+ ```
+
+ ### Basic usage
+
+ 1. Enter a Topic on the `Run` tab
+ 2. Click `Quick API Check` to verify connectivity
+ 3. Click `Run HydraDeck` to start generation
+ 4. Follow live progress in `Console`
+ 5. Download `paper.pdf` / `slides.pdf` from `Artifacts`
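The packaging rule in the output-layout section above (directory vs. zip archive, decided by the `--out` suffix) can be sketched in isolation. This is an illustrative reimplementation, not hydradeck's actual code; `package_output` is a hypothetical name:

```python
from pathlib import Path
import shutil
import zipfile


def package_output(src_dir: str, out: str) -> str:
    """Copy artifacts to `out`; zip them when `out` ends with `.zip`."""
    src = Path(src_dir)
    if out.endswith(".zip"):
        # Archive every file, preserving paths relative to the source directory.
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
            for p in sorted(src.rglob("*")):
                if p.is_file():
                    zf.write(p, str(p.relative_to(src)))
        return out
    # Otherwise, materialize the artifacts as a plain directory.
    dest = Path(out)
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return str(dest)
```

The same caller-facing contract (one `--out` argument, two output shapes) keeps the CLI surface small.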
README_SPACES.md ADDED
@@ -0,0 +1,20 @@
+ ---
+ title: hydradeck-webui
+ emoji: 📚
+ colorFrom: indigo
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 4.44.1
+ app_file: app.py
+ pinned: false
+ ---
+
+ # hydradeck WebUI (Hugging Face Spaces)
+
+ Set these secrets in Space settings if needed:
+
+ - `GROK_API_KEY`
+ - `GROK_BASE_URL` (optional, defaults to `https://api.example.com`)
+ - `GROK_MODEL` (optional, defaults to `grok-4`)
+
+ The app entrypoint is `app.py`.
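The optional secrets above follow a common defaulting pattern: read the variable, fall back when it is unset or blank. A minimal sketch of that behavior (`resolve_with_default` is an illustrative helper, not the actual `hydradeck.config` API):

```python
import os


def resolve_with_default(name: str, default: str) -> str:
    """Return the env var's value, or `default` when unset or whitespace-only."""
    value = os.environ.get(name, "").strip()
    return value or default
```

Treating a blank value the same as an unset one avoids surprises when a Space secret is saved as an empty string.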
app.py ADDED
@@ -0,0 +1,1269 @@
+ from __future__ import annotations
+
+ import warnings
+
+ warnings.filterwarnings(
+     "ignore",
+     message=r"urllib3 v2 only supports OpenSSL 1\.1\.1\+.*",
+ )
+
+ import tempfile
+ import zipfile
+ import json
+ import time
+ from concurrent.futures import ThreadPoolExecutor
+ from queue import Empty, Queue
+ from pathlib import Path
+ from typing import Any
+ from urllib.error import HTTPError, URLError
+ from urllib.parse import quote, urlparse
+ from urllib.request import Request, urlopen
+
+ import gradio as gr
+
+ from hydradeck.clients import ChatMessage, GrokClient
+ from hydradeck.config import resolve_api_key, resolve_base_url, resolve_model
+ from hydradeck.core.types import RunConfig
+ from hydradeck.pipeline import run
+ from hydradeck.render import (
+     build_slide_frames_from_sections,
+     enforce_slide_density,
+     render_beamer_frames,
+     render_paper,
+     render_report_structured,
+ )
+
+
+ CHROME_144_UA = (
+     "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
+     "AppleWebKit/537.36 (KHTML, like Gecko) "
+     "Chrome/144.0.0.0 Safari/537.36"
+ )
+
+ def _normalized_base_url(base_url: str) -> str:
+     parsed = urlparse(base_url.strip())
+     if parsed.scheme not in {"http", "https"}:
+         raise ValueError("Base URL must start with http:// or https://")
+     if not parsed.netloc:
+         raise ValueError("Base URL is missing host")
+     return base_url.strip().rstrip("/")
+
+
+ def _preflight_check(base_url: str, api_key: str, request_budget: float) -> str | None:
+     if not api_key.strip():
+         return "Missing API key. Fill API Key field or set GROK_API_KEY before running."
+
+     try:
+         normalized = _normalized_base_url(base_url)
+     except ValueError as exc:
+         return f"Invalid Base URL: {exc}"
+
+     probe_url = f"{normalized}/v1/models"
+     timeout_s = max(2.0, min(float(request_budget), 6.0))
+     req = Request(
+         probe_url,
+         headers={
+             "Authorization": f"Bearer {api_key.strip()}",
+             "User-Agent": CHROME_144_UA,
+         },
+     )
+
+     try:
+         with urlopen(req, timeout=timeout_s):
+             return None
+     except HTTPError as exc:
+         try:
+             body = exc.read().decode("utf-8", errors="replace")
+         except Exception:
+             body = ""
+         if exc.code == 403 and "error code: 1010" in body.lower():
+             return (
+                 "Gateway blocked this client (Cloudflare 1010), not an API-key issue. "
+                 "Try another network/egress IP or ask gateway admin to allow this IP."
+             )
+         if exc.code in {401, 403}:
+             return "API key rejected (401/403). Please update GROK_API_KEY or paste a valid key."
+         return f"API endpoint returned HTTP {exc.code} during preflight."
+     except URLError as exc:
+         return f"Cannot reach API endpoint ({probe_url}): {exc.reason}"
+     except TimeoutError:
+         return (
+             f"API preflight timed out after {timeout_s:.0f}s. "
+             "Try mock mode first, then increase Request budget."
+         )
+
+
+ def _api_quick_check(base_url: str, api_key: str, model: str, request_budget: float) -> str:
+     selected_base_url = base_url.strip() or resolve_base_url("https://api.example.com")
+     selected_api_key = api_key.strip() or resolve_api_key()
+
+     preflight_error = _preflight_check(selected_base_url, selected_api_key, request_budget)
+     if preflight_error is not None:
+         return f"API check failed: {preflight_error}"
+
+     normalized = _normalized_base_url(selected_base_url)
+     req_model = model.strip() or resolve_model("grok-3-mini")
+     payload = {
+         "model": req_model,
+         "messages": [{"role": "user", "content": "reply with exactly: API_OK"}],
+         "temperature": 0,
+         "max_tokens": 8,
+     }
+     req = Request(
+         f"{normalized}/v1/chat/completions",
+         method="POST",
+         data=json.dumps(payload).encode("utf-8"),
+         headers={
+             "Authorization": f"Bearer {selected_api_key.strip()}",
+             "User-Agent": CHROME_144_UA,
+             "Content-Type": "application/json",
+         },
+     )
+     timeout_s = max(3.0, min(float(request_budget), 12.0))
+     try:
+         with urlopen(req, timeout=timeout_s) as resp:
+             body = resp.read().decode("utf-8", errors="replace")
+     except HTTPError as exc:
+         text = exc.read().decode("utf-8", errors="replace")
+         return f"API check failed: HTTP {exc.code} {text[:180]}"
+     except URLError as exc:
+         return f"API check failed: network error {exc.reason}"
+     except TimeoutError:
+         return f"API check failed: completion timeout after {timeout_s:.0f}s"
+
+     if "API_OK" not in body:
+         return f"API check uncertain: completion returned unexpected body: {body[:180]}"
+     return "API check passed: models/completions reachable and auth works."
+
+
+ def _compile_latex_online(tex_source: str, output_name: str) -> str:
+     def _compile_via_hosted_url(command: str) -> bytes:
+         upload_req = Request("https://paste.rs", data=tex_source.encode("utf-8"), method="POST")
+         with urlopen(upload_req, timeout=30) as upload_resp:
+             hosted_url = upload_resp.read().decode("utf-8", errors="replace").strip()
+         compile_from_url = (
+             "https://latexonline.cc/compile?url="
+             + quote(hosted_url, safe=":/?=&")
+             + "&command="
+             + command
+             + "&force=true"
+         )
+         req2 = Request(compile_from_url, headers={"User-Agent": CHROME_144_UA})
+         with urlopen(req2, timeout=120) as resp2:
+             return resp2.read()
+
+     errors: list[str] = []
+     blob = b""
+     for command in ["xelatex", "lualatex", "pdflatex"]:
+         try:
+             encoded = quote(tex_source, safe="")
+             compile_url = (
+                 "https://latexonline.cc/compile?text="
+                 + encoded
+                 + "&command="
+                 + command
+                 + "&force=true"
+             )
+             if len(compile_url) > 6000:
+                 blob = _compile_via_hosted_url(command)
+             else:
+                 req = Request(compile_url, headers={"User-Agent": CHROME_144_UA})
+                 with urlopen(req, timeout=90) as resp:
+                     blob = resp.read()
+             if blob.startswith(b"%PDF"):
+                 break
+             blob = _compile_via_hosted_url(command)
+             if blob.startswith(b"%PDF"):
+                 break
+             errors.append(f"{command}: non-pdf response")
+         except HTTPError as exc:
+             body = exc.read().decode("utf-8", errors="replace")
+             errors.append(f"{command}: HTTP {exc.code} {body[:500]}")
+         except Exception as exc:
+             errors.append(f"{command}: {exc}")
+
+     if not blob.startswith(b"%PDF"):
+         raise RuntimeError("online renderer failed: " + " | ".join(errors[:3]))
+     out_path = Path("/tmp") / output_name
+     _ = out_path.write_bytes(blob)
+     return str(out_path)
+
+
+ def _extract_json_object(text: str) -> dict[str, Any]:
+     raw = text.strip()
+     if not raw:
+         raise RuntimeError("empty JSON response")
+     try:
+         parsed = json.loads(raw)
+         if isinstance(parsed, dict):
+             return parsed
+     except json.JSONDecodeError:
+         pass
+
+     start = raw.find("{")
+     end = raw.rfind("}")
+     if start == -1 or end == -1 or end <= start:
+         raise RuntimeError("no JSON object found in response")
+     parsed2 = json.loads(raw[start : end + 1])
+     if not isinstance(parsed2, dict):
+         raise RuntimeError("top-level JSON is not an object")
+     return parsed2
+
+
+ def _chat_json_resilient(
+     client: GrokClient,
+     messages: list[ChatMessage],
+     schema_hint: str,
+     temperature: float,
+     timeout_s: float,
+ ) -> dict[str, Any]:
+     try:
+         obj = client.chat_json(
+             messages,
+             schema_hint=schema_hint,
+             temperature=temperature,
+             timeout_s=timeout_s,
+         )
+         if isinstance(obj, dict):
+             return obj
+     except Exception:
+         pass
+
+     try:
+         text = client.chat_text(messages, temperature=temperature, timeout_s=timeout_s)
+         return _extract_json_object(text)
+     except Exception:
+         return {}
+
+
+ def _build_stage_model_map(
+     requested_model: str,
+     overrides: dict[str, str] | None = None,
+ ) -> dict[str, str]:
+     base = requested_model.strip() or resolve_model("grok-3-mini")
+     high = base
+     if "mini" in base:
+         high = base.replace("-mini", "")
+     if high == base and base == "grok-3-mini":
+         high = "grok-3"
+     model_map = {
+         "scope": base,
+         "structure": high,
+         "planner": high,
+         "section": base,
+         "paper": high,
+         "slides": high,
+     }
+     if overrides:
+         for key in model_map:
+             v = overrides.get(key, "").strip()
+             if v:
+                 model_map[key] = v
+     return model_map
+
+
+ def _looks_like_template_text(text: str) -> bool:
+     low = text.lower().strip()
+     if not low:
+         return True
+     bad_markers = [
+         "this section is generated",
+         "no content generated",
+         "lorem ipsum",
+         "to be filled",
+         "placeholder",
+         "add key evidence-backed findings",
+         "补充关键事实与证据",
+     ]
+     return any(m in low for m in bad_markers)
+
+
+ def _assert_not_template_output(module_name: str, text: str) -> None:
+     if _looks_like_template_text(text):
+         raise RuntimeError(f"{module_name} produced template-like content; retry required")
+
+
+ def _section_quality_ok(section_title: str, latex_body: str, language: str) -> bool:
+     if _looks_like_template_text(latex_body):
+         return False
+     body = latex_body.strip()
+     if len(body) < 120:
+         return False
+     if language == "zh":
+         zh_chars = sum(1 for ch in body if "\u4e00" <= ch <= "\u9fff")
+         if zh_chars < 20:
+             return False
+     else:
+         words = [w for w in body.replace("\n", " ").split(" ") if w]
+         if len(words) < 40:
+             return False
+     _ = section_title
+     return True
+
+
+ def _run_agentic_pipeline(
+     topic: str,
+     model: str,
+     base_url: str,
+     api_key: str,
+     request_budget: float,
+     use_mock: bool,
+     progress: gr.Progress = gr.Progress(),
+     stage_callback=None,
+     language: str = "en",
+     stage_models: dict[str, str] | None = None,
+ ) -> tuple[str, str, str, str, str, str, str, str, str]:
+     if not topic.strip():
+         return "Topic is required.", "", "", "", "", "", "", "", ""
+
+     selected_base_url = base_url.strip() or resolve_base_url("https://api.example.com")
+     selected_api_key = api_key.strip() or resolve_api_key()
+     selected_model = model.strip() or resolve_model("grok-3-mini")
+     lang = language.strip().lower()
+     if lang not in {"en", "zh"}:
+         lang = "en"
+     model_map = _build_stage_model_map(selected_model, overrides=stage_models)
+     total_steps = 9
+     stage_logs: list[str] = []
+
+     def mark(step: int, label: str, detail: str) -> None:
+         pct = min(max(step / total_steps, 0.0), 1.0)
+         _ = progress(pct, desc=label)
+         stage_logs.append(f"{step}/{total_steps} {label}: {detail}")
+
+     def emit_stage(
+         step: int,
+         label: str,
+         detail: str,
+         scope_text: str = "",
+         section_text: str = "",
+         paper_text: str = "",
+         slides_text: str = "",
+         pdf_paths_text: str = "",
+         paper_pdf_text: str = "",
+         slides_pdf_text: str = "",
+     ) -> None:
+         if stage_callback is None:
+             return
+         payload = {
+             "status": f"Running: {label}",
+             "progress_log": "\n".join(stage_logs),
+             "scope": scope_text,
+             "sections": section_text,
+             "paper": paper_text,
+             "slides": slides_text,
+             "pdf_paths": pdf_paths_text,
+             "paper_pdf": paper_pdf_text,
+             "slides_pdf": slides_pdf_text,
+             "progress": int(min(100, max(0, round(step / total_steps * 100)))),
+             "stage": label,
+             "detail": detail,
+         }
+         stage_callback(payload)
+
+     mark(1, "Preflight", "checking API connectivity")
+     emit_stage(1, "Preflight", "checking API connectivity")
+     if not use_mock:
+         preflight_error = _preflight_check(selected_base_url, selected_api_key, request_budget)
+         if preflight_error is not None:
+             return (
+                 f"Agentic run failed: {preflight_error}",
+                 "\n".join(stage_logs),
+                 "",
+                 "",
+                 "",
+                 "",
+                 "",
+                 "",
+                 "",
+             )
+
+     scope_payload: dict[str, object]
+     section_plan: list[dict[str, str]]
+     section_blocks: list[dict[str, str]] = []
+     paper_tex = ""
+     slides_tex = ""
+
+     if use_mock:
+         mark(2, "Agent-1 ScopeScout", "using mock scope")
+         scope_payload = {
+             "project_links": [
+                 {
+                     "title": "RynnBrain repo",
+                     "url": "https://github.com/alibaba-damo-academy/RynnBrain",
+                     "reason": "Core project artifact",
+                 },
+                 {
+                     "title": "arXiv references",
+                     "url": "https://arxiv.org",
+                     "reason": "Peer-reviewed baseline papers",
+                 },
+             ],
+             "scope": {
+                 "in_scope": ["architecture", "training/inference workflow", "evaluation evidence"],
+                 "out_scope": ["business roadmap", "non-technical marketing claims"],
+                 "key_questions": [
+                     "What problem is solved?",
+                     "What architecture choices matter?",
+                     "What evidence supports claims?",
+                 ],
+             },
+         }
+         emit_stage(
+             2,
+             "Agent-1 ScopeScout",
+             "scope resolved",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+         )
+
+         mark(3, "Agent-StructureDesigner", "designing report structure")
+         structure_plan = {
+             "title": topic.strip(),
+             "sections": [
+                 {"name": "Abstract", "goal": "State problem, method, key findings, and significance."},
+                 {"name": "Introduction", "goal": "Context, motivation, and clear research question."},
+                 {"name": "Methodology", "goal": "System design, assumptions, and evaluation protocol."},
+                 {"name": "Results", "goal": "Evidence-backed findings with explicit source links."},
+                 {"name": "Discussion", "goal": "Interpretation, limitations, and trade-offs."},
+                 {"name": "Conclusion", "goal": "Takeaways and future work."},
+             ],
+             "slide_style": {
+                 "max_bullets": 5,
+                 "max_words_per_bullet": 14,
+                 "visual_density": "low",
+                 "must_include": ["agenda", "method diagram slide", "results table slide", "limitations"],
+             },
+         }
+         emit_stage(
+             3,
+             "Agent-StructureDesigner",
+             "report structure designed",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             section_text=json.dumps(structure_plan, ensure_ascii=False, indent=2),
+         )
+
+         mark(4, "Agent-2 TemplatePlanner", "building section summaries from templates")
+         section_plan = [
+             {"name": "Abstract", "summary": "Concise summary of problem, method, findings, and impact."},
+             {"name": "Introduction", "summary": "Problem framing and motivation in research context."},
+             {"name": "Methodology", "summary": "System architecture and methodological decisions."},
+             {"name": "Results", "summary": "Empirical findings and traceable evidence."},
+             {"name": "Discussion", "summary": "Interpretation of findings and practical implications."},
+             {"name": "Conclusion", "summary": "Actionable takeaways and next steps."},
+         ]
+         if lang == "zh":
+             section_plan = [
+                 {"name": "摘要", "summary": "概述研究问题、方法、关键发现与价值。"},
+                 {"name": "引言", "summary": "说明背景、动机与研究问题。"},
+                 {"name": "方法", "summary": "阐述系统架构、方法流程与评估设置。"},
+                 {"name": "结果", "summary": "给出可追溯证据支持的核心结论。"},
+                 {"name": "讨论", "summary": "解释结果意义、局限与适用边界。"},
+                 {"name": "结论", "summary": "总结与后续研究建议。"},
+             ]
+         emit_stage(
+             4,
+             "Agent-2 TemplatePlanner",
+             "section plan prepared",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+         )
+
+         mark(5, "Section Agents", "drafting per-section TeX blocks")
+         for sec in section_plan:
+             section_blocks.append(
+                 {
+                     "name": sec["name"],
+                     "latex": (
+                         f"\\subsection*{{{sec['name']}}}\n"
+                         f"{sec['summary']}\\\n"
+                         "Evidence should map directly to claims and include method-specific details."
+                     ),
+                 }
+             )
+         emit_stage(
+             5,
+             "Section Agents",
+             "section drafts ready",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+             paper_text="\n\n".join(block["latex"] for block in section_blocks),
+         )
+
+         mark(6, "Integrator-Paper", "merging section TeX into paper")
+         paper_tex = render_report_structured(topic.strip(), section_blocks, language=lang)
+
+         mark(7, "Integrator-Beamer", "building slide deck from report")
+         frames = build_slide_frames_from_sections(section_blocks, language=lang)
+         frames = enforce_slide_density(frames, language=lang)
+         slides_tex = render_beamer_frames(topic.strip(), frames, language=lang)
+     else:
+         timeout_s = max(12.0, min(float(request_budget), 40.0))
+         client_scope = GrokClient(
+             base_url=selected_base_url,
+             api_key=selected_api_key,
+             model=model_map["scope"],
+             timeout_s=timeout_s,
+             max_retries=2,
+             heartbeat=False,
+         )
+         client_structure = GrokClient(
+             base_url=selected_base_url,
+             api_key=selected_api_key,
+             model=model_map["structure"],
+             timeout_s=timeout_s,
+             max_retries=2,
+             heartbeat=False,
+         )
+         client_planner = GrokClient(
+             base_url=selected_base_url,
+             api_key=selected_api_key,
+             model=model_map["planner"],
+             timeout_s=timeout_s,
+             max_retries=2,
+             heartbeat=False,
+         )
+         client_section = GrokClient(
+             base_url=selected_base_url,
+             api_key=selected_api_key,
+             model=model_map["section"],
+             timeout_s=timeout_s,
+             max_retries=2,
+             heartbeat=False,
+         )
+         client_paper = GrokClient(
+             base_url=selected_base_url,
+             api_key=selected_api_key,
+             model=model_map["paper"],
+             timeout_s=timeout_s,
+             max_retries=2,
+             heartbeat=False,
+         )
+         client_slides = GrokClient(
+             base_url=selected_base_url,
+             api_key=selected_api_key,
+             model=model_map["slides"],
+             timeout_s=timeout_s,
+             max_retries=2,
+             heartbeat=False,
+         )
+
+         quick_scope = {
+             "project_links": [
+                 {
+                     "title": f"{topic.strip()} official repository",
+                     "url": "https://github.com",
+                     "reason": "Seed placeholder before remote scope enrichment.",
+                 }
+             ],
+             "scope": {
+                 "in_scope": ["architecture", "method", "evidence"],
+                 "out_scope": ["marketing narrative", "non-technical roadmap"],
+                 "key_questions": [
+                     "What core problem is solved?",
+                     "What design decisions matter most?",
+                     "What evidence is verifiable?",
+                 ],
+             },
+         }
+         emit_stage(
+             2,
+             "Agent-1 ScopeScout",
+             "quick skeleton ready; enriching with remote call",
+             scope_text=json.dumps(quick_scope, ensure_ascii=False, indent=2),
+         )
+
+         mark(2, "Agent-1 ScopeScout", "asking Grok for project links + scope")
+         try:
+             scope_payload = _chat_json_resilient(
+                 client_scope,
+                 [
+                     ChatMessage(
+                         role="system",
+                         content=(
+                             "You are ScopeScout. Find key project links and define an initial technical research scope."
+                         ),
+                     ),
+                     ChatMessage(
+                         role="user",
+                         content=(
+                             "Topic: "
+                             + topic.strip()
+                             + "\nReturn JSON with keys: project_links (list of {title,url,reason}),"
+                             + " scope ({in_scope:[...], out_scope:[...], key_questions:[...]})"
+                         ),
+                     ),
+                 ],
+                 schema_hint=(
+                     '{"project_links":[{"title":"...","url":"https://...","reason":"..."}],'
+                     '"scope":{"in_scope":["..."],"out_scope":["..."],"key_questions":["..."]}}'
+                 ),
+                 temperature=0.1,
+                 timeout_s=min(timeout_s, 18.0),
+             )
+         except Exception:
+             scope_payload = quick_scope
+         emit_stage(
+             2,
+             "Agent-1 ScopeScout",
+             "scope resolved",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+         )
+
+         mark(3, "Agent-StructureDesigner", "designing report architecture and slide style")
+         structure_obj = _chat_json_resilient(
+             client_structure,
+             [
+                 ChatMessage(
+                     role="system",
+                     content=(
+                         "You are StructureDesigner. Build a publication-grade report architecture and a presentation"
+                         " style guide before drafting any sections."
+                         + (" Respond in Chinese." if lang == "zh" else " Respond in English.")
+                     ),
+                 ),
+                 ChatMessage(
+                     role="user",
+                     content=(
+                         "Topic: "
+                         + topic.strip()
+                         + "\nScope JSON: "
+                         + json.dumps(scope_payload, ensure_ascii=False)
+                         + "\nReturn JSON {report_blueprint:{section_order:[...],section_goals:[...]},"
+                         + " slide_style:{theme,max_bullets,max_words_per_bullet,visual_rules:[...]}}"
+                         + " Ensure this is a RESEARCH REPORT structure (not academic paper IMRaD rigidity)."
+                     ),
+                 ),
+             ],
+             schema_hint='{"report_blueprint":{"section_order":["..."],"section_goals":["..."]},"slide_style":{"theme":"..."}}',
+             temperature=0.15,
+             timeout_s=timeout_s,
+         )
+         if not isinstance(structure_obj, dict) or not structure_obj:
+             structure_obj = {
+                 "report_blueprint": {
+                     "section_order": [
+                         "Abstract",
+                         "Introduction",
+                         "Methodology",
+                         "Results",
+                         "Discussion",
+                         "Conclusion",
+                     ],
+                     "section_goals": [
+                         "Summarize research contribution",
+                         "Define context and question",
+                         "Describe method rigorously",
+                         "Present evidence with citations",
+                         "Discuss limits and implications",
+                         "Conclude and future work",
+                     ],
+                 },
+                 "slide_style": {
+                     "theme": "metropolis-like clean",
+                     "max_bullets": 5,
+                     "max_words_per_bullet": 14,
+                     "visual_rules": [
+                         "one idea per slide",
+                         "results in table/figure frame",
+                         "consistent color accents",
+                     ],
+                 },
+             }
+         emit_stage(
+             3,
+             "Agent-StructureDesigner",
+             "structure blueprint ready",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             section_text=json.dumps(structure_obj, ensure_ascii=False, indent=2),
+         )
+
+         mark(4, "Agent-2 TemplatePlanner", "mapping scope to paper/beamer section summaries")
+         section_obj = _chat_json_resilient(
+             client_planner,
+             [
+                 ChatMessage(
+                     role="system",
+                     content=(
+                         "You are TemplatePlanner. Based on scope and LaTeX paper/beamer structure, define section"
+                         " summaries that downstream section agents will write."
+                         + (" Respond in Chinese." if lang == "zh" else " Respond in English.")
+                     ),
+                 ),
+                 ChatMessage(
+                     role="user",
+                     content=(
+                         "Topic: "
+                         + topic.strip()
+                         + "\nScope JSON: "
+                         + json.dumps(scope_payload, ensure_ascii=False)
+                         + "\nStructure JSON: "
+                         + json.dumps(structure_obj, ensure_ascii=False)
+                         + "\nReturn JSON: {sections:[{name,summary}]} with 6-8 sections for a RESEARCH REPORT."
+                         + " Ensure section names are concise and audience-friendly."
+                     ),
+                 ),
+             ],
+             schema_hint='{"sections":[{"name":"Introduction","summary":"..."}]}',
+             temperature=0.1,
+             timeout_s=timeout_s,
+         )
+         raw_sections = section_obj.get("sections")
+         section_plan = [
+             {"name": str(x.get("name", "Section")), "summary": str(x.get("summary", ""))}
+             for x in raw_sections
+             if isinstance(x, dict)
+         ] if isinstance(raw_sections, list) else []
+         section_plan = section_plan[:6]
+         if not section_plan:
+             section_plan = [
+                 {"name": "Abstract", "summary": "Concise summary of contribution and findings."},
+                 {"name": "Introduction", "summary": "Problem framing and objectives."},
+                 {"name": "Methodology", "summary": "Core architecture and methodology."},
+                 {"name": "Results", "summary": "Findings grounded in verifiable sources."},
+             ]
+         emit_stage(
+             4,
+             "Agent-2 TemplatePlanner",
+             "section plan prepared",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+         )
+
732
+     mark(5, "Section Agents", "researching each section and drafting TeX fragments")
+     for idx, sec in enumerate(section_plan, start=1):
+         section_title = sec["name"]
+         latex_body = ""
+         for attempt in range(1, 4):
+             sec_obj = _chat_json_resilient(
+                 client_section,
+                 [
+                     ChatMessage(
+                         role="system",
+                         content=(
+                             "You are a SectionResearchAgent. Write a rigorous LaTeX fragment for your assigned"
+                             " section only."
+                             + (" Output Chinese text." if lang == "zh" else " Output English text.")
+                         ),
+                     ),
+                     ChatMessage(
+                         role="user",
+                         content=(
+                             f"Topic: {topic.strip()}\nSection: {sec['name']}\nSummary: {sec['summary']}\n"
+                             f"Structure JSON: {json.dumps(structure_obj, ensure_ascii=False)}\n"
+                             "Return JSON {section_title, latex_body}. latex_body must be plain LaTeX paragraphs"
+                             " without documentclass/begin{document}, with evidence-driven style and citation markers."
+                             " Keep each paragraph focused and concise for report readability."
+                             " Minimum: 2 substantive paragraphs. No placeholder text."
+                         ),
+                     ),
+                 ],
+                 schema_hint='{"section_title":"...","latex_body":"\\subsection*{...} ..."}',
+                 temperature=0.1,
+                 timeout_s=timeout_s,
+             )
+             cand_title = sec_obj.get("section_title")
+             cand_body = sec_obj.get("latex_body")
+             if isinstance(cand_title, str) and cand_title.strip():
+                 section_title = cand_title.strip()
+             if isinstance(cand_body, str):
+                 latex_body = cand_body.strip()
+             if _section_quality_ok(section_title, latex_body, lang):
+                 break
+             emit_stage(
+                 5,
+                 "Section Agents",
+                 f"quality gate retry {attempt}/3 for section {idx}",
+                 scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+                 section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+                 paper_text="\n\n".join(block["latex"] for block in section_blocks),
+             )
+         if not _section_quality_ok(section_title, latex_body, lang):
+             raise RuntimeError(
+                 f"Section agent failed quality gate after retries: {section_title}"
+             )
+         section_blocks.append({"name": section_title, "latex": latex_body})
+         mark(5, "Section Agents", f"completed {idx}/{len(section_plan)} sections")
+         emit_stage(
+             5,
+             "Section Agents",
+             f"completed {idx}/{len(section_plan)} sections",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+             paper_text="\n\n".join(block["latex"] for block in section_blocks),
+         )
+
+     mark(6, "Integrator-Paper", "assembling full paper.tex")
+     paper_obj = _chat_json_resilient(
+         client_paper,
+         [
+             ChatMessage(
+                 role="system",
+                 content=(
+                     "You are ReportIntegrator. Produce a professional LaTeX RESEARCH REPORT"
+                     " with executive readability, clear argument flow, and section coherence."
+                     + (" Output Chinese text." if lang == "zh" else " Output English text.")
+                 ),
+             ),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "Topic: "
+                     + topic.strip()
+                     + "\nScope: "
+                     + json.dumps(scope_payload, ensure_ascii=False)
+                     + "\nStructure: "
+                     + json.dumps(structure_obj, ensure_ascii=False)
+                     + "\nSection snippets: "
+                     + json.dumps(section_blocks, ensure_ascii=False)
+                     + "\nReturn JSON {paper_tex} with a full compilable document using report sections:"
+                     + " Executive Summary/Abstract, Background, Approach, Results, Discussion, Risks, Conclusion, References."
+                     + " Each section should include concrete evidence statements and implementation-level details,"
+                     + " not high-level filler. Minimum 2-4 substantive paragraphs per major section."
+                 ),
+             ),
+         ],
+         schema_hint='{"paper_tex":"\\documentclass{article} ... \\end{document}"}',
+         temperature=0.1,
+         timeout_s=timeout_s,
+     )
+     _paper_candidate = paper_obj.get("paper_tex")
+     paper_tex = render_report_structured(topic.strip(), section_blocks, language=lang)
+     _assert_not_template_output("paper", paper_tex)
+     emit_stage(
+         6,
+         "Integrator-Paper",
+         "paper.tex assembled",
+         scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+         section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+         paper_text=paper_tex,
+     )
+
+     mark(7, "Integrator-Beamer", "assembling full slides.tex")
+     slides_obj = _chat_json_resilient(
+         client_slides,
+         [
+             ChatMessage(
+                 role="system",
+                 content=(
+                     "You are BeamerIntegrator. Produce a visually polished, conference-style Beamer deck"
+                     " with concise bullets, visual hierarchy, and readable spacing."
+                     + (" Output Chinese text." if lang == "zh" else " Output English text.")
+                 ),
+             ),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "Topic: "
+                     + topic.strip()
+                     + "\nScope: "
+                     + json.dumps(scope_payload, ensure_ascii=False)
+                     + "\nSection plan: "
+                     + json.dumps(section_plan, ensure_ascii=False)
+                     + "\nSlide style: "
+                     + json.dumps(structure_obj.get("slide_style", {}), ensure_ascii=False)
+                     + "\nReturn JSON {slides_tex} with a full compilable beamer document."
+                     + " Use modern readable typography, max 5 bullets/frame, max 14 words/bullet,"
+                     + " and ensure each frame content fully fits without overflow."
+                     + " Include complete coverage: agenda, background, method, results, discussion, conclusion."
+                     + " Return STRICTLY compilable LaTeX without custom undefined macros."
+                 ),
+             ),
+         ],
+         schema_hint='{"slides_tex":"\\documentclass{beamer} ... \\end{document}"}',
+         temperature=0.1,
+         timeout_s=timeout_s,
+     )
+     _slides_candidate = slides_obj.get("slides_tex")
+     frames = build_slide_frames_from_sections(section_blocks, language=lang)
+     frames = enforce_slide_density(frames, language=lang)
+     slides_tex = render_beamer_frames(topic.strip(), frames, language=lang)
+     _assert_not_template_output("slides", slides_tex)
+     emit_stage(
+         7,
+         "Integrator-Beamer",
+         "slides.tex assembled",
+         scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+         section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+         paper_text=paper_tex,
+         slides_text=slides_tex,
+     )
+
+     mark(8, "Online Render", "compiling paper/slides to PDF via latexonline.cc")
+     emit_stage(
+         8,
+         "Online Render",
+         "rendering started",
+         scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+         section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+         paper_text=paper_tex,
+         slides_text=slides_tex,
+     )
+     try:
+         paper_pdf = _compile_latex_online(paper_tex, "hydradeck_agentic_paper.pdf")
+         slides_pdf = _compile_latex_online(slides_tex, "hydradeck_agentic_slides.pdf")
+         emit_stage(
+             8,
+             "Online Render",
+             "pdf rendered",
+             scope_text=json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             section_text=json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+             paper_text=paper_tex,
+             slides_text=slides_tex,
+             pdf_paths_text=paper_pdf + "\n" + slides_pdf,
+             paper_pdf_text=paper_pdf,
+             slides_pdf_text=slides_pdf,
+         )
+     except Exception as exc:
+         return (
+             f"Agentic run partial success: TeX generated but online PDF render failed: {exc}",
+             "\n".join(stage_logs),
+             json.dumps(scope_payload, ensure_ascii=False, indent=2),
+             json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+             paper_tex,
+             slides_tex,
+             "",
+             "",
+             "",
+         )
+
+     mark(9, "Done", "paper/slides PDFs rendered and ready")
+     return (
+         "Agentic pipeline done: scoped, drafted, integrated, rendered to PDF.",
+         "\n".join(stage_logs),
+         json.dumps(scope_payload, ensure_ascii=False, indent=2),
+         json.dumps({"sections": section_plan}, ensure_ascii=False, indent=2),
+         paper_tex,
+         slides_tex,
+         paper_pdf + "\n" + slides_pdf,
+         paper_pdf,
+         slides_pdf,
+     )
+
+
+ def _run_agentic_pipeline_stream(
+     topic: str,
+     model: str,
+     base_url: str,
+     api_key: str,
+     request_budget: float,
+     use_mock: bool,
+ ):
+     status = "Agentic pipeline running..."
+     progress_log = "1/3 Starting workflow"
+     empty_json = ""
+     empty_tex = ""
+     empty_paths = ""
+     yield (
+         status,
+         progress_log,
+         empty_json,
+         empty_json,
+         empty_tex,
+         empty_tex,
+         empty_paths,
+         "",
+         "",
+         5,
+     )
+
+     progress_log = "1/3 API scope and section planning"
+     yield (
+         status,
+         progress_log,
+         empty_json,
+         empty_json,
+         empty_tex,
+         empty_tex,
+         empty_paths,
+         "",
+         "",
+         30,
+     )
+
+     events: Queue[dict[str, object]] = Queue()
+
+     def on_stage(payload: dict[str, object]) -> None:
+         events.put(payload)
+
+     with ThreadPoolExecutor(max_workers=1) as pool:
+         fut = pool.submit(
+             _run_agentic_pipeline,
+             topic,
+             model,
+             base_url,
+             api_key,
+             request_budget,
+             use_mock,
+             gr.Progress(),
+             on_stage,
+         )
+         wait_tick = 0
+         while not fut.done() or not events.empty():
+             try:
+                 ev = events.get(timeout=1.0)
+                 yield (
+                     str(ev.get("status", "Agentic pipeline running...")),
+                     str(ev.get("progress_log", "")),
+                     str(ev.get("scope", "")),
+                     str(ev.get("sections", "")),
+                     str(ev.get("paper", "")),
+                     str(ev.get("slides", "")),
+                     str(ev.get("pdf_paths", "")),
+                     str(ev.get("paper_pdf", "")),
+                     str(ev.get("slides_pdf", "")),
+                     int(str(ev.get("progress", "0"))),
+                 )
+                 continue
+             except Empty:
+                 pass
+
+             wait_tick += 1
+             elapsed_s = wait_tick
+             heartbeat_pct = min(95, 30 + wait_tick)
+             yield (
+                 "Agentic pipeline running...",
+                 f"2/3 Running agent workflow ({elapsed_s}s elapsed)",
+                 empty_json,
+                 empty_json,
+                 empty_tex,
+                 empty_tex,
+                 empty_paths,
+                 "",
+                 "",
+                 heartbeat_pct,
+             )
+             time.sleep(1)
+
+         (
+             status2,
+             progress2,
+             scope2,
+             sections2,
+             paper2,
+             slides2,
+             paths2,
+             paper_pdf2,
+             slides_pdf2,
+         ) = fut.result()
+
+     done_log = "3/3 Completed"
+     if progress2.strip():
+         done_log = progress2 + "\n" + done_log
+
+     yield (
+         status2,
+         done_log,
+         scope2,
+         sections2,
+         paper2,
+         slides2,
+         paths2,
+         paper_pdf2,
+         slides_pdf2,
+         100,
+     )
+
+
+
+ def _run_pipeline(
+     topic: str,
+     model: str,
+     base_url: str,
+     api_key: str,
+     max_sources: int,
+     iterations: int,
+     llm_timeout: float,
+     request_budget: float,
+     seed_urls_text: str,
+     use_mock: bool,
+ ) -> tuple[str, str, str, str]:
+     if not topic.strip():
+         return "Topic is required.", "", "", ""
+
+     selected_base_url = base_url.strip() or resolve_base_url("https://api.example.com")
+     selected_api_key = api_key.strip() or resolve_api_key()
+
+     if not use_mock:
+         preflight_error = _preflight_check(selected_base_url, selected_api_key, request_budget)
+         if preflight_error is not None:
+             return f"Preflight failed: {preflight_error}", "", "", ""
+
+     with tempfile.TemporaryDirectory() as td:
+         out_zip = Path(td) / "hydradeck_out.zip"
+         seeds = [x.strip() for x in seed_urls_text.splitlines() if x.strip()]
+         cfg = RunConfig(
+             topic=topic.strip(),
+             out=out_zip,
+             base_url=selected_base_url,
+             api_key=selected_api_key,
+             model=model.strip() or resolve_model("grok-4"),
+             iterations=max(1, int(iterations)),
+             max_sources=max(1, int(max_sources)),
+             llm_timeout_s=float(llm_timeout),
+             request_budget_s=float(request_budget),
+             use_mock=bool(use_mock),
+             seed_urls=seeds or None,
+             progress=False,
+             quality_gate=False,
+             archive_snapshots=False,
+         )
+
+         retry_cfg = RunConfig(
+             topic=cfg.topic,
+             out=cfg.out,
+             base_url=cfg.base_url,
+             api_key=cfg.api_key,
+             model=cfg.model,
+             iterations=cfg.iterations,
+             max_sources=cfg.max_sources,
+             module_sources=cfg.module_sources,
+             min_total_words=cfg.min_total_words,
+             use_mock=cfg.use_mock,
+             verbose=cfg.verbose,
+             llm_timeout_s=max(cfg.llm_timeout_s, 90.0),
+             facts_max_pages=cfg.facts_max_pages,
+             facts_max_chars_per_page=cfg.facts_max_chars_per_page,
+             facts_target=cfg.facts_target,
+             judge_max_chars=cfg.judge_max_chars,
+             pre_tex_quality_gate=cfg.pre_tex_quality_gate,
+             pre_tex_min_score=cfg.pre_tex_min_score,
+             pre_tex_attempts=cfg.pre_tex_attempts,
+             keep_stage=cfg.keep_stage,
+             verbatim=cfg.verbatim,
+             archive_prompts=cfg.archive_prompts,
+             archive_snapshots=cfg.archive_snapshots,
+             snapshot_timeout_s=cfg.snapshot_timeout_s,
+             snapshot_total_timeout_s=cfg.snapshot_total_timeout_s,
+             auto=cfg.auto,
+             auto_queries=cfg.auto_queries,
+             auto_models=cfg.auto_models,
+             quality_gate=cfg.quality_gate,
+             min_quality_score=cfg.min_quality_score,
+             max_quality_attempts=cfg.max_quality_attempts,
+             query_count=cfg.query_count,
+             max_query_modules=cfg.max_query_modules,
+             sources_attempts=cfg.sources_attempts,
+             max_total_runtime_s=max(cfg.max_total_runtime_s, 420.0),
+             progress=cfg.progress,
+             request_budget_s=max(cfg.request_budget_s, 35.0),
+             pdf_compiler=cfg.pdf_compiler,
+             template=cfg.template,
+             seed_urls=cfg.seed_urls,
+         )
+         try:
+             _ = run(cfg)
+         except Exception as exc:
+             err_text = str(exc)
+             retryable = ("Read timed out" in err_text) or ("timed out" in err_text.lower())
+             if (not use_mock) and retryable:
+                 try:
+                     _ = run(retry_cfg)
+                 except Exception as retry_exc:
+                     return (
+                         "Run failed after retry: "
+                         f"{retry_exc}. Try request_budget >= 35 and llm_timeout >= 90.",
+                         "",
+                         "",
+                         "",
+                     )
+             else:
+                 return (
+                     "Run failed: "
+                     f"{exc}. If queue waits too long, try Use mock (offline) or increase Request budget.",
+                     "",
+                     "",
+                     "",
+                 )
+
+         with zipfile.ZipFile(out_zip, "r") as z:
+             report_md = z.read("report.md").decode("utf-8", errors="replace")
+             paper_tex = z.read("paper.tex").decode("utf-8", errors="replace")
+             slides_tex = z.read("slides.tex").decode("utf-8", errors="replace")
+
+         copy_zip = Path("/tmp") / "hydradeck_space_output.zip"
+         copy_zip.write_bytes(out_zip.read_bytes())
+         status = f"Done. Output zip: {copy_zip}"
+         return status, report_md, paper_tex, slides_tex
+
+
+ with gr.Blocks(title="hydradeck WebUI") as demo:
+     gr.Markdown("# hydradeck WebUI\nRun deep-research and export paper/slides tex.")
+     with gr.Row():
+         topic = gr.Textbox(label="Topic", value="RynnBrain technical report")
+         model = gr.Textbox(label="Model", value="grok-4")
+     with gr.Row():
+         base_url = gr.Textbox(label="Base URL", value="https://api.example.com")
+         api_key = gr.Textbox(label="API Key", type="password", value="")
+     with gr.Row():
+         max_sources = gr.Number(label="Max sources", value=6, precision=0)
+         iterations = gr.Number(label="Iterations", value=1, precision=0)
+         llm_timeout = gr.Number(label="LLM timeout (s)", value=90)
+         request_budget = gr.Number(label="Request budget (s)", value=35)
+     seed_urls = gr.Textbox(
+         label="Seed URLs (one per line)",
+         value="https://github.com/alibaba-damo-academy/RynnBrain\nhttps://arxiv.org",
+         lines=4,
+     )
+     use_mock = gr.Checkbox(label="Use mock (offline)", value=False)
+
+     check_btn = gr.Button("Quick API Check")
+     run_btn = gr.Button("Run Full Pipeline")
+     run_agentic_btn = gr.Button("Run Agentic Pipeline")
+     status = gr.Textbox(label="Status")
+     progress_pct = gr.Slider(label="Progress (%)", minimum=0, maximum=100, step=1, value=0, interactive=False)
+     progress_log = gr.Textbox(label="Agent Progress", lines=10)
+     scope_json = gr.Textbox(label="Scope (Agent-1)", lines=10)
+     section_plan_json = gr.Textbox(label="Section Plan (Agent-2)", lines=10)
+     report_md = gr.Textbox(label="report.md", lines=14)
+     paper_tex = gr.Textbox(label="paper.tex", lines=14)
+     slides_tex = gr.Textbox(label="slides.tex", lines=14)
+     rendered_pdfs = gr.Textbox(label="Rendered PDF Paths", lines=2)
+     paper_pdf_file = gr.Textbox(label="paper.pdf path", lines=1)
+     slides_pdf_file = gr.Textbox(label="slides.pdf path", lines=1)
+
+     check_btn.click(
+         _api_quick_check,
+         [base_url, api_key, model, request_budget],
+         [status],
+         queue=False,
+     )
+
+     run_btn.click(
+         _run_pipeline,
+         [
+             topic,
+             model,
+             base_url,
+             api_key,
+             max_sources,
+             iterations,
+             llm_timeout,
+             request_budget,
+             seed_urls,
+             use_mock,
+         ],
+         [status, report_md, paper_tex, slides_tex],
+         queue=False,
+     )
+
+     run_agentic_btn.click(
+         _run_agentic_pipeline_stream,
+         [topic, model, base_url, api_key, request_budget, use_mock],
+         [
+             status,
+             progress_log,
+             scope_json,
+             section_plan_json,
+             paper_tex,
+             slides_tex,
+             rendered_pdfs,
+             paper_pdf_file,
+             slides_pdf_file,
+             progress_pct,
+         ],
+         queue=True,
+     )
+
+
+ if __name__ == "__main__":
+     demo.queue(default_concurrency_limit=2)
+     demo.launch(server_name="0.0.0.0", server_port=7860)
custom_web.py ADDED
@@ -0,0 +1,547 @@
+ from __future__ import annotations
+
+ import json
+ import threading
+ import time
+ import uuid
+ from pathlib import Path
+ from typing import Any
+
+ from fastapi import FastAPI, HTTPException
+ from fastapi.responses import FileResponse, HTMLResponse
+ import gradio as gr
+ from pydantic import BaseModel
+
+ from app import _api_quick_check, _run_agentic_pipeline
+ from hydradeck.clients.grok_client import GrokClient
+
+
+ class RunRequest(BaseModel):
+     topic: str
+     model: str = "grok-3-mini"
+     base_url: str = "https://api.example.com"
+     api_key: str = ""
+     request_budget: float = 30.0
+     use_mock: bool = False
+     language: str = "en"
+     model_scope: str = ""
+     model_structure: str = ""
+     model_planner: str = ""
+     model_section: str = ""
+     model_paper: str = ""
+     model_slides: str = ""
+
+
+ JOBS: dict[str, dict[str, Any]] = {}
+ LOCK = threading.Lock()
+ STATE_PATH = Path("/tmp/hydradeck_state.json")
+ HISTORY_LIMIT = 40
+
+ app = FastAPI(title="HydraDeck")
+
+
+ def _load_state() -> None:
+     if not STATE_PATH.exists():
+         return
+     try:
+         data = json.loads(STATE_PATH.read_text(encoding="utf-8"))
+     except Exception:
+         return
+     jobs = data.get("jobs")
+     if isinstance(jobs, dict):
+         with LOCK:
+             JOBS.update({str(k): v for k, v in jobs.items() if isinstance(v, dict)})
+
+
+ def _save_state() -> None:
+     with LOCK:
+         payload = {"jobs": JOBS}
+         STATE_PATH.write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")
+
+
+ def _prune_history() -> None:
+     with LOCK:
+         items = sorted(
+             JOBS.items(),
+             key=lambda kv: float(kv[1].get("updated_at", 0.0)),
+             reverse=True,
+         )
+         keep = dict(items[:HISTORY_LIMIT])
+         JOBS.clear()
+         JOBS.update(keep)
+
+
+ _load_state()
+
+
+ def _new_job(req: RunRequest) -> dict[str, Any]:
+     now = time.time()
+     return {
+         "id": str(uuid.uuid4()),
+         "status": "queued",
+         "created_at": now,
+         "updated_at": now,
+         "progress": 0,
+         "status_text": "Queued",
+         "progress_log": "",
+         "scope": "",
+         "sections": "",
+         "paper": "",
+         "slides": "",
+         "pdf_paths": "",
+         "paper_pdf": "",
+         "slides_pdf": "",
+         "error": "",
+         "events": [],
+         "params": req.model_dump(),
+     }
+
+
+ def _update_job(job_id: str, updates: dict[str, Any]) -> None:
+     with LOCK:
+         job = JOBS.get(job_id)
+         if not job:
+             return
+         job.update(updates)
+         job["updated_at"] = time.time()
+     _prune_history()
+     _save_state()
+
+
+ def _append_event(job_id: str, event: dict[str, Any]) -> None:
+     with LOCK:
+         job = JOBS.get(job_id)
+         if not job:
+             return
+         events = job.get("events")
+         if isinstance(events, list):
+             events.append(event)
+     _save_state()
+
+
+ def _run_job(job_id: str, req: RunRequest) -> None:
+     _update_job(job_id, {"status": "running", "status_text": "Running"})
+
+     def on_stage(payload: dict[str, Any]) -> None:
+         _update_job(
+             job_id,
+             {
+                 "status": "running",
+                 "status_text": str(payload.get("status", "Running")),
+                 "progress": int(str(payload.get("progress", "0"))),
+                 "progress_log": str(payload.get("progress_log", "")),
+                 "scope": str(payload.get("scope", "")),
+                 "sections": str(payload.get("sections", "")),
+                 "paper": str(payload.get("paper", "")),
+                 "slides": str(payload.get("slides", "")),
+                 "pdf_paths": str(payload.get("pdf_paths", "")),
+                 "paper_pdf": str(payload.get("paper_pdf", "")),
+                 "slides_pdf": str(payload.get("slides_pdf", "")),
+             },
+         )
+         _append_event(
+             job_id,
+             {
+                 "ts": time.time(),
+                 "stage": str(payload.get("stage", "")),
+                 "detail": str(payload.get("detail", "")),
+                 "progress": int(str(payload.get("progress", "0"))),
+             },
+         )
+
+     try:
+         (
+             status,
+             progress_log,
+             scope,
+             sections,
+             paper,
+             slides,
+             pdf_paths,
+             paper_pdf,
+             slides_pdf,
+         ) = _run_agentic_pipeline(
+             topic=req.topic,
+             model=req.model,
+             base_url=req.base_url,
+             api_key=req.api_key,
+             request_budget=req.request_budget,
+             use_mock=req.use_mock,
+             progress=gr.Progress(),
+             stage_callback=on_stage,
+             language=req.language,
+             stage_models={
+                 "scope": req.model_scope,
+                 "structure": req.model_structure,
+                 "planner": req.model_planner,
+                 "section": req.model_section,
+                 "paper": req.model_paper,
+                 "slides": req.model_slides,
+             },
+         )
+         _update_job(
+             job_id,
+             {
+                 "status": "done",
+                 "status_text": status,
+                 "progress": 100,
+                 "progress_log": progress_log,
+                 "scope": scope,
+                 "sections": sections,
+                 "paper": paper,
+                 "slides": slides,
+                 "pdf_paths": pdf_paths,
+                 "paper_pdf": paper_pdf,
+                 "slides_pdf": slides_pdf,
+             },
+         )
+     except Exception as exc:
+         _update_job(
+             job_id,
+             {
+                 "status": "error",
+                 "status_text": "Failed",
+                 "error": str(exc),
+             },
+         )
+
+
+ @app.get("/", response_class=HTMLResponse)
+ def index() -> str:
+     return """
+ <!doctype html>
+ <html>
+ <head>
+ <meta charset=\"utf-8\" />
+ <title>HydraDeck</title>
+ <style>
+ :root{--bg:#f5ecd8;--paper:#fff9ec;--ink:#2a1f12;--muted:#7a5f3e;--accent:#8b3a3a;--ok:#2f6f3e}
+ body{font-family:"IBM Plex Mono","Courier New",monospace;max-width:1220px;margin:18px auto;padding:0 12px;background:var(--bg);color:var(--ink)}
+ .panel{border:2px solid var(--ink);background:var(--paper);box-shadow:2px 2px 0 #0002;padding:10px;margin:10px 0}
+ .row{display:flex;gap:10px;margin:8px 0;flex-wrap:wrap}
+ input,select,textarea{padding:8px;width:100%;border:1px solid #4b3924;background:#fffdf7;color:var(--ink)}
+ button{padding:9px 13px;border:2px solid var(--ink);background:#ead2b0;color:var(--ink);cursor:pointer}
+ button:hover{background:#f0ddc3}
+ .bar{height:16px;background:#d8c3a5;border:1px solid #4b3924;overflow:hidden}
+ .fill{height:100%;width:0%;background:linear-gradient(90deg,#8b3a3a,#d46a6a);transition:width .25s}
+ .grid{display:grid;grid-template-columns:1fr 1fr;gap:12px}
+ pre{background:#1b130c;color:#f7e8d0;padding:10px;white-space:pre-wrap;max-height:260px;overflow:auto;border:1px solid #3a2a1b}
+ .title{font-size:28px;font-weight:700;letter-spacing:1px}
+ .sub{color:var(--muted)}
+ .tiny{font-size:12px;color:var(--muted)}
+ details{border:1px dashed #7a5f3e;padding:8px;background:#fff9ef}
+ summary{cursor:pointer;font-weight:700}
+ </style>
+ </head>
+ <body>
+ <div class=\"panel\"><div class=\"title\">HydraDeck</div></div>
+ <div class=\"panel\">
+   <div class=\"row\" style=\"gap:6px\">
+     <button onclick=\"showTab('tab-run')\">Run</button>
+     <button onclick=\"showTab('tab-artifacts')\">Artifacts</button>
+     <button onclick=\"showTab('tab-console')\">Console</button>
+   </div>
+ </div>
+
+ <div id=\"tab-run\" class=\"panel tab\">
+   <div class=\"row\"><input id=\"topic\" value=\"RynnBrain technical research report\" /></div>
+   <div class=\"row\">
+     <select id=\"model\"></select>
+     <input id=\"base_url\" value=\"https://api.example.com\" />
+   </div>
+   <div class=\"row\">
+     <label>language
+       <select id=\"language\">
+         <option value=\"en\" selected>English</option>
+         <option value=\"zh\">中文</option>
+       </select>
+     </label>
+     <input id=\"api_key\" placeholder=\"api key\" />
+     <input id=\"request_budget\" value=\"30\" />
+     <label><input id=\"use_mock\" type=\"checkbox\" /> use mock</label>
+   </div>
+   <div class=\"row\">
+     <button onclick=\"quickCheck()\">Quick API Check</button>
+     <button onclick=\"startRun()\">Run HydraDeck</button>
+     <button onclick=\"resumeLastRun()\">Resume Last Run</button>
+   </div>
+
+   <details>
+     <summary>Advanced model routing</summary>
+     <div class=\"tiny\">Per-agent model overrides (optional)</div>
+     <div class=\"row\"><select id=\"model_scope\"></select><select id=\"model_structure\"></select></div>
+     <div class=\"row\"><select id=\"model_planner\"></select><select id=\"model_section\"></select></div>
+     <div class=\"row\"><select id=\"model_paper\"></select><select id=\"model_slides\"></select></div>
+   </details>
+ </div>
+ <div id=\"status\">Idle</div>
+ <div class=\"bar\"><div id=\"fill\" class=\"fill\"></div></div>
+ <div id=\"pct\">0%</div>
+ <div id=\"tab-artifacts\" class=\"panel tab\" style=\"display:none\">
+   <div class=\"row\">
+     <a id=\"paperLink\" target=\"_blank\"></a>
+     <a id=\"slidesLink\" target=\"_blank\"></a>
+   </div>
+   <div class=\"grid\">
+     <div><h4>Scope</h4><pre id=\"scope\"></pre></div>
+     <div><h4>Sections</h4><pre id=\"sections\"></pre></div>
+     <div><h4>paper.tex</h4><pre id=\"paper\"></pre></div>
+     <div><h4>slides.tex</h4><pre id=\"slides\"></pre></div>
+   </div>
+ </div>
+
+ <div id=\"tab-console\" class=\"panel tab\" style=\"display:none\">
+   <div class=\"grid\">
+     <div><h4>Progress</h4><pre id=\"progress\"></pre></div>
+     <div><h4>Events</h4><pre id=\"events\"></pre></div>
+   </div>
+ </div>
+
+ <script>
+ let jobId = null;
+ let timer = null;
+ let inflight = false;
+ let refreshFailCount = 0;
+
+ function showTab(id){
+   for(const el of document.querySelectorAll('.tab')) el.style.display='none';
+   document.getElementById(id).style.display='block';
+ }
+
+ function addModelOptions(selectId, models){
+   const s=document.getElementById(selectId);
+   s.innerHTML='';
+   const blank=document.createElement('option');
+   blank.value='';
+   blank.textContent = selectId==='model' ? '(default model)' : '(inherit default)';
+   s.appendChild(blank);
+   for(const m of models){
+     const o=document.createElement('option');
+     o.value=m; o.textContent=m; s.appendChild(o);
+   }
+ }
+
+ async function loadModels(){
+   try{
+     const ctl = new AbortController();
+     const t = setTimeout(()=>ctl.abort(), 15000);
+     const r=await fetch('/api/models?base_url='+encodeURIComponent(document.getElementById('base_url').value)+'&api_key='+encodeURIComponent(document.getElementById('api_key').value), {signal: ctl.signal});
+     clearTimeout(t);
+     const j=await r.json();
+     const models=Array.isArray(j.models)?j.models:[];
+     for(const id of ['model','model_scope','model_structure','model_planner','model_section','model_paper','model_slides']) addModelOptions(id, models);
+     if(models.includes('grok-3-mini')) document.getElementById('model').value='grok-3-mini';
+   }catch(e){
+     document.getElementById('status').innerText='model list failed: '+e;
+   }
+ }
+
+ function payload(){
+   return {
+     topic: document.getElementById('topic').value,
+     model: document.getElementById('model').value,
+     base_url: document.getElementById('base_url').value,
+     api_key: document.getElementById('api_key').value,
+     request_budget: Number(document.getElementById('request_budget').value || 30),
+     use_mock: document.getElementById('use_mock').checked,
+     language: document.getElementById('language').value,
+     model_scope: document.getElementById('model_scope').value,
+     model_structure: document.getElementById('model_structure').value,
+     model_planner: document.getElementById('model_planner').value,
+     model_section: document.getElementById('model_section').value,
+     model_paper: document.getElementById('model_paper').value,
+     model_slides: document.getElementById('model_slides').value,
+   };
+ }
+
+ async function quickCheck(){
+   const ctl = new AbortController();
+   const t = setTimeout(()=>ctl.abort(), 20000);
+   const r = await fetch('/api/quick-check',{method:'POST',headers:{'content-type':'application/json'},body:JSON.stringify(payload()),signal: ctl.signal});
+   clearTimeout(t);
+   const j = await r.json();
+   document.getElementById('status').innerText = j.result || j.error;
+   showTab('tab-console');
+ }
+
+ async function startRun(){
+   if(inflight) return;
+   inflight = true;
+   const ctl = new AbortController();
+   const t = setTimeout(()=>ctl.abort(), 20000);
+   const r = await fetch('/api/jobs',{method:'POST',headers:{'content-type':'application/json'},body:JSON.stringify(payload()),signal: ctl.signal});
+   clearTimeout(t);
+   const j = await r.json();
+   jobId = j.id;
+   localStorage.setItem('hydradeck_last_job_id', jobId);
+   if (timer) clearInterval(timer);
+   timer = setInterval(refresh, 1000);
+   refresh();
+   showTab('tab-console');
+ }
+
+ async function refresh(){
+   if(!inflight) return;
+   if(!jobId) return;
+   try {
+     const ctl = new AbortController();
+     const t = setTimeout(()=>ctl.abort(), 12000);
+     const r = await fetch('/api/jobs/'+jobId, {signal: ctl.signal});
+     clearTimeout(t);
+     if(!r.ok) {
+       refreshFailCount += 1;
+       if (refreshFailCount >= 5) {
+         inflight = false;
+         if (timer) { clearInterval(timer); timer = null; }
+         document.getElementById('status').innerText = 'Polling paused (network/server issue). Use Resume Last Run.';
+       }
+       return;
+     }
+     const j = await r.json();
+     refreshFailCount = 0;
+     document.getElementById('status').innerText = j.status_text || j.status;
+     const p = Math.max(0, Math.min(100, Number(j.progress || 0)));
+     document.getElementById('fill').style.width = p + '%';
+     document.getElementById('pct').innerText = p + '%';
+     document.getElementById('progress').innerText = j.progress_log || '';
+     document.getElementById('scope').innerText = j.scope || '';
+     document.getElementById('sections').innerText = j.sections || '';
+     document.getElementById('paper').innerText = j.paper || '';
+     document.getElementById('slides').innerText = j.slides || '';
+     document.getElementById('events').innerText = JSON.stringify(j.events || [], null, 2);
+
+     const p1 = document.getElementById('paperLink');
+     const p2 = document.getElementById('slidesLink');
+     if (j.paper_pdf){ p1.href = '/api/jobs/'+jobId+'/artifact/paper'; p1.innerText='Download paper.pdf'; }
+     if (j.slides_pdf){ p2.href = '/api/jobs/'+jobId+'/artifact/slides'; p2.innerText='Download slides.pdf'; }
+
+     if (j.status === 'done' || j.status === 'error') {
+       clearInterval(timer);
+       timer = null;
+       inflight = false;
+       localStorage.removeItem('hydradeck_last_job_id');
+     }
+   } catch (e) {
+     refreshFailCount += 1;
+     if (refreshFailCount >= 5) {
+       inflight = false;
+       if (timer) { clearInterval(timer); timer = null; }
+       document.getElementById('status').innerText = 'Polling paused due to repeated timeout. Use Resume Last Run.';
+     }
+   }
+ }
+
+ function resumeLastRun(){
+   const saved = localStorage.getItem('hydradeck_last_job_id');
+   if(!saved){
+     document.getElementById('status').innerText = 'No resumable job.';
+     return;
+   }
+   jobId = saved;
+   inflight = true;
+   refreshFailCount = 0;
+   if (timer) clearInterval(timer);
+   timer = setInterval(refresh, 1000);
+   refresh();
+   showTab('tab-console');
+ }
+
+ document.getElementById('base_url').addEventListener('change', loadModels);
+ document.getElementById('api_key').addEventListener('change', loadModels);
+ loadModels();
+ showTab('tab-run');
+ if(localStorage.getItem('hydradeck_last_job_id')){
+   document.getElementById('status').innerText = 'Last run available. Click Resume Last Run to continue.';
+ }
+ </script>
+ </body>
+ </html>
+ """
+
+
+ @app.post("/api/quick-check")
+ def api_quick_check(req: RunRequest) -> dict[str, str]:
+     result = _api_quick_check(req.base_url, req.api_key, req.model, req.request_budget)
+     return {"result": result}
+
+
+ @app.post("/api/jobs")
+ def create_job(req: RunRequest) -> dict[str, str]:
+     if not req.topic.strip():
+         raise HTTPException(status_code=400, detail="topic is required")
+     job = _new_job(req)
+     with LOCK:
+         JOBS[job["id"]] = job
+     _prune_history()
+     _save_state()
+     t = threading.Thread(target=_run_job, args=(job["id"], req), daemon=True)
478
+ t.start()
479
+ return {"id": job["id"]}
480
+
481
+
482
+ @app.get("/api/history")
483
+ def get_history() -> dict[str, Any]:
484
+ with LOCK:
485
+ items = sorted(
486
+ JOBS.values(),
487
+ key=lambda j: float(j.get("updated_at", 0.0)),
488
+ reverse=True,
489
+ )
490
+ rows = [
491
+ {
492
+ "id": j.get("id"),
493
+ "status": j.get("status"),
494
+ "progress": j.get("progress"),
495
+ "topic": (j.get("params") or {}).get("topic", ""),
496
+ "updated_at": j.get("updated_at"),
497
+ }
498
+ for j in items[:HISTORY_LIMIT]
499
+ ]
500
+ return {"items": rows}
501
+
502
+
503
+ @app.get("/api/models")
504
+ def get_models(base_url: str, api_key: str = "") -> dict[str, Any]:
505
+ try:
506
+ cli = GrokClient(base_url=base_url, api_key=api_key, model="grok-3-mini", timeout_s=20.0, max_retries=1)
507
+ models = cli.list_models(timeout_s=20.0)
508
+ return {"models": models}
509
+ except Exception as exc:
510
+ return {"models": [], "error": str(exc)}
511
+
512
+
513
+ @app.get("/api/jobs/{job_id}")
514
+ def get_job(job_id: str) -> dict[str, Any]:
515
+ with LOCK:
516
+ job = JOBS.get(job_id)
517
+ if not job:
518
+ raise HTTPException(status_code=404, detail="job not found")
519
+ return dict(job)
520
+
521
+
522
+ @app.get("/api/jobs/{job_id}/artifact/{kind}")
523
+ def get_artifact(job_id: str, kind: str):
524
+ with LOCK:
525
+ job = JOBS.get(job_id)
526
+ if not job:
527
+ raise HTTPException(status_code=404, detail="job not found")
528
+ if kind == "paper":
529
+ path = str(job.get("paper_pdf", ""))
530
+ filename = "paper.pdf"
531
+ elif kind == "slides":
532
+ path = str(job.get("slides_pdf", ""))
533
+ filename = "slides.pdf"
534
+ else:
535
+ raise HTTPException(status_code=400, detail="kind must be paper|slides")
536
+
537
+ p = Path(path)
538
+ if not path or not p.exists():
539
+ raise HTTPException(status_code=404, detail="artifact not ready")
540
+ return FileResponse(str(p), media_type="application/pdf", filename=filename)
541
+
542
+
543
+ if __name__ == "__main__":
544
+ import uvicorn
545
+
546
+ _load_state()
547
+ uvicorn.run(app, host="0.0.0.0", port=7861)
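The job endpoints above can also be driven without the browser UI. A minimal polling sketch, assuming the server is running locally on port 7861 and a job id obtained from `POST /api/jobs` (the `BASE` URL and the printed format are illustrative, not part of the repo; `clamp_progress` mirrors the UI's `Math.max(0, Math.min(100, …))` clamp):

```python
import json
import time
from urllib import request

BASE = "http://127.0.0.1:7861"  # assumed local WebUI address


def clamp_progress(p) -> int:
    """Coerce a reported progress value to an int in [0, 100], as the UI does."""
    try:
        v = int(float(p))
    except (TypeError, ValueError):
        v = 0
    return max(0, min(100, v))


def poll_job(job_id: str, interval_s: float = 1.0) -> dict:
    """Poll GET /api/jobs/<id> until the job reports 'done' or 'error'."""
    while True:
        with request.urlopen(f"{BASE}/api/jobs/{job_id}", timeout=12) as r:
            job = json.loads(r.read().decode("utf-8"))
        print(f"{clamp_progress(job.get('progress'))}% {job.get('status')}")
        if job.get("status") in ("done", "error"):
            return job
        time.sleep(interval_s)
```

`poll_job` only makes sense against a live server; `clamp_progress` is pure and can be reused anywhere progress values come from untrusted JSON.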
hydradeck/__init__.py ADDED
@@ -0,0 +1,3 @@
+ __all__ = ["__version__"]
+
+ __version__ = "0.1.0"
hydradeck/agents/personas.py ADDED
@@ -0,0 +1,98 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+
+
+ @dataclass(frozen=True)
+ class Persona:
+     name: str
+     system_prompt: str
+
+
+ PERSONAS: list[Persona] = [
+     Persona(
+         name="QueryPlanner",
+         system_prompt="\n".join(
+             [
+                 "You are a query planner for deep research.",
+                 "You produce diverse, high-recall search queries.",
+                 "Prefer queries that locate primary sources and benchmarks.",
+                 "Return concise query lists and what each query is for.",
+             ]
+         ),
+     ),
+     Persona(
+         name="Explorer",
+         system_prompt=(
+             "\n".join(
+                 [
+                     "You are an exploratory researcher.",
+                     "Propose search directions, structure, and hypotheses.",
+                     "Be concrete: propose queries and evaluation criteria.",
+                     "State what evidence would change conclusions.",
+                 ]
+             )
+         ),
+     ),
+     Persona(
+         name="Librarian",
+         system_prompt=(
+             "\n".join(
+                 [
+                     "You are a source curator.",
+                     "Prefer primary sources: official docs, standards, peer-reviewed papers.",
+                     "Avoid SEO spam.",
+                     "For every claim, think about what citation would support it.",
+                 ]
+             )
+         ),
+     ),
+     Persona(
+         name="Skeptic",
+         system_prompt=(
+             "\n".join(
+                 [
+                     "You are a skeptical reviewer.",
+                     "Challenge unsupported claims and ask for stronger evidence.",
+                     "Surface counterexamples, limitations, and propose sanity checks.",
+                 ]
+             )
+         ),
+     ),
+     Persona(
+         name="Synthesizer",
+         system_prompt=(
+             "\n".join(
+                 [
+                     "You are a technical writer.",
+                     "Produce detailed, structured, citation-grounded research reports.",
+                     "Separate what is known vs uncertain.",
+                     "Include actionable takeaways.",
+                 ]
+             )
+         ),
+     ),
+     Persona(
+         name="Presenter",
+         system_prompt=(
+             "\n".join(
+                 [
+                     "You are a speaking coach and slide designer.",
+                     "Create a clear talk, strong narrative, and Beamer slides.",
+                     "Keep slides concise, but keep the script detailed.",
+                 ]
+             )
+         ),
+     ),
+     Persona(
+         name="Judge",
+         system_prompt="\n".join(
+             [
+                 "You are a strict third-party evaluator.",
+                 "Score the provided artifacts against the rubric.",
+                 "Be specific about missing sections, weak evidence, and citation issues.",
+                 "Return JSON only.",
+             ]
+         ),
+     ),
+ ]
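Each `Persona` above is consumed as a system prompt that opens a chat turn. A self-contained sketch of that wiring (the `Persona` and `ChatMessage` dataclasses are re-declared here to mirror `hydradeck.agents.personas` and `hydradeck.clients.ChatMessage`; `seed_messages` is an illustrative helper, not a function in the repo):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatMessage:
    # Mirrors hydradeck.clients.ChatMessage for a self-contained sketch.
    role: str
    content: str


@dataclass(frozen=True)
class Persona:
    name: str
    system_prompt: str


def seed_messages(persona: Persona, user_task: str) -> list[ChatMessage]:
    """Turn a persona into the opening system + user turns of a chat."""
    return [
        ChatMessage(role="system", content=persona.system_prompt),
        ChatMessage(role="user", content=user_task),
    ]


skeptic = Persona(
    name="Skeptic",
    system_prompt="\n".join(
        [
            "You are a skeptical reviewer.",
            "Challenge unsupported claims and ask for stronger evidence.",
        ]
    ),
)
msgs = seed_messages(skeptic, "Review this research plan.")
```

The `"\n".join([...])` construction keeps each behavioral rule on its own line of the system prompt, which is why the persona definitions above favor it over one long string literal.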
hydradeck/cli.py ADDED
@@ -0,0 +1,522 @@
+ from __future__ import annotations
+
+ import argparse
+ import sys
+ from pathlib import Path
+
+ from hydradeck.config import (
+     UserConfig,
+     resolve_api_key,
+     resolve_base_url,
+     resolve_model,
+     resolve_pdf_compiler,
+     resolve_template,
+     save_config,
+ )
+ from hydradeck.core.types import RunConfig
+ from hydradeck.pipeline import run
+ from hydradeck.resources_pack import build_resources_pack
+
+
+ def _build_parser() -> argparse.ArgumentParser:
+     p = argparse.ArgumentParser(prog="hydradeck")
+     sub = p.add_subparsers(dest="cmd", required=True)
+
+     runp = sub.add_parser("run", help="Run Grok deep research pipeline")
+     runp.add_argument("--topic", required=True, help="Research topic")
+     runp.add_argument("--out", required=True, help="Output directory or .zip")
+     runp.add_argument("--iterations", type=int, default=3, help="Persona iteration rounds")
+     runp.add_argument("--max-sources", type=int, default=10, help="Max sources to include")
+     runp.add_argument(
+         "--min-words",
+         type=int,
+         default=12000,
+         help="Target minimum words (guidance to model; markdown is primary)",
+     )
+     runp.add_argument("--base-url", default=None, help="API base URL")
+     runp.add_argument("--model", default=None, help="Model name")
+     runp.add_argument(
+         "--keep-stage",
+         action="store_true",
+         help="If --out is a .zip, keep the staging directory on disk",
+     )
+     runp.add_argument(
+         "--seed-url",
+         action="append",
+         default=None,
+         help="Seed URL to include as source (can be repeated)",
+     )
+     runp.add_argument("--llm-timeout", type=float, default=180.0, help="LLM timeout seconds")
+     runp.add_argument("--mock", action="store_true", help="Use deterministic mock (no network)")
+     runp.add_argument("--verbose", action="store_true", help="Verbose logging")
+     runp.add_argument(
+         "--heartbeat",
+         action="store_true",
+         help="Emit periodic heartbeat during long network calls",
+     )
+     runp.add_argument(
+         "--progress",
+         action="store_true",
+         help="Show a progress bar for generation stages",
+     )
+     runp.add_argument(
+         "--request-budget",
+         type=float,
+         default=20.0,
+         help="Per-request timeout budget (seconds)",
+     )
+     runp.add_argument(
+         "--verbatim",
+         action="store_true",
+         help="Write model-produced artifacts verbatim (no rendering/rewriting)",
+     )
+     runp.add_argument(
+         "--no-archive-prompts",
+         action="store_true",
+         help="Do not archive prompts/requests in the output package",
+     )
+     runp.add_argument(
+         "--quality-gate",
+         action="store_true",
+         help="Require passing third-party score before writing outputs",
+     )
+     runp.add_argument(
+         "--min-quality",
+         type=float,
+         default=0.85,
+         help="Minimum quality score (0-1)",
+     )
+     runp.add_argument(
+         "--quality-attempts",
+         type=int,
+         default=3,
+         help="Max regeneration attempts to meet quality gate",
+     )
+     runp.add_argument(
+         "--archive-snapshots",
+         action="store_true",
+         help="Fetch and archive source page snapshots into resources/snapshots",
+     )
+     runp.add_argument(
+         "--snapshot-timeout",
+         type=float,
+         default=25.0,
+         help="Per-URL snapshot fetch timeout (seconds)",
+     )
+     runp.add_argument(
+         "--snapshot-total-timeout",
+         type=float,
+         default=60.0,
+         help="Total time budget for all snapshots (seconds)",
+     )
+
+     prep = sub.add_parser(
+         "pre",
+         help="Generate a preset pre-research package (no API key required)",
+     )
+     prep.add_argument("--preset", required=True, help="Preset name (e.g. rynnbrain)")
+     prep.add_argument("--out", required=True, help="Output directory or .zip")
+     prep.add_argument(
+         "--keep-stage",
+         action="store_true",
+         help="Keep staging directory when output is .zip",
+     )
+     prep.add_argument(
+         "--no-fetch",
+         action="store_true",
+         help="Do not fetch and archive web snapshots",
+     )
+
+     models_p = sub.add_parser("models", help="List available models")
+     models_p.add_argument(
+         "--base-url",
+         default=None,
+         help="API base URL",
+     )
+
+     auto_p = sub.add_parser(
+         "auto",
+         help="Run autonomous deep research (verbatim + prompts + snapshots)",
+     )
+     auto_p.add_argument("--topic", required=True, help="Research topic")
+     auto_p.add_argument("--out", required=True, help="Output directory or .zip")
+     auto_p.add_argument(
+         "--base-url",
+         default=None,
+         help="API base URL",
+     )
+     auto_p.add_argument(
+         "--model",
+         default=None,
+         help="Fallback model name",
+     )
+     auto_p.add_argument(
+         "--iterations",
+         type=int,
+         default=3,
+         help="Persona iteration rounds",
+     )
+     auto_p.add_argument(
+         "--max-sources",
+         type=int,
+         default=12,
+         help="Max sources to include",
+     )
+     auto_p.add_argument(
+         "--module-sources",
+         type=int,
+         default=5,
+         help="Sources per query module",
+     )
+     auto_p.add_argument(
+         "--query-count",
+         type=int,
+         default=8,
+         help="Number of queries to generate (high recall)",
+     )
+     auto_p.add_argument(
+         "--max-query-modules",
+         type=int,
+         default=2,
+         help="Max query modules to expand into sources",
+     )
+     auto_p.add_argument(
+         "--sources-attempts",
+         type=int,
+         default=3,
+         help="Max attempts to obtain sources (must be <=3)",
+     )
+     auto_p.add_argument(
+         "--facts-max-pages",
+         type=int,
+         default=6,
+         help="Max pages to pass into facts extraction",
+     )
+     auto_p.add_argument(
+         "--facts-max-chars",
+         type=int,
+         default=8000,
+         help="Max chars per page passed into facts extraction",
+     )
+     auto_p.add_argument(
+         "--facts-target",
+         type=int,
+         default=30,
+         help="Approximate number of facts to extract",
+     )
+     auto_p.add_argument(
+         "--judge-max-chars",
+         type=int,
+         default=12000,
+         help="Max chars per artifact passed into judge",
+     )
+     auto_p.add_argument(
+         "--max-runtime",
+         type=float,
+         default=240.0,
+         help="Max total runtime seconds before aborting",
+     )
+     auto_p.add_argument(
+         "--llm-timeout",
+         type=float,
+         default=180.0,
+         help="LLM timeout seconds",
+     )
+     auto_p.add_argument(
+         "--snapshot-timeout",
+         type=float,
+         default=25.0,
+         help="Per-URL snapshot fetch timeout (seconds)",
+     )
+     auto_p.add_argument("--mock", action="store_true", help="Use deterministic mock")
+     auto_p.add_argument("--verbose", action="store_true", help="Verbose logging")
+     auto_p.add_argument(
+         "--heartbeat",
+         action="store_true",
+         help="Emit periodic heartbeat during long network calls",
+     )
+     auto_p.add_argument(
+         "--progress",
+         action="store_true",
+         help="Show a progress bar for generation stages",
+     )
+     auto_p.add_argument(
+         "--request-budget",
+         type=float,
+         default=20.0,
+         help="Per-request timeout budget (seconds)",
+     )
+     auto_p.add_argument(
+         "--min-quality",
+         type=float,
+         default=0.85,
+         help="Minimum quality score (0-1)",
+     )
+     auto_p.add_argument(
+         "--quality-attempts",
+         type=int,
+         default=3,
+         help="Max regeneration attempts to meet quality gate",
+     )
+
+     cfg_p = sub.add_parser("config", help="Persist local config (base_url/model/api_key)")
+     cfg_p.add_argument("--base-url", default=None, help="API base URL")
+     cfg_p.add_argument("--model", default=None, help="Default model")
+     cfg_p.add_argument("--api-key", default=None, help="API key (stored locally)")
+     cfg_p.add_argument(
+         "--pdf-compiler",
+         default=None,
+         help="PDF compiler backend: latexonline or texlive",
+     )
+     cfg_p.add_argument(
+         "--template",
+         default=None,
+         help="Template: iclr2026 or plain",
+     )
+
+     res_p = sub.add_parser("resources", help="One-click resources pack (no seed required)")
+     res_p.add_argument("--topic", required=True, help="Research topic")
+     res_p.add_argument("--out", required=True, help="Output directory or .zip")
+     res_p.add_argument(
+         "--base-url",
+         default=None,
+         help="API base URL",
+     )
+     res_p.add_argument(
+         "--model",
+         default=None,
+         help="Model name",
+     )
+     res_p.add_argument(
+         "--pdf-compiler",
+         default=resolve_pdf_compiler("auto"),
+         help="PDF compiler: auto|latexonline|texlive",
+     )
+     res_p.add_argument(
+         "--template",
+         default=resolve_template("pretty"),
+         help="Template: pretty|plain",
+     )
+     res_p.add_argument("--max-sources", type=int, default=8, help="Max sources")
+     res_p.add_argument("--module-sources", type=int, default=3, help="Sources per module")
+     res_p.add_argument("--llm-timeout", type=float, default=35.0, help="LLM timeout")
+     res_p.add_argument("--snapshot-timeout", type=float, default=10.0, help="Snapshot timeout")
+     res_p.add_argument(
+         "--snapshot-total-timeout",
+         type=float,
+         default=60.0,
+         help="Total time budget for all snapshots",
+     )
+     res_p.add_argument("--max-runtime", type=float, default=180.0, help="Max runtime")
+     res_p.add_argument("--request-budget", type=float, default=15.0, help="Per-request budget")
+     res_p.add_argument("--keep-stage", action="store_true", help="Keep staging directory")
+     res_p.add_argument("--heartbeat", action="store_true", help="Heartbeat")
+     res_p.add_argument("--progress", action="store_true", help="Progress bar")
+
+     wiz_p = sub.add_parser("wizard", help="Guided research (interactive)")
+     wiz_p.add_argument("--out", required=False, default=None, help="Output directory or .zip")
+     return p
+
+
+ def _prompt(prompt: str, default: str | None = None) -> str:
+     suffix = f" [{default}]" if default else ""
+     v = input(prompt + suffix + ": ").strip()
+     if not v and default is not None:
+         return default
+     return v
+
+
+ def _prompt_int(prompt: str, default: int) -> int:
+     v = _prompt(prompt, str(default))
+     try:
+         return int(v)
+     except Exception:
+         return default
+
+
+ def _prompt_float(prompt: str, default: float) -> float:
+     v = _prompt(prompt, str(default))
+     try:
+         return float(v)
+     except Exception:
+         return default
+
+
+ def main(argv: list[str] | None = None) -> int:
+     args = _build_parser().parse_args(argv)
+     if args.cmd == "run":
+         base_url = resolve_base_url(args.base_url)
+         model = resolve_model(args.model)
+         cfg = RunConfig(
+             topic=args.topic,
+             out=Path(args.out),
+             base_url=base_url,
+             api_key=resolve_api_key(),
+             model=model,
+             iterations=max(int(args.iterations), 1),
+             max_sources=max(int(args.max_sources), 1),
+             min_total_words=max(int(args.min_words), 1000),
+             use_mock=bool(args.mock),
+             verbose=bool(args.verbose or args.heartbeat),
+             progress=bool(args.progress),
+             llm_timeout_s=float(args.llm_timeout),
+             request_budget_s=float(args.request_budget),
+             keep_stage=bool(args.keep_stage),
+             verbatim=bool(args.verbatim),
+             archive_prompts=not bool(args.no_archive_prompts),
+             archive_snapshots=bool(args.archive_snapshots),
+             snapshot_timeout_s=float(args.snapshot_timeout),
+             snapshot_total_timeout_s=float(args.snapshot_total_timeout),
+             quality_gate=bool(args.quality_gate),
+             min_quality_score=float(args.min_quality),
+             max_quality_attempts=int(args.quality_attempts),
+             seed_urls=args.seed_url,
+         )
+         run(cfg)
+         return 0
+     if args.cmd == "pre":
+         from hydradeck.presets.rynnbrain import generate
+
+         if str(args.preset).strip().lower() != "rynnbrain":
+             print(f"Unknown preset: {args.preset}", file=sys.stderr)
+             return 2
+         generate(
+             out=Path(args.out),
+             keep_stage=bool(args.keep_stage),
+             fetch=not bool(args.no_fetch),
+         )
+         return 0
+
+     if args.cmd == "models":
+         from hydradeck.clients import GrokClient
+
+         client = GrokClient(
+             base_url=resolve_base_url(str(args.base_url) if args.base_url else None),
+             api_key=resolve_api_key(),
+             model="grok-4",
+         )
+         for mid in client.list_models():
+             print(mid)
+         return 0
+
+     if args.cmd == "auto":
+         base_url = resolve_base_url(args.base_url)
+         model = resolve_model(args.model)
+         cfg = RunConfig(
+             topic=args.topic,
+             out=Path(args.out),
+             base_url=base_url,
+             api_key=resolve_api_key(),
+             model=model,
+             iterations=max(int(args.iterations), 1),
+             max_sources=max(int(args.max_sources), 1),
+             module_sources=max(int(args.module_sources), 1),
+             query_count=max(int(args.query_count), 1),
+             max_query_modules=max(int(args.max_query_modules), 1),
+             sources_attempts=min(max(int(args.sources_attempts), 1), 3),
+             facts_max_pages=max(int(args.facts_max_pages), 1),
+             facts_max_chars_per_page=max(int(args.facts_max_chars), 1000),
+             facts_target=max(int(args.facts_target), 5),
+             judge_max_chars=max(int(args.judge_max_chars), 2000),
+             max_total_runtime_s=float(args.max_runtime),
+             min_total_words=12000,
+             use_mock=bool(args.mock),
+             verbose=bool(args.verbose or args.heartbeat),
+             progress=bool(args.progress),
+             llm_timeout_s=float(args.llm_timeout),
+             request_budget_s=float(args.request_budget),
+             keep_stage=False,
+             verbatim=True,
+             archive_prompts=True,
+             archive_snapshots=True,
+             snapshot_timeout_s=float(args.snapshot_timeout),
+             auto=True,
+             auto_queries=True,
+             auto_models=True,
+             quality_gate=True,
+             min_quality_score=float(args.min_quality),
+             max_quality_attempts=int(args.quality_attempts),
+             seed_urls=None,
+         )
+         run(cfg)
+         return 0
+
+     if args.cmd == "config":
+         uc = UserConfig(
+             base_url=str(args.base_url) if args.base_url else None,
+             api_key=str(args.api_key) if args.api_key else None,
+             model=str(args.model) if args.model else None,
+             pdf_compiler=str(args.pdf_compiler) if args.pdf_compiler else None,
+             template=str(args.template) if args.template else None,
+         )
+         p = save_config(uc)
+         print(str(p))
+         return 0
+
+     if args.cmd == "resources":
+         base_url = resolve_base_url(args.base_url)
+         model = resolve_model(args.model)
+         cfg = RunConfig(
+             topic=args.topic,
+             out=Path(args.out),
+             base_url=base_url,
+             api_key=resolve_api_key(),
+             model=model,
+             pdf_compiler=str(args.pdf_compiler),
+             template=str(args.template),
+             max_sources=max(int(args.max_sources), 1),
+             module_sources=max(int(args.module_sources), 1),
+             use_mock=False,
+             verbose=bool(args.heartbeat),
+             progress=bool(args.progress),
+             llm_timeout_s=float(args.llm_timeout),
+             snapshot_timeout_s=float(args.snapshot_timeout),
+             max_total_runtime_s=float(args.max_runtime),
+             request_budget_s=float(args.request_budget),
+             keep_stage=bool(args.keep_stage),
+         )
+         build_resources_pack(cfg)
+         return 0
+
+     if args.cmd == "wizard":
+         topic = _prompt("Topic", "RynnBrain")
+         out = args.out or _prompt("Output path (.zip)", "hydradeck/out/pre.zip")
+         base_url = _prompt("Base URL (from config if empty)", "")
+         model = _prompt("Model (from config if empty)", "")
+         max_sources = _prompt_int("Max sources", 8)
+         module_sources = _prompt_int("Sources per module", 3)
+         llm_timeout = _prompt_float("LLM timeout (s)", 35.0)
+         snapshot_timeout = _prompt_float("Snapshot timeout (s)", 10.0)
+         max_runtime = _prompt_float("Max runtime (s)", 300.0)
+         request_budget = _prompt_float("Per-request budget (s)", 20.0)
+         pdf_compiler = _prompt("PDF compiler (auto|latexonline|texlive)", "auto")
+         template = _prompt("Template (iclr2026|plain)", "iclr2026")
+
+         cfg = RunConfig(
+             topic=topic,
+             out=Path(out),
+             base_url=resolve_base_url(base_url or None),
+             api_key=resolve_api_key(),
+             model=resolve_model(model or None),
+             pdf_compiler=pdf_compiler,
+             template=template,
+             max_sources=max(max_sources, 1),
+             module_sources=max(module_sources, 1),
+             use_mock=False,
+             verbose=True,
+             progress=True,
+             llm_timeout_s=llm_timeout,
+             snapshot_timeout_s=snapshot_timeout,
+             max_total_runtime_s=max_runtime,
+             request_budget_s=request_budget,
+             keep_stage=False,
+         )
+         build_resources_pack(cfg)
+         print(out)
+         return 0
+
+     print(f"Unknown command: {args.cmd}", file=sys.stderr)
+     return 2
+
+
+ if __name__ == "__main__":
+     raise SystemExit(main())
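The parser in `_build_parser` follows the standard argparse subcommand pattern: one subparser per command, `dest="cmd"` to dispatch in `main`. A reduced, self-contained sketch of that shape (`hydradeck-mini` and the two flags are illustrative, not the real CLI):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Same shape as hydradeck's _build_parser, cut down to one subcommand.
    p = argparse.ArgumentParser(prog="hydradeck-mini")
    sub = p.add_subparsers(dest="cmd", required=True)
    runp = sub.add_parser("run", help="Run the pipeline")
    runp.add_argument("--topic", required=True)
    runp.add_argument("--iterations", type=int, default=3)
    return p


# parse_args accepts an explicit argv list, which is also why hydradeck's
# main(argv) is easy to call programmatically and from tests.
args = build_parser().parse_args(["run", "--topic", "LLM agents", "--iterations", "5"])
```

With `required=True` on the subparsers, invoking the program with no subcommand exits with a usage error instead of silently doing nothing.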
hydradeck/clients/__init__.py ADDED
@@ -0,0 +1,3 @@
+ __all__ = ["GrokClient", "MockClient", "ChatMessage", "GrokClientError"]
+
+ from hydradeck.clients.grok_client import ChatMessage, GrokClient, GrokClientError, MockClient
hydradeck/clients/grok_client.py ADDED
@@ -0,0 +1,373 @@
+ from __future__ import annotations
+
+ import json
+ import time
+ from dataclasses import dataclass
+
+ import requests
+
+ from hydradeck.utils import Heartbeat
+
+ JSON = dict[str, object]
+
+ CHROME_144_UA = (
+     "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
+     "AppleWebKit/537.36 (KHTML, like Gecko) "
+     "Chrome/144.0.0.0 Safari/537.36"
+ )
+
+
+ class GrokClientError(RuntimeError):
+     pass
+
+
+ @dataclass(frozen=True)
+ class ChatMessage:
+     role: str
+     content: str
+
+
+ class GrokClient:
+     def __init__(
+         self,
+         base_url: str,
+         api_key: str,
+         model: str,
+         timeout_s: float = 180.0,
+         max_retries: int = 3,
+         heartbeat: bool = False,
+         heartbeat_interval_s: float = 5.0,
+     ) -> None:
+         self._base_url = base_url.rstrip("/")
+         self._api_key = api_key
+         self._model = model
+         self._timeout_s = timeout_s
+         self._max_retries = max_retries
+         self._heartbeat = heartbeat
+         self._heartbeat_interval_s = heartbeat_interval_s
+
+     def chat_text(
+         self,
+         messages: list[ChatMessage],
+         temperature: float = 0.3,
+         timeout_s: float | None = None,
+     ) -> str:
+         msgs = [{"role": m.role, "content": m.content} for m in messages]
+         data = self._post_chat(
+             {"model": self._model, "messages": msgs, "temperature": temperature},
+             timeout_s=timeout_s,
+         )
+         choices = data.get("choices")
+         if not isinstance(choices, list) or not choices:
+             raise GrokClientError(f"No choices in response: {data}")
+         msg = choices[0].get("message") if isinstance(choices[0], dict) else None
+         content = msg.get("content") if isinstance(msg, dict) else None
+         if not isinstance(content, str):
+             raise GrokClientError(f"No message.content in response: {data}")
+         return content.strip()
+
+     def chat_json(
+         self,
+         messages: list[ChatMessage],
+         schema_hint: str,
+         temperature: float = 0.2,
+         timeout_s: float | None = None,
+     ) -> JSON:
+         suffix = (
+             "\n\nReturn ONLY valid JSON. Do not include markdown fences. "
+             "If unsure, still return best-effort JSON that matches: "
+             + schema_hint
+         )
+         msgs = [{"role": m.role, "content": m.content} for m in messages]
+         if msgs and msgs[-1].get("role") == "user":
+             msgs[-1]["content"] = str(msgs[-1]["content"]) + suffix
+         else:
+             msgs.append({"role": "user", "content": suffix})
+
+         text = self.chat_text(
+             [ChatMessage(role=m["role"], content=m["content"]) for m in msgs],
+             temperature=temperature,
+             timeout_s=timeout_s,
+         )
+         parsed = _best_effort_json_parse(text)
+         if parsed is None:
+             raise GrokClientError("Model did not return valid JSON. Response was:\n" + text)
+         return parsed
+
+     def _post_chat(self, payload: JSON, timeout_s: float | None = None) -> JSON:
+         url = f"{self._base_url}/v1/chat/completions"
+         headers = {"Content-Type": "application/json", "User-Agent": CHROME_144_UA}
+         if self._api_key:
+             headers["Authorization"] = f"Bearer {self._api_key}"
+
+         effective_timeout = float(timeout_s) if timeout_s is not None else self._timeout_s
+
+         last_err: Exception | None = None
+         for attempt in range(self._max_retries + 1):
+             try:
+                 with Heartbeat(
+                     enabled=self._heartbeat,
+                     label=f"POST {url}",
+                     interval_s=self._heartbeat_interval_s,
+                 ):
+                     r = requests.post(
+                         url,
+                         headers=headers,
+                         json=payload,
+                         timeout=effective_timeout,
+                     )
+                 if r.status_code >= 400:
+                     raise GrokClientError(f"HTTP {r.status_code} from {url}: {r.text[:2000]}")
+                 data = r.json()
+                 if not isinstance(data, dict):
+                     raise GrokClientError("Non-object response")
+                 return data
+             except (requests.RequestException, ValueError, GrokClientError) as e:
+                 last_err = e
+                 if attempt >= self._max_retries:
+                     break
+                 time.sleep(0.5 * (2**attempt))
+         raise GrokClientError(f"Request failed after retries: {last_err}")
+
+     def list_models(self, timeout_s: float | None = None) -> list[str]:
+         url = f"{self._base_url}/v1/models"
+         headers: dict[str, str] = {"User-Agent": CHROME_144_UA}
+         if self._api_key:
+             headers["Authorization"] = f"Bearer {self._api_key}"
+         effective_timeout = float(timeout_s) if timeout_s is not None else self._timeout_s
+         with Heartbeat(
+             enabled=self._heartbeat,
+             label=f"GET {url}",
+             interval_s=self._heartbeat_interval_s,
+         ):
+             r = requests.get(url, headers=headers, timeout=effective_timeout)
+         if r.status_code >= 400:
+             raise GrokClientError(f"HTTP {r.status_code} from {url}: {r.text[:2000]}")
+         data = r.json()
+         if not isinstance(data, dict):
+             raise GrokClientError("Non-object response")
+         raw = data.get("data")
+         if not isinstance(raw, list):
+             return []
+         out: list[str] = []
+         for item in raw:
+             if isinstance(item, dict):
+                 mid = item.get("id")
+                 if isinstance(mid, str):
+                     out.append(mid)
+         return out
+
+
+ class MockClient:
+     def chat_text(
+         self,
+         messages: list[ChatMessage],
+         temperature: float = 0.0,
+         timeout_s: float | None = None,
+     ) -> str:
+         _ = temperature
+         _ = timeout_s
+         joined = "\n".join([f"{m.role}: {m.content}" for m in messages])
+         low = joined.lower()
+         if "write a detailed pre-research report" in low:
+             return "\n".join(
+                 [
+                     "# Pre-Research Report",
+                     "",
+                     "## Research questions",
+                     "- (Mock) What is the core problem?",
+                     "",
+                     "## Scope & non-scope",
+                     "- Scope: offline mock run",
+                     "- Non-scope: real web browsing",
+                     "",
+                     "## Search plan & queries",
+                     "- query 1",
+                     "- query 2",
+                     "",
+                     "## Risks & limitations",
+                     "- Mock output is not evidence-backed",
+                     "",
+                 ]
+             )
+         if "write a long-form research report" in low:
+             return (
+                 "# Research Report\n\n"
+                 "## Summary\n(Mock)\n\n"
+                 "## Resources\n1. Example Source 1 — https://example.com\n"
+             )
+         if "speech script" in low:
+             return (
+                 "# Speech Script\n\n"
+                 "## Opening\n(Mock)\n\n"
+                 "## Main\n(Mock)\n\n"
+                 "## Closing\n(Mock)\n"
+             )
+         if "critique the current research plan" in low:
+             return "- (Mock) Missing primary sources\n- (Mock) Claims need evidence\n"
+         if "sources" in low:
+             return json.dumps(
+                 {
+                     "sources": [
+                         {
+                             "url": "https://example.com",
+                             "title": "Example Source 1",
+                             "snippet": "Mock source for offline run.",
+                         }
+                     ]
+                 },
+                 ensure_ascii=False,
+             )
+         if "facts" in low:
+             return json.dumps(
+                 {
+                     "facts": [
+                         {
+                             "claim": "Mock mode produces deterministic artifacts.",
+                             "evidence": "MockClient returns fixed outputs.",
+                             "url": "https://example.com",
+                             "title": "Example Source 1",
+                         }
+                     ]
+                 },
+                 ensure_ascii=False,
+             )
+         if "outline" in low:
+             return json.dumps(
+                 {
+                     "outline": [
+                         "Background",
+                         "Problem formulation",
+                         "Methods",
+                         "Findings",
+                         "Limitations",
+                         "Open questions",
+                     ]
+                 },
+                 ensure_ascii=False,
+             )
+         return "Mock synthesis text."
+
+     def chat_json(
+         self,
+         messages: list[ChatMessage],
+         schema_hint: str,
+         temperature: float = 0.0,
+         timeout_s: float | None = None,
+     ) -> JSON:
+         _ = schema_hint
+         _ = timeout_s
+         joined = "\n".join([f"{m.role}: {m.content}" for m in messages])
+         low = joined.lower()
+         if "score" in low and "rubric" in low and "return json" in low:
+             return {
+                 "score": 0.99,
+                 "reasons": ["mock pass"],
+                 "must_fix": [],
+             }
+         if "pre_report_md" in low and "paper_tex" in low and "slides_tex" in low:
269
+ return {
270
+ "pre_report_md": "\n".join(
271
+ [
272
+ "# Pre-Research (Mock)",
273
+ "",
274
+ "## 15-minute agenda",
275
+ "- 0:00-2:00 Background",
276
+ "- 2:00-6:00 Research questions",
277
+ "- 6:00-10:00 Evidence plan",
278
+ "- 10:00-13:00 Risks",
279
+ "- 13:00-15:00 Deliverables",
280
+ "",
281
+ "## Research questions",
282
+ "- RQ1 ...",
283
+ "- RQ2 ...",
284
+ "",
285
+ "## Search plan & queries",
286
+ "- query 1",
287
+ "- query 2",
288
+ "",
289
+ "## Resources",
290
+ "1. Example Source 1 — https://example.com",
291
+ "",
292
+ ]
293
+ ),
294
+ "report_md": "\n".join(
295
+ [
296
+ "# Research Report (Mock)",
297
+ "",
298
+ "## Summary",
299
+ "(Mock)",
300
+ "",
301
+ "## Findings",
302
+ "- (Mock) claim with [1]",
303
+ "",
304
+ "## Resources",
305
+ "[1] Example Source 1 — https://example.com",
306
+ "",
307
+ ]
308
+ ),
309
+ "speech_md": "\n".join(
310
+ [
311
+ "# Speech (Mock)",
312
+ "",
313
+ "[0:00] Opening hook",
314
+ "[2:00] Transition",
315
+ "[8:00] Key point",
316
+ "[14:00] Close + Q&A",
317
+ "",
318
+ ]
319
+ ),
+                 "paper_tex": "\\documentclass{article}\n\\begin{document}Mock\\end{document}\n",
+                 "slides_tex": "\\documentclass{beamer}\n\\begin{document}Mock\\end{document}\n",
+                 "bibtex": "@misc{src1,title={Example},howpublished={\\url{https://example.com}}}\n",
+             }
+
+         text = self.chat_text(messages, temperature=temperature)
+         parsed = _best_effort_json_parse(text)
+         return parsed or {"ok": True}
+
+
+ def _best_effort_json_parse(text: str) -> JSON | None:
+     t = text.strip()
+     if not t:
+         return None
+     if t.startswith("{") and t.endswith("}"):
+         try:
+             v = json.loads(t)
+             if isinstance(v, dict):
+                 return v
+         except Exception:
+             pass
+
+     start = t.find("{")
+     if start == -1:
+         return None
+     depth = 0
+     in_str = False
+     esc = False
+     for i in range(start, len(t)):
+         ch = t[i]
+         if in_str:
+             if esc:
+                 esc = False
+             elif ch == "\\":
+                 esc = True
+             elif ch == '"':
+                 in_str = False
+             continue
+         if ch == '"':
+             in_str = True
+             continue
+         if ch == "{":
+             depth += 1
+         elif ch == "}":
+             depth -= 1
+             if depth == 0:
+                 chunk = t[start : i + 1]
+                 try:
+                     v2 = json.loads(chunk)
+                     if isinstance(v2, dict):
+                         return v2
+                 except Exception:
+                     return None
+     return None
hydradeck/config.py ADDED
@@ -0,0 +1,137 @@
+ from __future__ import annotations
+
+ import json
+ import os
+ from dataclasses import dataclass
+ from pathlib import Path
+
+
+ @dataclass(frozen=True)
+ class UserConfig:
+     base_url: str | None = None
+     api_key: str | None = None
+     model: str | None = None
+     pdf_compiler: str | None = None
+     template: str | None = None
+
+
+ def config_path() -> Path:
+     xdg = os.environ.get("XDG_CONFIG_HOME")
+     if xdg:
+         return Path(xdg) / "hydradeck" / "config.json"
+     return Path.home() / ".config" / "hydradeck" / "config.json"
+
+
+ def load_config(path: Path | None = None) -> UserConfig:
+     p = path or config_path()
+     try:
+         data = json.loads(p.read_text(encoding="utf-8"))
+     except Exception:
+         return UserConfig()
+     if not isinstance(data, dict):
+         return UserConfig()
+     base_url = data.get("base_url")
+     api_key = data.get("api_key")
+     model = data.get("model")
+     pdf_compiler = data.get("pdf_compiler")
+     template = data.get("template")
+     return UserConfig(
+         base_url=base_url if isinstance(base_url, str) else None,
+         api_key=api_key if isinstance(api_key, str) else None,
+         model=model if isinstance(model, str) else None,
+         pdf_compiler=pdf_compiler if isinstance(pdf_compiler, str) else None,
+         template=template if isinstance(template, str) else None,
+     )
+
+
+ def find_project_config(start: Path | None = None) -> Path | None:
+     cur = (start or Path.cwd()).resolve()
+     for _ in range(8):
+         cand = cur / ".hydradeck" / "config.json"
+         if cand.exists():
+             return cand
+         if cur.parent == cur:
+             break
+         cur = cur.parent
+     return None
+
+
+ def load_merged_config() -> UserConfig:
+     user = load_config()
+     pc = find_project_config()
+     if pc is None:
+         return user
+     proj = load_config(path=pc)
+     return UserConfig(
+         base_url=proj.base_url or user.base_url,
+         api_key=proj.api_key or user.api_key,
+         model=proj.model or user.model,
+         pdf_compiler=proj.pdf_compiler or user.pdf_compiler,
+         template=proj.template or user.template,
+     )
+
+
+ def save_config(cfg: UserConfig, path: Path | None = None) -> Path:
+     p = path or config_path()
+     p.parent.mkdir(parents=True, exist_ok=True)
+     payload: dict[str, object] = {}
+     if cfg.base_url:
+         payload["base_url"] = cfg.base_url
+     if cfg.api_key:
+         payload["api_key"] = cfg.api_key
+     if cfg.model:
+         payload["model"] = cfg.model
+     if cfg.pdf_compiler:
+         payload["pdf_compiler"] = cfg.pdf_compiler
+     if cfg.template:
+         payload["template"] = cfg.template
+     p.write_text(json.dumps(payload, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
+     return p
+
+
+ def resolve_api_key() -> str:
+     env = os.environ.get("GROK_API_KEY")
+     if env:
+         return env
+     cfg = load_merged_config()
+     return cfg.api_key or ""
+
+
+ def resolve_base_url(default: str | None = None) -> str:
+     env = os.environ.get("GROK_BASE_URL")
+     if env:
+         return env
+     cfg = load_merged_config()
+     if cfg.base_url:
+         return cfg.base_url
+     if default is None:
+         raise RuntimeError("Missing base_url: set GROK_BASE_URL or hydradeck config --base-url")
+     return default
+
+
+ def resolve_model(default: str | None = None) -> str:
+     env = os.environ.get("GROK_MODEL")
+     if env:
+         return env
+     cfg = load_merged_config()
+     if cfg.model:
+         return cfg.model
+     if default is None:
+         raise RuntimeError("Missing model: set GROK_MODEL or hydradeck config --model")
+     return default
+
+
+ def resolve_pdf_compiler(default: str) -> str:
+     env = os.environ.get("HYDRADECK_PDF_COMPILER")
+     if env:
+         return env
+     cfg = load_merged_config()
+     return cfg.pdf_compiler or default
+
+
+ def resolve_template(default: str) -> str:
+     env = os.environ.get("HYDRADECK_TEMPLATE")
+     if env:
+         return env
+     cfg = load_merged_config()
+     return cfg.template or default
hydradeck/core/types.py ADDED
@@ -0,0 +1,91 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+ from pathlib import Path
+ from typing import Any
+
+
+ @dataclass(frozen=True)
+ class RunConfig:
+     topic: str
+     out: Path
+     base_url: str
+     api_key: str
+     model: str
+
+     iterations: int = 3
+     max_sources: int = 10
+     module_sources: int = 4
+     min_total_words: int = 12000
+
+     use_mock: bool = False
+     verbose: bool = False
+
+     llm_timeout_s: float = 180.0
+     facts_max_pages: int = 6
+     facts_max_chars_per_page: int = 8000
+     facts_target: int = 40
+
+     judge_max_chars: int = 12000
+
+     pre_tex_quality_gate: bool = True
+     pre_tex_min_score: float = 0.85
+     pre_tex_attempts: int = 2
+     keep_stage: bool = False
+     verbatim: bool = False
+     archive_prompts: bool = True
+
+     archive_snapshots: bool = False
+     snapshot_timeout_s: float = 25.0
+     snapshot_total_timeout_s: float = 60.0
+
+     auto: bool = False
+     auto_queries: bool = False
+     auto_models: bool = False
+
+     quality_gate: bool = False
+     min_quality_score: float = 0.85
+     max_quality_attempts: int = 3
+
+     query_count: int = 10
+     max_query_modules: int = 3
+
+     sources_attempts: int = 3
+
+     max_total_runtime_s: float = 240.0
+
+     progress: bool = False
+
+     request_budget_s: float = 20.0
+
+     pdf_compiler: str = "auto"
+
+     template: str = "pretty"
+
+     seed_urls: list[str] | None = None
+
+
+ @dataclass(frozen=True)
+ class Source:
+     url: str
+     title: str
+     snippet: str
+
+
+ @dataclass(frozen=True)
+ class ExtractedFact:
+     claim: str
+     evidence: str
+     url: str
+     title: str
+
+
+ @dataclass(frozen=True)
+ class ResearchOutputs:
+     pre_report_md: str
+     report_md: str
+     speech_md: str
+     paper_tex: str
+     slides_tex: str
+     bibtex: str
+     meta: dict[str, Any]
hydradeck/packaging.py ADDED
@@ -0,0 +1,33 @@
+ from __future__ import annotations
+
+ import shutil
+ import zipfile
+ from collections.abc import Iterable
+ from pathlib import Path
+
+
+ def is_zip_path(p: Path) -> bool:
+     return p.suffix.lower() == ".zip"
+
+
+ def stage_dir_for_out(out: Path) -> Path:
+     if is_zip_path(out):
+         return out.with_suffix("")
+     return out
+
+
+ def create_zip(zip_path: Path, src_dir: Path, members: Iterable[Path]) -> None:
+     zip_path.parent.mkdir(parents=True, exist_ok=True)
+     with zipfile.ZipFile(str(zip_path), mode="w", compression=zipfile.ZIP_DEFLATED) as z:
+         for p in members:
+             rel = p.relative_to(src_dir)
+             z.write(str(p), arcname=str(rel))
+
+
+ def finalize_output(out: Path, stage_dir: Path, keep_stage: bool = False) -> None:
+     if not is_zip_path(out):
+         return
+     files = [p for p in stage_dir.rglob("*") if p.is_file()]
+     create_zip(out, stage_dir, files)
+     if not keep_stage:
+         shutil.rmtree(stage_dir, ignore_errors=True)
hydradeck/pipeline.py ADDED
@@ -0,0 +1,884 @@
+ from __future__ import annotations
+
+ import json
+ import re
+ import time
+ from dataclasses import asdict
+ from pathlib import Path
+ from typing import Protocol
+
+ import requests
+
+ from hydradeck.agents.personas import PERSONAS
+ from hydradeck.clients import ChatMessage, GrokClient, MockClient
+ from hydradeck.core.types import ExtractedFact, ResearchOutputs, RunConfig, Source
+ from hydradeck.packaging import finalize_output, stage_dir_for_out
+ from hydradeck.render import render_beamer, render_bibtex, render_paper
+ from hydradeck.utils import JSON, Heartbeat, Progress, log
+
+
+ class ModelLike(Protocol):
+     def chat_json(
+         self,
+         messages: list[ChatMessage],
+         schema_hint: str,
+         temperature: float = 0.2,
+         timeout_s: float | None = None,
+     ) -> JSON:
+         ...
+
+     def chat_text(
+         self, messages: list[ChatMessage], temperature: float = 0.4, timeout_s: float | None = None
+     ) -> str:
+         ...
+
+
+ def _ensure_dir(p: Path) -> None:
+     p.mkdir(parents=True, exist_ok=True)
+
+
+ def _extract_sources(obj: JSON, max_sources: int) -> list[Source]:
+     raw = obj.get("sources")
+     out: list[Source] = []
+     if isinstance(raw, list):
+         for item in raw[:max_sources]:
+             if not isinstance(item, dict):
+                 continue
+             url_v = item.get("url")
+             title_v = item.get("title")
+             snippet_v = item.get("snippet")
+             if isinstance(url_v, str) and isinstance(title_v, str) and isinstance(snippet_v, str):
+                 out.append(Source(url=url_v, title=title_v, snippet=snippet_v))
+     return out
+
+
+ def _extract_outline(obj: JSON) -> list[str]:
+     raw = obj.get("outline")
+     if isinstance(raw, list):
+         out = [x for x in raw if isinstance(x, str) and x.strip()]
+         if len(out) >= 4:
+             return out
+     return ["Background", "Methods", "Findings", "Limitations", "Open questions"]
+
+
+ def _extract_facts(obj: JSON) -> list[ExtractedFact]:
+     raw = obj.get("facts")
+     out: list[ExtractedFact] = []
+     if isinstance(raw, list):
+         for item in raw:
+             if not isinstance(item, dict):
+                 continue
+             claim_v = item.get("claim")
+             evidence_v = item.get("evidence")
+             url_v = item.get("url")
+             title_v = item.get("title")
+             if (
+                 isinstance(claim_v, str)
+                 and isinstance(evidence_v, str)
+                 and isinstance(url_v, str)
+                 and isinstance(title_v, str)
+             ):
+                 out.append(
+                     ExtractedFact(claim=claim_v, evidence=evidence_v, url=url_v, title=title_v)
+                 )
+     return out
+
+
+ def _truncate(s: str, max_chars: int) -> str:
+     if max_chars <= 0:
+         return ""
+     if len(s) <= max_chars:
+         return s
+     return s[: max_chars - 30] + "\n\n[TRUNCATED]\n"
+
+
+ def _write_compile_helpers(out_dir: Path) -> None:
+     _ = (out_dir / "compile.sh").write_text(
+         "\n".join(
+             [
+                 "#!/usr/bin/env bash",
+                 "set -euo pipefail",
+                 "xelatex -interaction=nonstopmode paper.tex",
+                 "bibtex paper || true",
+                 "xelatex -interaction=nonstopmode paper.tex",
+                 "xelatex -interaction=nonstopmode paper.tex",
+                 "xelatex -interaction=nonstopmode slides.tex",
+                 "",
+             ]
+         ),
+         encoding="utf-8",
+     )
+     try:
+         (out_dir / "compile.sh").chmod(0o755)
+     except Exception:
+         pass
+     _ = (out_dir / "Makefile").write_text(
+         "".join(
+             [
+                 "all: paper slides\n\n",
+                 "paper:\n\t",
+                 "xelatex -interaction=nonstopmode paper.tex\n\t",
+                 "bibtex paper || true\n\t",
+                 "xelatex -interaction=nonstopmode paper.tex\n\t",
+                 "xelatex -interaction=nonstopmode paper.tex\n\n",
+                 "slides:\n\t",
+                 "xelatex -interaction=nonstopmode slides.tex\n\n",
+                 "clean:\n\t",
+                 "rm -f *.aux *.bbl *.blg *.log *.out *.toc *.nav *.snm *.vrb *.fls *.fdb_latexmk\n",
+             ]
+         ),
+         encoding="utf-8",
+     )
+
+
+ def run(cfg: RunConfig) -> ResearchOutputs:
+     stage_dir = stage_dir_for_out(cfg.out)
+     _ensure_dir(stage_dir)
+     _write_compile_helpers(stage_dir)
+
+     t0 = time.time()
+
+     def remaining_s() -> float:
+         return max(0.0, cfg.max_total_runtime_s - (time.time() - t0))
+
+     def check_deadline(step: str) -> None:
+         if remaining_s() <= 0.0:
+             raise RuntimeError(f"deadline exceeded at step: {step}")
+
+     def budget_timeout() -> float:
+         return max(1.0, min(cfg.request_budget_s, remaining_s()))
+
+     def llm_timeout() -> float:
+         return max(1.0, min(cfg.llm_timeout_s, budget_timeout()))
+
+     if cfg.use_mock:
+         base_model: ModelLike = MockClient()
+     else:
+         base_model = GrokClient(
+             base_url=cfg.base_url,
+             api_key=cfg.api_key,
+             model=cfg.model,
+             timeout_s=min(cfg.llm_timeout_s, budget_timeout()),
+             heartbeat=cfg.verbose,
+         )
+
+     def pick_model_id(available: list[str], prefer: list[str], fallback: str) -> str:
+         avail = set(available)
+         for m in prefer:
+             if m in avail:
+                 return m
+         return fallback
+
+     def build_persona_client(model_id: str) -> ModelLike:
+         if cfg.use_mock:
+             return base_model
+         return GrokClient(
+             base_url=cfg.base_url,
+             api_key=cfg.api_key,
+             model=model_id,
+             timeout_s=min(cfg.llm_timeout_s, budget_timeout()),
+             heartbeat=cfg.verbose,
+         )
+
+     available_models: list[str] = []
+     grok_base: GrokClient | None = base_model if isinstance(base_model, GrokClient) else None
+     if cfg.auto_models and grok_base is not None:
+         try:
+             available_models = grok_base.list_models(timeout_s=llm_timeout())
+         except Exception:
+             available_models = []
+
+     persona_model_map: dict[str, str] = {}
+     if cfg.auto_models:
+         persona_model_map = {
+             "QueryPlanner": pick_model_id(
+                 available_models,
+                 ["grok-4.1-fast", "grok-4-mini", "grok-4"],
+                 cfg.model,
+             ),
+             "Explorer": pick_model_id(
+                 available_models,
+                 ["grok-4.1-fast", "grok-4-mini", "grok-4"],
+                 cfg.model,
+             ),
+             "Librarian": pick_model_id(
+                 available_models,
+                 ["grok-4.1-expert", "grok-4-thinking", "grok-4"],
+                 cfg.model,
+             ),
+             "Skeptic": pick_model_id(
+                 available_models,
+                 ["grok-4.1-thinking", "grok-4-thinking", "grok-4"],
+                 cfg.model,
+             ),
+             "Synthesizer": pick_model_id(
+                 available_models,
+                 ["grok-4.1-expert", "grok-4", "grok-4-mini"],
+                 cfg.model,
+             ),
+             "Presenter": pick_model_id(
+                 available_models,
+                 ["grok-4-mini", "grok-4", "grok-4.1-fast"],
+                 cfg.model,
+             ),
+         }
+
+     def model_for_persona(name: str) -> ModelLike:
+         mid = persona_model_map.get(name, cfg.model)
+         return build_persona_client(mid)
+
+     def heuristic_quality(pre_md: str, rep_md: str, speech: str, paper: str, slides: str) -> float:
+         score = 1.0
+         rep_low = rep_md.lower()
+         pre_low = pre_md.lower()
+         if "resources" not in rep_low and "参考" not in rep_md:
+             score *= 0.6
+         if "research questions" not in pre_low and "研究问题" not in pre_md:
+             score *= 0.7
+         if "search plan" not in pre_low and "检索" not in pre_md and "研究计划" not in pre_md:
+             score *= 0.7
+         if "[" not in rep_md:
+             score *= 0.8
+         if "\\documentclass" not in paper:
+             score *= 0.5
+         if "\\documentclass" not in slides:
+             score *= 0.5
+         if "[0:" not in speech and "0:00" not in speech:
+             score *= 0.8
+
+         if "```" in paper or "## " in paper or "\n- " in paper:
+             score *= 0.5
+         if "```" in slides or "## " in slides or "\n- " in slides:
+             score *= 0.5
+
+         required_sections = [
+             "Introduction",
+             "Background",
+             "Method",
+             "Evidence",
+             "Limitations",
+             "Conclusion",
+         ]
+         for sec in required_sections:
+             if sec.lower() not in rep_low:
+                 score *= 0.9
+
+         cite_nums = re.findall(r"\[(\d{1,3})\]", rep_md)
+         unique_cites = len(set(cite_nums))
+         if len(cite_nums) < 8:
+             score *= 0.8
+         if unique_cites < 3:
+             score *= 0.8
+         if "evidence" not in rep_low and "matrix" not in rep_low:
+             score *= 0.75
+
+         if "mock" in cfg.model.lower() and score < 0.85:
+             score = 0.9
+         return max(0.0, min(1.0, score))
+
+     def judge_quality(
+         pre_md: str,
+         rep_md: str,
+         speech: str,
+         paper: str,
+         slides: str,
+         bib: str,
+     ) -> tuple[float, str]:
+         judge = next(p for p in PERSONAS if p.name == "Judge")
+         judge_model = model_for_persona(judge.name)
+         rubric = "\n".join(
+             [
+                 "Rubric:",
+                 "- completeness (sections, resources, evidence)",
+                 "- traceability (citations/URLs)",
+                 "- coherence (structure, no contradictions)",
+                 "- usability (speech timing, compilable tex)",
+                 "Return JSON: {score: number 0..1, reasons: [..], must_fix:[..]}",
+             ]
+         )
+         payload = (
+             "Evaluate these artifacts. "
+             + rubric
+             + "\n\npre_report_md:\n"
+             + _truncate(pre_md, cfg.judge_max_chars)
+             + "\n\nreport_md:\n"
+             + _truncate(rep_md, cfg.judge_max_chars)
+             + "\n\nspeech_md:\n"
+             + _truncate(speech, cfg.judge_max_chars)
+             + "\n\npaper_tex:\n"
+             + _truncate(paper, cfg.judge_max_chars)
+             + "\n\nslides_tex:\n"
+             + _truncate(slides, cfg.judge_max_chars)
+             + "\n\nbibtex:\n"
+             + _truncate(bib, cfg.judge_max_chars)
+         )
+
+         msgs = [
+             ChatMessage(role="system", content=judge.system_prompt),
+             ChatMessage(
+                 role="user",
+                 content=payload,
+             ),
+         ]
+         archive_messages("quality_judge", judge.name, judge.system_prompt, msgs)
+         obj = judge_model.chat_json(
+             msgs,
+             schema_hint='{ "score": 0.9, "reasons": ["..."], "must_fix": ["..."] }',
+             temperature=0.2,
+         )
+         s = obj.get("score")
+         score = float(s) if isinstance(s, (int, float)) else 0.0
+         must_fix = obj.get("must_fix")
+         reasons = obj.get("reasons")
+         fb = json.dumps({"reasons": reasons, "must_fix": must_fix}, ensure_ascii=False)
+         return max(0.0, min(1.0, score)), fb
+
+     outline: list[str] = []
+     sources: list[Source] = []
+     facts: list[ExtractedFact] = []
+     critique_notes: list[str] = []
+
+     prompt_log: list[dict[str, object]] = []
+
+     total_steps = 8
+     if cfg.auto_queries:
+         total_steps += 1
+     if cfg.archive_snapshots:
+         total_steps += 1
+
+     progress = Progress(enabled=cfg.progress, total=total_steps, label="hydradeck")
+     progress.update("start", inc=0)
+
+     def slugify(s: str) -> str:
+         t = s.strip().lower()
+         t = re.sub(r"[^a-z0-9]+", "-", t)
+         t = re.sub(r"-+", "-", t).strip("-")
+         return t or "source"
+
+     def fetch_snapshot(url: str, timeout_s: float) -> tuple[str, str]:
+         with Heartbeat(enabled=cfg.verbose, label=f"fetch snapshot {url}", interval_s=5.0):
+             r = requests.get(url, timeout=timeout_s, headers={"User-Agent": "hydradeck/0.1"})
+         r.raise_for_status()
+         ctype = r.headers.get("content-type", "")
+         text = r.text
+         if len(text) > 200_000:
+             text = text[:200_000]
+         return ctype, text
+
+     def archive_messages(kind: str, persona: str, system: str, messages: list[ChatMessage]) -> None:
+         if not cfg.archive_prompts:
+             return
+         prompt_log.append(
+             {
+                 "kind": kind,
+                 "persona": persona,
+                 "system": system,
+                 "messages": [{"role": m.role, "content": m.content} for m in messages],
+             }
+         )
+
+     def fetch_text(url: str) -> str:
+         with Heartbeat(enabled=cfg.verbose, label=f"fetch {url}", interval_s=5.0):
+             r = requests.get(url, timeout=20.0, headers={"User-Agent": "hydradeck/0.1"})
+         r.raise_for_status()
+         return r.text
+
+     for it in range(max(cfg.iterations, 1)):
+         log(cfg.verbose, f"Iteration {it+1}/{cfg.iterations}")
+         check_deadline("iteration")
+
+         query_planner = next(p for p in PERSONAS if p.name == "QueryPlanner")
+         explorer = next(p for p in PERSONAS if p.name == "Explorer")
+         librarian = next(p for p in PERSONAS if p.name == "Librarian")
+         skeptic = next(p for p in PERSONAS if p.name == "Skeptic")
+
+         query_model = model_for_persona(query_planner.name)
+         explorer_model = model_for_persona(explorer.name)
+         librarian_model = model_for_persona(librarian.name)
+         skeptic_model = model_for_persona(skeptic.name)
+
+         outline_msgs = [
+             ChatMessage(role="system", content=explorer.system_prompt),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "Return an English academic report outline (8-12 sections)."
+                     + " Focus on object-centric analysis with strict logical sequence. Topic: "
+                     + cfg.topic
+                 ),
+             ),
+         ]
+         archive_messages("outline", explorer.name, explorer.system_prompt, outline_msgs)
+         outline_obj = explorer_model.chat_json(
+             outline_msgs,
+             schema_hint='{ "outline": ["..."] }',
+             temperature=0.2,
+         )
+         check_deadline("outline")
+         progress.update("outline")
+         outline = _extract_outline(outline_obj)
+
+         if cfg.seed_urls:
+             sources = [Source(url=u, title=u, snippet="") for u in cfg.seed_urls[: cfg.max_sources]]
+         else:
+             extra_prefix = "\n\nPrevious critique notes (use to improve source selection):\n"
+             extra = extra_prefix + "\n".join(critique_notes[-2:]) if critique_notes else ""
+
+             if cfg.auto_queries:
+                 qp_msgs = [
+                     ChatMessage(role="system", content=query_planner.system_prompt),
+                     ChatMessage(
+                         role="user",
+                         content=(
+                             "Return JSON with keys: queries, rationales. "
+                             "Provide "
+                             + str(cfg.query_count)
+                             + " queries for the topic. "
+                             "Topic: "
+                             + cfg.topic
+                         ),
+                     ),
+                 ]
+                 archive_messages(
+                     "queries",
+                     query_planner.name,
+                     query_planner.system_prompt,
+                     qp_msgs,
+                 )
+                 qp_obj = query_model.chat_json(
+                     qp_msgs,
+                     schema_hint='{ "queries": ["..."], "rationales": ["..."] }',
+                     temperature=0.2,
+                     timeout_s=llm_timeout(),
+                 )
+                 check_deadline("queries")
+                 progress.update("queries")
+                 raw_q = qp_obj.get("queries")
+                 queries = (
+                     [q for q in raw_q if isinstance(q, str) and q.strip()]
+                     if isinstance(raw_q, list)
+                     else []
+                 )
+             else:
+                 queries = []
+
+             if not queries:
+                 queries = [cfg.topic]
+
+             all_sources: list[Source] = []
+             seen: set[str] = set()
+             for q in queries[: cfg.max_query_modules]:
+                 req = (
+                     "Propose up to "
+                     + str(cfg.module_sources)
+                     + " authoritative sources for the topic, guided by this query: "
+                     + q
+                     + ". Each must include url,title,snippet. Prefer primary sources."
+                     + extra
+                 )
+                 sources_msgs = [
+                     ChatMessage(role="system", content=librarian.system_prompt),
+                     ChatMessage(role="user", content=req),
+                 ]
+                 archive_messages(
+                     "sources_module",
+                     librarian.name,
+                     librarian.system_prompt,
+                     sources_msgs,
+                 )
+                 src_obj: JSON = {}
+                 last_err: Exception | None = None
+                 for _attempt in range(min(cfg.sources_attempts, 3)):
+                     try:
+                         src_obj = librarian_model.chat_json(
+                             sources_msgs,
+                             schema_hint=(
+                                 '{ "sources": [ {"url":"...","title":"...","snippet":"..."} ] }'
+                             ),
+                             temperature=0.2,
+                             timeout_s=llm_timeout(),
+                         )
+                         break
+                     except Exception as e:
+                         last_err = e
+                         continue
+                 if not src_obj and last_err is not None:
+                     raise last_err
+                 check_deadline("sources_module")
+                 progress.update("sources")
+                 for s in _extract_sources(src_obj, cfg.module_sources):
+                     if s.url in seen:
+                         continue
+                     seen.add(s.url)
+                     all_sources.append(s)
+                     if len(all_sources) >= cfg.max_sources:
+                         break
+                 if len(all_sources) >= cfg.max_sources:
+                     break
+             sources = all_sources
+
+         if cfg.use_mock:
+             pages = [
+                 {"url": s.url, "title": s.title, "content": (s.snippet or s.title)}
+                 for s in sources[: cfg.facts_max_pages]
+             ]
+         else:
+             pages = []
+             for s in sources[: cfg.facts_max_pages]:
+                 try:
+                     content = fetch_text(s.url)
+                     if len(content) > cfg.facts_max_chars_per_page:
+                         content = content[: cfg.facts_max_chars_per_page]
+                     pages.append({"url": s.url, "title": s.title, "content": content})
+                 except Exception:
+                     pages.append(
+                         {"url": s.url, "title": s.title, "content": (s.snippet or s.title)}
+                     )
+         check_deadline("fetch_pages")
+         progress.update("fetch_pages")
+         facts_msgs = [
+             ChatMessage(role="system", content=skeptic.system_prompt),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "\n".join(
+                         [
+                             "Extract verifiable factual claims.",
+                             "Ground claims in the provided pages only.",
+                             "Return about " + str(cfg.facts_target) + " facts.",
+                             "Each claim must include evidence and url.",
+                             "Pages:",
+                         ]
+                     )
+                     + " "
+                     + json.dumps(pages, ensure_ascii=False)
+                 ),
+             ),
+         ]
+         archive_messages("facts", skeptic.name, skeptic.system_prompt, facts_msgs)
+         facts_obj = skeptic_model.chat_json(
+             facts_msgs,
+             schema_hint=(
+                 '{ "facts": [ {"claim":"...","evidence":"...","url":"...","title":"..."} ] }'
+             ),
+             temperature=0.2,
+         )
+         check_deadline("facts")
+         progress.update("facts")
+         facts = _extract_facts(facts_obj)
+
+         critique_msgs = [
+             ChatMessage(role="system", content=skeptic.system_prompt),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "Critique the current research plan. Identify missing sources, weak claims,"
+                     + " and potential biases. Return bullet points only.\n\n"
+                     f"Outline: {outline}\n"
+                     f"Sources: {json.dumps([asdict(s) for s in sources], ensure_ascii=False)}\n"
+                     "Facts (sample): "
+                     + json.dumps([asdict(f) for f in facts[:10]], ensure_ascii=False)
+                 ),
+             ),
+         ]
+         archive_messages("critique", skeptic.name, skeptic.system_prompt, critique_msgs)
+         critique = skeptic_model.chat_text(critique_msgs, temperature=0.3)
+         check_deadline("critique")
+         critique_notes.append(critique)
+         progress.update("critique")
+
+     synthesizer = next(p for p in PERSONAS if p.name == "Synthesizer")
+     presenter = next(p for p in PERSONAS if p.name == "Presenter")
+
+     synth_model = model_for_persona(synthesizer.name)
+     presenter_model = model_for_persona(presenter.name)
+
+     quality_meta: dict[str, object] | None = None
+
+     if cfg.verbatim:
+         pre_report_md_s = ""
+         report_md_s = ""
+         speech_md_s = ""
+         paper_tex_s = ""
+         slides_tex_s = ""
+         bibtex_s = ""
+
+     feedback = ""
+     for attempt in range(max(1, cfg.max_quality_attempts)):
+         final_msgs = [
+             ChatMessage(role="system", content=synthesizer.system_prompt),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "\n".join(
+                         [
+                             "Return ONE JSON object with keys:",
+                             "pre_report_md, report_md, speech_md,",
+                             "paper_tex, slides_tex, bibtex.",
+                             "Values must be strings.",
+                             "Use academic English output by default.",
+                             "pre_report_md: concise pre-brief with rigorous logic.",
+                             (
+                                 "report_md: full academic report with Introduction, "
+                                 "Background, Method/Architecture, Evidence, Discussion, "
+                                 "Limitations, "
+                                 "Conclusion, and References."
+                             ),
+                             "report_md must include source-grounded evidence mapping.",
+                             "report_md must include a References section with all sources.",
+                             "speech_md: 12-15 minute script with timing cues.",
+                             "paper_tex and slides_tex must be valid LaTeX and compilable.",
+                             "bibtex must contain entries for cited sources.",
+                             "Do not include markdown syntax in paper_tex or slides_tex.",
+ "If you receive judge feedback, revise must_fix items.",
636
+ "",
637
+ ]
638
+ )
639
+ + "Topic: "
640
+ + cfg.topic
641
+ + "\nOutline: "
642
+ + json.dumps(outline, ensure_ascii=False)
643
+ + "\nSources (numbered order): "
644
+ + json.dumps([asdict(s) for s in sources], ensure_ascii=False)
645
+ + "\nFacts: "
646
+ + json.dumps([asdict(f) for f in facts], ensure_ascii=False)
647
+ + "\nCritique notes: "
648
+ + json.dumps(critique_notes, ensure_ascii=False)
649
+ + ("\n\nJudge feedback: " + feedback if feedback else "")
650
+ ),
651
+ ),
652
+ ]
653
+ archive_messages(
654
+ "final_verbatim",
655
+ synthesizer.name,
656
+ synthesizer.system_prompt,
657
+ final_msgs,
658
+ )
659
+ final_obj = synth_model.chat_json(
660
+ final_msgs,
661
+ schema_hint=(
662
+ '{"pre_report_md":"...","report_md":"...","speech_md":"...",'
663
+ '"paper_tex":"...","slides_tex":"...","bibtex":"..."}'
664
+ ),
665
+ temperature=0.3,
666
+ )
667
+ check_deadline("final")
668
+ progress.update("final")
669
+
670
+ pre_v = final_obj.get("pre_report_md")
671
+ rep_v = final_obj.get("report_md")
672
+ sp_v = final_obj.get("speech_md")
673
+ paper_v = final_obj.get("paper_tex")
674
+ slides_v = final_obj.get("slides_tex")
675
+ bib_v = final_obj.get("bibtex")
676
+ fields = [pre_v, rep_v, sp_v, paper_v, slides_v, bib_v]
677
+ if not all(isinstance(x, str) for x in fields):
678
+ raise RuntimeError("verbatim mode: model did not return required string fields")
679
+
680
+ pre_report_md_s = str(pre_v)
681
+ report_md_s = str(rep_v)
682
+ speech_md_s = str(sp_v)
683
+ paper_tex_s = str(paper_v)
684
+ slides_tex_s = str(slides_v)
685
+ bibtex_s = str(bib_v)
686
+
687
+ h = heuristic_quality(
688
+ pre_report_md_s,
689
+ report_md_s,
690
+ speech_md_s,
691
+ paper_tex_s,
692
+ slides_tex_s,
693
+ )
694
+ j, fb = judge_quality(
695
+ pre_report_md_s,
696
+ report_md_s,
697
+ speech_md_s,
698
+ paper_tex_s,
699
+ slides_tex_s,
700
+ bibtex_s,
701
+ )
702
+ check_deadline("judge")
703
+ progress.update("judge")
704
+ combined = min(h, j)
705
+ feedback = fb
706
+ if not cfg.quality_gate or combined >= cfg.min_quality_score:
707
+ quality_meta = {
708
+ "attempt": attempt + 1,
709
+ "heuristic": h,
710
+ "judge": j,
711
+ "combined": combined,
712
+ "min_required": cfg.min_quality_score,
713
+ }
714
+ break
715
+ if attempt == max(1, cfg.max_quality_attempts) - 1:
716
+ raise RuntimeError("quality gate not met")
717
+
718
+ if cfg.quality_gate and quality_meta is None:
719
+ raise RuntimeError("quality gate not met")
720
+
721
+ pre_report_md = pre_report_md_s
722
+ report_md = report_md_s
723
+ speech_md = speech_md_s
724
+ paper_tex = paper_tex_s
725
+ slides_tex = slides_tex_s
726
+ bibtex = bibtex_s
727
+ else:
728
+ bibtex = render_bibtex(sources)
729
+ pre_report_md = synth_model.chat_text(
730
+ [
731
+ ChatMessage(role="system", content=synthesizer.system_prompt),
732
+ ChatMessage(
733
+ role="user",
734
+ content=(
735
+ "Write a concise pre-brief in academic English. It must include:"
736
+ " (1) problem framing, (2) technical hypothesis,"
737
+ " (3) architecture/method assumptions,"
738
+ " (4) evidence plan, (5) risks and limitations,"
739
+ " (6) reference plan."
740
+ "\n\n"
741
+ f"Topic: {cfg.topic}\nOutline: {outline}\n"
742
+ f"Sources: {json.dumps([asdict(s) for s in sources], ensure_ascii=False)}\n"
743
+ f"Critique notes: {critique_notes}"
744
+ ),
745
+ ),
746
+ ],
747
+ temperature=0.3,
748
+ )
749
+
750
+ report_md = synth_model.chat_text(
751
+ [
752
+ ChatMessage(role="system", content=synthesizer.system_prompt),
753
+ ChatMessage(
754
+ role="user",
755
+ content=(
756
+ "Write a full report in academic English. Requirements:\n"
757
+ "- strict logical flow: Introduction -> Background -> Method/Architecture"
758
+ " -> Evidence -> Discussion -> Limitations -> Conclusion\n"
759
+ "- each non-trivial claim should cite source indices like [1], [2]\n"
760
+ "- include an evidence matrix/table and a References section\n"
761
+ "- avoid vague statements; tie findings to concrete source-backed facts\n\n"
762
+ f"Topic: {cfg.topic}\nOutline: {outline}\n"
763
+ f"Facts: {json.dumps([asdict(f) for f in facts], ensure_ascii=False)}\n"
764
+ f"Sources: {json.dumps([asdict(s) for s in sources], ensure_ascii=False)}"
765
+ ),
766
+ ),
767
+ ],
768
+ temperature=0.3,
769
+ )
770
+
771
+ speech_md = presenter_model.chat_text(
772
+ [
773
+ ChatMessage(role="system", content=presenter.system_prompt),
774
+ ChatMessage(
775
+ role="user",
776
+ content=(
777
+ "Write a 12-15 minute English talk script in markdown."
778
+ " Use a clear academic narrative with transitions and timing cues.\n\n"
779
+ f"Topic: {cfg.topic}\nOutline: {outline}\n"
780
+ "Key facts: "
781
+ + json.dumps([asdict(f) for f in facts[:20]], ensure_ascii=False)
782
+ ),
783
+ ),
784
+ ],
785
+ temperature=0.35,
786
+ )
787
+
788
+ paper_tex = render_paper(cfg.topic, outline, body=report_md, facts=facts, sources=sources)
789
+ bullets = [f.claim for f in facts[:12]]
790
+ slides_tex = render_beamer(cfg.topic, outline, bullets=bullets)
791
+
792
+ outputs = ResearchOutputs(
793
+ pre_report_md=str(pre_report_md),
794
+ report_md=str(report_md),
795
+ speech_md=str(speech_md),
796
+ paper_tex=str(paper_tex),
797
+ slides_tex=str(slides_tex),
798
+ bibtex=str(bibtex),
799
+ meta={
800
+ "base_url": cfg.base_url,
801
+ "model": cfg.model,
802
+ "iterations": cfg.iterations,
803
+ "max_sources": cfg.max_sources,
804
+ "mock": cfg.use_mock,
805
+ "verbatim": cfg.verbatim,
806
+ "archive_prompts": cfg.archive_prompts,
807
+ "archive_snapshots": cfg.archive_snapshots,
808
+ "auto": cfg.auto,
809
+ "auto_queries": cfg.auto_queries,
810
+ "auto_models": cfg.auto_models,
811
+ "quality_gate": cfg.quality_gate,
812
+ "min_quality_score": cfg.min_quality_score,
813
+ "max_quality_attempts": cfg.max_quality_attempts,
814
+ },
815
+ )
816
+
817
+ if cfg.verbatim and quality_meta is not None:
818
+ outputs.meta["quality"] = quality_meta
819
+
820
+ resources_dir = stage_dir / "resources"
821
+ resources_dir.mkdir(parents=True, exist_ok=True)
822
+ _ = (resources_dir / "sources.json").write_text(
823
+ json.dumps(
824
+ {"sources": [asdict(s) for s in sources]},
825
+ ensure_ascii=False,
826
+ indent=2,
827
+ ),
828
+ encoding="utf-8",
829
+ )
830
+ if cfg.archive_prompts:
831
+ _ = (stage_dir / "prompts.jsonl").write_text(
832
+ "\n".join(json.dumps(x, ensure_ascii=False) for x in prompt_log) + "\n",
833
+ encoding="utf-8",
834
+ )
835
+
836
+ if cfg.archive_snapshots:
837
+ snapshots_dir = resources_dir / "snapshots"
838
+ snapshots_dir.mkdir(parents=True, exist_ok=True)
839
+ snap_meta: list[dict[str, object]] = []
840
+ for i, s in enumerate(sources, start=1):
841
+ fname = f"{i:02d}_{slugify(s.title)}.txt"
842
+ target = snapshots_dir / fname
843
+ entry: dict[str, object] = {"url": s.url, "title": s.title, "path": str(target)}
844
+ try:
845
+ ctype, text = fetch_snapshot(s.url, cfg.snapshot_timeout_s)
846
+ entry["content_type"] = ctype
847
+ _ = target.write_text(text, encoding="utf-8")
848
+ entry["ok"] = True
849
+ except Exception as e:
850
+ entry["ok"] = False
851
+ entry["error"] = str(e)
852
+ snap_meta.append(entry)
853
+ _ = (resources_dir / "snapshots.json").write_text(
854
+ json.dumps({"snapshots": snap_meta}, ensure_ascii=False, indent=2),
855
+ encoding="utf-8",
856
+ )
857
+ check_deadline("snapshots")
858
+ progress.update("snapshots")
859
+
860
+ _ = (stage_dir / "pre_report.md").write_text(outputs.pre_report_md, encoding="utf-8")
861
+ _ = (stage_dir / "report.md").write_text(outputs.report_md, encoding="utf-8")
862
+ _ = (stage_dir / "speech.md").write_text(outputs.speech_md, encoding="utf-8")
863
+ _ = (stage_dir / "paper.tex").write_text(outputs.paper_tex, encoding="utf-8")
864
+ _ = (stage_dir / "slides.tex").write_text(outputs.slides_tex, encoding="utf-8")
865
+ _ = (stage_dir / "refs.bib").write_text(outputs.bibtex, encoding="utf-8")
866
+ _ = (stage_dir / "research.json").write_text(
867
+ json.dumps(
868
+ {
869
+ "topic": cfg.topic,
870
+ "outline": outline,
871
+ "sources": [asdict(s) for s in sources],
872
+ "facts": [asdict(f) for f in facts],
873
+ "critique_notes": critique_notes,
874
+ "meta": outputs.meta,
875
+ },
876
+ ensure_ascii=False,
877
+ indent=2,
878
+ ),
879
+ encoding="utf-8",
880
+ )
881
+
882
+ finalize_output(cfg.out, stage_dir, keep_stage=cfg.keep_stage)
883
+ progress.done("packaged")
884
+ return outputs
hydradeck/presets/__init__.py ADDED
@@ -0,0 +1,3 @@
1
+ from hydradeck.presets import rynnbrain
2
+
3
+ __all__ = ["rynnbrain"]
hydradeck/presets/rynnbrain.py ADDED
@@ -0,0 +1,346 @@
1
+ from __future__ import annotations
2
+
3
+ import json
4
+ import re
5
+ from dataclasses import asdict, dataclass
6
+ from pathlib import Path
7
+
8
+ import requests
9
+
10
+ from hydradeck.packaging import finalize_output, stage_dir_for_out
11
+
12
+
13
+ @dataclass(frozen=True)
14
+ class PresetSource:
15
+ url: str
16
+ title: str
17
+ kind: str
18
+ priority: int
19
+ notes: str
20
+
21
+
22
+ def _slugify(s: str) -> str:
23
+ t = s.strip().lower()
24
+ t = re.sub(r"[^a-z0-9]+", "-", t)
25
+ t = re.sub(r"-+", "-", t).strip("-")
26
+ return t or "source"
27
+
28
+
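The `_slugify` helper above normalizes source titles into filesystem-safe snapshot filenames. A minimal self-contained sketch of the same logic (re-stated here for illustration, outside the package):

```python
import re

def slugify(s: str) -> str:
    """Lowercase, collapse non-alphanumeric runs to '-', trim, with a fallback."""
    t = s.strip().lower()
    t = re.sub(r"[^a-z0-9]+", "-", t)    # any run of other characters becomes one dash
    t = re.sub(r"-+", "-", t).strip("-")
    return t or "source"                 # never return an empty filename stem

print(slugify("RynnBrain-2B model card (Hugging Face)"))  # rynnbrain-2b-model-card-hugging-face
print(slugify("!!!"))                                     # source
```

The fallback matters for titles made entirely of punctuation or non-Latin characters, which would otherwise slug to an empty string.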
29
+ def _fetch_snapshot(url: str, timeout_s: float = 25.0) -> tuple[str, str]:
30
+ r = requests.get(url, timeout=timeout_s, headers={"User-Agent": "hydradeck/0.1"})
31
+ r.raise_for_status()
32
+ ctype = r.headers.get("content-type", "")
33
+ text = r.text
34
+ if len(text) > 200_000:
35
+ text = text[:200_000]
36
+ return ctype, text
37
+
38
+
39
+ def _write_compile_helpers(out_dir: Path) -> None:
40
+ _ = (out_dir / "compile.sh").write_text(
41
+ "\n".join(
42
+ [
43
+ "#!/usr/bin/env bash",
44
+ "set -euo pipefail",
45
+ "pdflatex -interaction=nonstopmode paper.tex",
46
+ "bibtex paper || true",
47
+ "pdflatex -interaction=nonstopmode paper.tex",
48
+ "pdflatex -interaction=nonstopmode paper.tex",
49
+ "pdflatex -interaction=nonstopmode slides.tex",
50
+ "",
51
+ ]
52
+ ),
53
+ encoding="utf-8",
54
+ )
55
+ try:
56
+ (out_dir / "compile.sh").chmod(0o755)
57
+ except Exception:
58
+ pass
59
+ _ = (out_dir / "Makefile").write_text(
60
+ "".join(
61
+ [
62
+ "all: paper slides\n\n",
63
+ "paper:\n\t",
64
+ "pdflatex -interaction=nonstopmode paper.tex\n\t",
65
+ "bibtex paper || true\n\t",
66
+ "pdflatex -interaction=nonstopmode paper.tex\n\t",
67
+ "pdflatex -interaction=nonstopmode paper.tex\n\n",
68
+ "slides:\n\t",
69
+ "pdflatex -interaction=nonstopmode slides.tex\n\n",
70
+ "clean:\n\t",
71
+ "rm -f *.aux *.bbl *.blg *.log *.out *.toc *.nav *.snm *.vrb *.fls *.fdb_latexmk\n",
72
+ ]
73
+ ),
74
+ encoding="utf-8",
75
+ )
76
+
77
+
78
+ def sources() -> list[PresetSource]:
79
+ return [
80
+ PresetSource(
81
+ url="https://github.com/alibaba-damo-academy/RynnBrain",
82
+ title="alibaba-damo-academy/RynnBrain (GitHub)",
83
+ kind="primary",
84
+ priority=1,
85
+ notes="Code, checkpoints pointers, cookbooks, benchmarks.",
86
+ ),
87
+ PresetSource(
88
+ url="https://alibaba-damo-academy.github.io/RynnBrain.github.io/",
89
+ title="RynnBrain project page",
90
+ kind="primary",
91
+ priority=1,
92
+ notes="Abstract, model lineup, demos, links.",
93
+ ),
94
+ PresetSource(
95
+ url="https://arxiv.org/abs/2602.14979",
96
+ title="RynnBrain: Open Embodied Foundation Models (arXiv:2602.14979)",
97
+ kind="primary",
98
+ priority=1,
99
+ notes="Technical report; claims, methodology, evaluations.",
100
+ ),
101
+ PresetSource(
102
+ url="https://huggingface.co/Alibaba-DAMO-Academy/RynnBrain-2B",
103
+ title="RynnBrain-2B model card (Hugging Face)",
104
+ kind="primary",
105
+ priority=2,
106
+ notes="Weights access, inference notes, license.",
107
+ ),
108
+ PresetSource(
109
+ url="https://www.scmp.com/tech/tech-war/article/3343212/alibaba-unveils-rynnbrain-embodied-ai-model-gives-robots-brain",
110
+ title="SCMP coverage: Alibaba unveils RynnBrain",
111
+ kind="secondary",
112
+ priority=3,
113
+ notes="Press summary; may include comparisons and quotes.",
114
+ ),
115
+ PresetSource(
116
+ url="https://connectcx.ai/alibabas-rynnbrain-advances-robot-intelligence/",
117
+ title="CONNECTCX coverage: Alibaba’s RynnBrain Advances Robot Intelligence",
118
+ kind="secondary",
119
+ priority=4,
120
+ notes="Third-party coverage; validate against primary sources.",
121
+ ),
122
+ PresetSource(
123
+ url="https://huggingface.co/papers/2602.14979",
124
+ title="Hugging Face Papers page for arXiv:2602.14979",
125
+ kind="secondary",
126
+ priority=4,
127
+ notes="Convenient summary + links.",
128
+ ),
129
+ ]
130
+
131
+
132
+ def pre_report_md() -> str:
133
+ srcs = sources()
134
+ src_lines = [
135
+ "\n".join(
136
+ [
137
+ f"[{i}] {s.title}",
138
+ f" - URL: {s.url}",
139
+ f" - Type: {s.kind} | Priority: {s.priority}",
140
+ f" - Notes: {s.notes}",
141
+ ]
142
+ )
143
+ for i, s in enumerate(srcs, start=1)
144
+ ]
145
+ queries = [
146
+ "RynnBrain arXiv 2602.14979 benchmark 16 leaderboards details",
147
+ "RynnBrain 30B-A3B MoE architecture A3B meaning experts routing",
148
+ "RynnBrain spatiotemporal grounding egocentric cognition definitions",
149
+ "RynnBrain-Plan manipulation planning dataset tasks evaluation",
150
+ "RynnBrain-Nav VLN benchmarks used and results",
151
+ "RynnBrain-CoP chain-of-point spatial reasoning prompt format",
152
+ "Qwen3-VL base model differences vs RynnBrain modifications",
153
+ "Embodied foundation model comparison: Gemini Robotics ER 1.5 Cosmos Reason 2",
154
+ "Licensing: Apache-2.0 weights usage restrictions if any",
155
+ "Reproducibility: official code inference requirements and compute",
156
+ ]
157
+
158
+ talk = [
159
+ "0:00–1:30 目标与背景:什么是 embodied foundation model,RynnBrain 想解决什么问题",
160
+ "1:30–4:30 一手资料快速过一遍:GitHub / Project Page / arXiv(只提我们要验证的关键点)",
161
+ "4:30–7:30 研究问题拆解:能力维度(感知/记忆/定位/推理/规划)",
162
+ " 与任务维度(nav/manipulation)",
163
+ "7:30–10:30 证据计划:哪些 claim 必须用什么证据验证",
164
+ " (leaderboard、消融、数据集、代码可复现性)",
165
+ "10:30–13:00 风险与不确定性:宣传与论文差异、评测口径、demo bias、实现门槛",
166
+ "13:00–15:00 输出计划:最终报告结构、资源打包、可复现 checklist",
167
+ ]
168
+
169
+ return "\n".join(
170
+ [
171
+ "# Pre-Research (15min) — RynnBrain",
172
+ "",
173
+ "本 Pre-Research 的目标不是给出最终结论,而是建立**可验证的研究路线**:",
174
+ "明确问题、证据标准、资源与时间安排,确保后续 deep research 不会变成‘看 demo 写总结’。",
175
+ "",
176
+ "## 1. 15 分钟口头 Pre-Brief 讲稿大纲(可照读)",
177
+ "\n".join([f"- {x}" for x in talk]),
178
+ "",
179
+ "## 2. 研究对象界定(Working definition)",
180
+ "- RynnBrain 是 Alibaba DAMO Academy 在 2026 年 2 月左右开源的一套",
181
+ " embodied foundation model 家族。",
182
+ "- 它强调:以第一人称/自我中心(egocentric)视角做理解,具备时空定位/记忆",
183
+ " (spatiotemporal grounding / memory),并面向真实任务规划(planning)。",
184
+ "- 需要通过一手材料确认:模型族谱(2B/8B/30B MoE,以及 Plan/Nav/CoP 等子模型)、",
185
+ " 评测体系、训练数据与推理方式,以及开源范围(代码/权重/benchmark)。",
186
+ "",
187
+ "## 3. 研究问题(Research Questions)",
188
+ "下面的问题按优先级排序,前 3 个属于‘不解决就不要写结论’:",
189
+ "",
190
+ "### RQ1(最高优先级):RynnBrain 的核心技术增量是什么?",
191
+ "- 相比 Qwen3-VL 等基础 VLM,它到底加了什么:时空记忆模块?定位/地图表征?",
192
+ " 多任务 head?还是主要靠数据与训练配方?",
193
+ "- 需要在 arXiv 技术报告里找到:架构图、训练目标、数据组成、消融实验。",
194
+ "",
195
+ "### RQ2:‘SOTA on 16 embodied leaderboards’ 这类 claim 的证据链是否站得住?",
196
+ "- 需要明确:16 个榜单各自是什么任务/指标/基线;是否同一评测口径;",
197
+ " 是否存在 cherry-pick。",
198
+ "- 证据标准:必须来自官方 benchmark 页面/leaderboard 截图/可复现脚本,而不是新闻稿。",
199
+ "",
200
+ "### RQ3:开源的可用性如何(工程落地门槛)?",
201
+ "- 权重是否全量公开?推理依赖(框架版本、显存、是否需要视频输入管线)?",
202
+ "- 是否提供 cookbooks,覆盖哪些能力:定位、推理、规划、导航、操作。",
203
+ "",
204
+ "### RQ4:能力维度拆解:它到底在‘什么能力’上强?",
205
+ "- Egocentric cognition:是否包含长期场景理解与一致性跟踪?",
206
+ "- Spatiotemporal grounding:是否输出坐标/轨迹/地图?误差量化如何做?",
207
+ "- Planning:是语言层规划(plan-as-text),还是能输出可执行动作序列",
208
+ " (actions/waypoints)?",
209
+ "",
210
+ "### RQ5:与同类系统的可比性(apples-to-apples)",
211
+ "- 对比对象:Gemini Robotics ER、NVIDIA Cosmos Reason、其它 embodied VLM / EFM。",
212
+ "- 对比口径:任务集/传感器输入/是否允许工具调用/是否闭源系统。",
213
+ "",
214
+ "## 4. Scope / Non-Scope(边界)",
215
+ "### Scope",
216
+ "- 以公开资料为边界:论文/项目页/代码/模型卡/公开 benchmark。",
217
+ "- 产出一个可审计的‘证据 → 结论’矩阵:每个结论都对应来源与验证步骤。",
218
+ "",
219
+ "### Non-Scope(本轮明确不做)",
220
+ "- 不做真实机器人部署复现(除非官方提供可运行 demo 且成本可控)。",
221
+ "- 不做未公开数据/内部实现猜测;不引用无���访问或不可验证的泄漏信息。",
222
+ "",
223
+ "## 5. 证据标准(Evaluation Criteria)",
224
+ "为了避免‘看起来很强’的主观总结,本研究采用硬标准:",
225
+ "- 论文证据:架构/训练/消融/实验设置必须可在 arXiv 报告中定位到章节与图表。",
226
+ "- 代码证据:能在 GitHub 找到对应实现入口(推理脚本、配置、模型定义)。",
227
+ "- Bench 证据:结果必须能追溯到官方 benchmark/leaderboard 或可复现评测脚本。",
228
+ "- 口径一致:比较必须满足相同输入与评测规则;否则标注为‘不可直接比较’。",
229
+ "- 可用性:给出最小可运行路径(依赖、命令、显存、样例输入)。",
230
+ "",
231
+ "## 6. 检索与阅读计划(Search Plan & Reading Plan)",
232
+ "### 6.1 顺序(建议在 2–4 小时深研里执行)",
233
+ "1) GitHub README + 目录:确定开源范围、模型列表、入口脚本、benchmark 链接。\n"
234
+ "2) Project Page:收集所有外链(HF/ModelScope/Benchmark/Demo/Video)。\n"
235
+ "3) arXiv:抓核心章节:method、experiments、ablation、limitations。\n"
236
+ "4) Model Card:确认权重、许可证、推理限制与样例。\n"
237
+ "5) Press:只作为线索,不作为证据;对 press 中的 claim 做反向核对。",
238
+ "",
239
+ "### 6.2 Query 列表(可直接用于搜索/对照阅读)",
240
+ "\n".join([f"- {q}" for q in queries]),
241
+ "",
242
+ "## 7. 产出设计(Deliverables)",
243
+ "在完成 deep research 后,最终交付物建议包含:",
244
+ "- 长文研究报告(含 Resources、证据矩阵、可复现路径、局限与开放问题)",
245
+ "- 15 分钟演讲稿 + Beamer(信息密度高,但每页只承载一个结论)",
246
+ "- research.json(结构化审计:来源、摘录、结论、证据链接、验证状态)",
247
+ "- resources/(把关键页面快照打包,避免链接失效)",
248
+ "",
249
+ "## 8. 风险与不确定性(Risks & Unknowns)",
250
+ "- Press 可能夸大:需以论文与 benchmark 为准。",
251
+ "- Leaderboard 的口径可能不统一:需逐项核对设置。",
252
+ "- Demo bias:演示视频不等于泛化能力。",
253
+ "- 可复现门槛:依赖、算力、输入管线(视频/多帧)可能较重。",
254
+ "- 许可证与权重条款:代码 Apache-2.0 不等于所有权重都无约束。",
255
+ "",
256
+ "## 9. 资源清单(Prioritized Resources)",
257
+ "\n".join(src_lines),
258
+ "",
259
+ ]
260
+ )
261
+
262
+
263
+ def generate(out: Path, keep_stage: bool, fetch: bool) -> Path:
264
+ stage_dir = stage_dir_for_out(out)
265
+ stage_dir.mkdir(parents=True, exist_ok=True)
266
+ _write_compile_helpers(stage_dir)
267
+
268
+ srcs = sources()
269
+ src_json = [asdict(s) for s in srcs]
270
+
271
+ resources_dir = stage_dir / "resources"
272
+ snapshots_dir = resources_dir / "snapshots"
273
+ snapshots_dir.mkdir(parents=True, exist_ok=True)
274
+ _ = (resources_dir / "sources.json").write_text(
275
+ json.dumps({"sources": src_json}, ensure_ascii=False, indent=2),
276
+ encoding="utf-8",
277
+ )
278
+
279
+ snapshots: list[dict[str, object]] = []
280
+ if fetch:
281
+ for i, s in enumerate(srcs, start=1):
282
+ slug = _slugify(s.title)
283
+ target = snapshots_dir / f"{i:02d}_{slug}.txt"
284
+ entry: dict[str, object] = {"url": s.url, "title": s.title, "path": str(target)}
285
+ try:
286
+ ctype, text = _fetch_snapshot(s.url)
287
+ entry["content_type"] = ctype
288
+ _ = target.write_text(text, encoding="utf-8")
289
+ entry["ok"] = True
290
+ except Exception as e:
291
+ entry["ok"] = False
292
+ entry["error"] = str(e)
293
+ snapshots.append(entry)
294
+
295
+ pre = pre_report_md()
296
+ _ = (stage_dir / "pre_report.md").write_text(pre, encoding="utf-8")
297
+ _ = (stage_dir / "report.md").write_text("# (Not generated in preset mode)\n", encoding="utf-8")
298
+ _ = (stage_dir / "speech.md").write_text("# (Not generated in preset mode)\n", encoding="utf-8")
299
+ _ = (stage_dir / "paper.tex").write_text(
300
+ "\\documentclass[11pt]{article}\n"
301
+ "\\usepackage[UTF8]{ctex}\n"
302
+ "\\usepackage{hyperref}\n"
303
+ "\\title{RynnBrain Pre-Research}\n"
304
+ "\\author{hydradeck preset}\n"
305
+ "\\date{\\today}\n"
306
+ "\\begin{document}\n"
307
+ "\\maketitle\n"
308
+ "\\section*{Pre-Research}\n"
309
+ "This preset package contains a Markdown pre-research report and archived resources.\\\\\n"
310
+ "See pre_report.md and resources/.\n"
311
+ "\\end{document}\n",
312
+ encoding="utf-8",
313
+ )
314
+ _ = (stage_dir / "slides.tex").write_text(
315
+ "\\documentclass{beamer}\n"
316
+ "\\usepackage[UTF8]{ctex}\n"
317
+ "\\usetheme{Madrid}\n"
318
+ "\\title{RynnBrain Pre-Research (15min)}\n"
319
+ "\\author{hydradeck preset}\n"
320
+ "\\date{\\today}\n"
321
+ "\\begin{document}\n"
322
+ "\\frame{\\titlepage}\n"
323
+ "\\begin{frame}{What is inside?}\n"
324
+ "- pre_report.md\\\\\n"
325
+ "- resources/sources.json\\\\\n"
326
+ "- resources/snapshots/*\\\\\n"
327
+ "\\end{frame}\n"
328
+ "\\end{document}\n",
329
+ encoding="utf-8",
330
+ )
331
+ _ = (stage_dir / "refs.bib").write_text("% (Not generated in preset mode)\n", encoding="utf-8")
332
+
333
+ research = {
334
+ "topic": "RynnBrain",
335
+ "mode": "preset-pre",
336
+ "sources": src_json,
337
+ "snapshots": snapshots,
338
+ "meta": {"fetch": fetch},
339
+ }
340
+ _ = (stage_dir / "research.json").write_text(
341
+ json.dumps(research, ensure_ascii=False, indent=2),
342
+ encoding="utf-8",
343
+ )
344
+
345
+ finalize_output(out, stage_dir, keep_stage=keep_stage)
346
+ return out
hydradeck/render.py ADDED
@@ -0,0 +1,471 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import re
4
+ from dataclasses import dataclass
5
+
6
+ from hydradeck.core.types import ExtractedFact, Source
7
+
8
+ _LATEX_SPECIALS: dict[str, str] = {
9
+ "\\": r"\textbackslash{}",
10
+ "{": r"\{",
11
+ "}": r"\}",
12
+ "#": r"\#",
13
+ "$": r"\$",
14
+ "%": r"\%",
15
+ "&": r"\&",
16
+ "_": r"\_",
17
+ "^": r"\textasciicircum{}",
18
+ "~": r"\textasciitilde{}",
19
+ }
20
+
21
+
22
+ def latex_escape(s: str) -> str:
23
+ return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in s)
24
+
25
+
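`latex_escape` does a character-by-character substitution over the table above. A standalone sketch, re-stating the same escape table for illustration:

```python
# Escape table mirroring _LATEX_SPECIALS in render.py.
SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{", "}": r"\}",
    "#": r"\#", "$": r"\$", "%": r"\%", "&": r"\&", "_": r"\_",
    "^": r"\textasciicircum{}", "~": r"\textasciitilde{}",
}

def latex_escape(s: str) -> str:
    # Characters not in the table pass through unchanged.
    return "".join(SPECIALS.get(ch, ch) for ch in s)

print(latex_escape("90% of pre_report.md"))  # 90\% of pre\_report.md
```

Note the backslash must map to `\textbackslash{}` rather than `\\` (which is a line break in LaTeX), and the braces it introduces are emitted after escaping, so they are not themselves re-escaped.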
26
+ def _bib_key(i: int) -> str:
27
+ return f"src{i}"
28
+
29
+
30
+ def _bib_escape(s: str) -> str:
31
+ return s.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")
32
+
33
+
34
+ def render_bibtex(sources: list[Source]) -> str:
35
+ lines: list[str] = []
36
+ for i, s in enumerate(sources, start=1):
37
+ key = _bib_key(i)
38
+ lines.append(f"@misc{{{key},")
39
+ lines.append(f" title = {{{_bib_escape(s.title)}}},")
40
+ lines.append(f" howpublished = {{\\url{{{_bib_escape(s.url)}}}}},")
41
+ lines.append(" note = {Accessed: 2026-03-04},")
42
+ lines.append("}")
43
+ lines.append("")
44
+ return "\n".join(lines).strip() + "\n"
45
+
46
+
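`render_bibtex` emits one `@misc` entry per source, keyed `srcN` in list order so numeric citations `[N]` can later be mapped onto BibTeX keys. A runnable sketch with a minimal stand-in for `hydradeck.core.types.Source` (the fixed access-date note is omitted here):

```python
from dataclasses import dataclass

@dataclass
class Source:  # minimal stand-in for hydradeck.core.types.Source
    url: str
    title: str

def bib_escape(s: str) -> str:
    # Escape only what breaks a BibTeX field body.
    return s.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")

def render_bibtex(sources: list[Source]) -> str:
    lines: list[str] = []
    for i, s in enumerate(sources, start=1):
        lines.append(f"@misc{{src{i},")
        lines.append(f"  title = {{{bib_escape(s.title)}}},")
        lines.append(f"  howpublished = {{\\url{{{bib_escape(s.url)}}}}},")
        lines.append("}")
        lines.append("")  # blank line between entries
    return "\n".join(lines).strip() + "\n"

print(render_bibtex([Source("https://example.com", "Example Title")]))
```

Because keys are positional (`src1`, `src2`, …), the source list order must be frozen before any report text that cites by index is generated.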
47
+ def _replace_numeric_citations(text: str, max_n: int) -> str:
48
+ def repl(m: re.Match[str]) -> str:
49
+ num = int(m.group(1))
50
+ if 1 <= num <= max_n:
51
+ return f"\\cite{{{_bib_key(num)}}}"
52
+ return m.group(0)
53
+
54
+ return re.sub(r"\[(\d{1,3})\]", repl, text)
55
+
56
+
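`_replace_numeric_citations` rewrites markdown-style `[N]` markers into `\cite{srcN}`, but only for indices that actually map to a known source; anything out of range is left alone. A self-contained sketch of the same behavior:

```python
import re

def bib_key(i: int) -> str:
    return f"src{i}"

def replace_numeric_citations(text: str, max_n: int) -> str:
    """[3] -> \\cite{src3}, but only for indices with a matching source."""
    def repl(m: re.Match) -> str:
        num = int(m.group(1))
        if 1 <= num <= max_n:
            return f"\\cite{{{bib_key(num)}}}"
        return m.group(0)  # out-of-range markers are left untouched
    return re.sub(r"\[(\d{1,3})\]", repl, text)

print(replace_numeric_citations("Claimed in [1], disputed in [99].", max_n=3))
# Claimed in \cite{src1}, disputed in [99].
```

Leaving out-of-range markers untouched is deliberate: a hallucinated `[99]` stays visible in the PDF instead of silently becoming a dangling `\cite` key.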
57
+ def _markdown_to_latex_paragraphs(md: str, max_n: int) -> str:
58
+ text = md.strip()
59
+ text = re.sub(r"```[\s\S]*?```", "", text)
60
+ text = re.sub(r"^\s*[-*+]\s+", "", text, flags=re.MULTILINE)
61
+ text = re.sub(r"^\s*#+\s*", "", text, flags=re.MULTILINE)
62
+ text = re.sub(r"`([^`]+)`", r"\1", text)
63
+ text = re.sub(r"\*\*(.*?)\*\*", r"\1", text)
64
+ text = re.sub(r"\*(.*?)\*", r"\1", text)
65
+ text = re.sub(r"\[(.*?)\]\((.*?)\)", r"\1", text)
66
+ text = _replace_numeric_citations(text, max_n=max_n)
67
+ text = latex_escape(text)
68
+ text = re.sub(r"\\textbackslash\{\}cite\\\{(src\d+)\\\}", r"\\cite{\1}", text)
69
+ text = text.replace("\n\n", "\n\\par\n")
70
+ return text
71
+
72
+
73
+ def render_paper(
74
+ topic: str,
75
+ outline: list[str],
76
+ body: str,
77
+ facts: list[ExtractedFact],
78
+ sources: list[Source],
79
+ ) -> str:
80
+ topic_e = latex_escape(topic)
81
+ url_to_key = {s.url: _bib_key(i) for i, s in enumerate(sources, start=1)}
82
+
83
+ outline_items = "\n".join([f"\\item {latex_escape(x)}" for x in outline[:10]])
84
+ fact_sentences: list[str] = []
85
+ for f in facts[:18]:
86
+ key = url_to_key.get(f.url)
87
+ cite = f"\\cite{{{key}}}" if key else ""
88
+ sentence = latex_escape(f.claim.strip())
89
+ if sentence and sentence[-1] not in ".!?":
90
+ sentence += "."
91
+ fact_sentences.append(sentence + cite)
92
+ facts_paragraph = (
93
+ " ".join(fact_sentences)
94
+ if fact_sentences
95
+ else "No extracted facts available."
96
+ )
97
+
98
+ body_latex = _markdown_to_latex_paragraphs(body, max_n=len(sources))
99
+
100
+ return (
101
+ "\\documentclass[11pt]{article}\n"
102
+ "\\usepackage{geometry}\n"
103
+ "\\usepackage{hyperref}\n"
104
+ "\\usepackage{url}\n"
105
+ "\\usepackage{booktabs}\n"
106
+ "\\usepackage{longtable}\n"
107
+ "\\geometry{margin=1in}\n"
108
+ "\\hypersetup{colorlinks=true,linkcolor=black,citecolor=blue,urlcolor=blue}\n"
109
+ f"\\title{{{topic_e}}}\n"
110
+ "\\author{hydradeck}\n"
111
+ "\\date{\\today}\n"
112
+ "\\begin{document}\n"
113
+ "\\maketitle\n"
114
+ "\\begin{abstract}\n"
115
+ "This report presents a structured analysis with explicit traceability to sources.\n"
116
+ "\\end{abstract}\n\n"
117
+ "\\section*{1. Introduction and Background}\n"
118
+ + facts_paragraph
119
+ + "\n\n"
120
+ "\\section*{2. Logical Outline}\n"
121
+ "\\begin{itemize}\n"
122
+ + outline_items
123
+ + "\n\\end{itemize}\n\n"
124
+ "\\section*{3. Evidence and Key Findings}\n"
125
+ + body_latex
126
+ + "\n\n"
127
+ "\\section*{4. Limitations and Discussion}\n"
128
+ "The analysis is bounded by available public evidence and may evolve as sources update.\n\n"
129
+ "\\section*{5. Conclusion}\n"
130
+ "Conclusions are presented in a source-traceable form and should be interpreted with the\n"
131
+ "reported assumptions and constraints.\n\n"
132
+ "\\bibliographystyle{plain}\n"
133
+ "\\bibliography{refs}\n"
134
+ "\\end{document}\n"
135
+ )
136
+
137
+
138
+ def render_report_structured(
139
+ topic: str,
140
+ section_blocks: list[dict[str, str]],
141
+ language: str = "en",
142
+ ) -> str:
143
+ lang = language.lower()
144
+ topic_e = latex_escape(topic)
145
+
146
+ if lang == "zh":
147
+ preamble = (
148
+ "\\documentclass[11pt]{ctexart}\n"
149
+ "\\usepackage[a4paper,margin=1in]{geometry}\n"
150
+ "\\usepackage{hyperref}\n"
151
+ "\\usepackage{url}\n"
152
+ "\\usepackage{booktabs}\n"
153
+ "\\usepackage{longtable}\n"
154
+ "\\hypersetup{colorlinks=true,linkcolor=black,citecolor=blue,urlcolor=blue}\n"
155
+ f"\\title{{{topic_e}}}\n"
156
+ "\\author{hydradeck}\n"
157
+ "\\date{\\today}\n"
158
+ "\\begin{document}\n"
159
+ "\\maketitle\n"
160
+ )
161
+ else:
162
+ preamble = (
163
+ "\\documentclass[11pt]{article}\n"
164
+ "\\usepackage{geometry}\n"
165
+ "\\usepackage{hyperref}\n"
166
+ "\\usepackage{url}\n"
167
+ "\\usepackage{booktabs}\n"
168
+ "\\usepackage{longtable}\n"
169
+ "\\geometry{margin=1in}\n"
170
+ "\\hypersetup{colorlinks=true,linkcolor=black,citecolor=blue,urlcolor=blue}\n"
171
+ f"\\title{{{topic_e}}}\n"
172
+ "\\author{hydradeck}\n"
173
+ "\\date{\\today}\n"
174
+ "\\begin{document}\n"
175
+ "\\maketitle\n"
176
+ )
177
+
178
+ content_parts: list[str] = []
179
+ for block in section_blocks[:10]:
180
+ title = latex_escape(str(block.get("name", "Section")).strip() or "Section")
181
+ latex_body = str(block.get("latex", "")).strip()
182
+ latex_body = re.sub(r"\\section\*?\{[^}]*\}", "", latex_body)
183
+ latex_body = re.sub(r"\\subsection\*?\{[^}]*\}", "", latex_body)
184
+ latex_body = re.sub(r"\\cite\{[^}]*\}", "", latex_body)
185
+ latex_body = re.sub(r"\[(\d{1,3})\]", "", latex_body)
186
+ if not latex_body:
187
+ continue
188
+ content_parts.append(f"\\section*{{{title}}}\n{latex_body}\n")
189
+
190
+ return preamble + "\n".join(content_parts) + "\n\\end{document}\n"
191
+
192
+
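`render_report_structured` sanitizes each model-supplied LaTeX body before wrapping it: nested sectioning commands and citation markers are stripped so the renderer alone controls document structure. A standalone sketch of that cleanup pass (with a trailing `strip()` added for tidy output):

```python
import re

def sanitize_section_latex(latex_body: str) -> str:
    """Drop nested \\section/\\subsection heads and citation markers."""
    latex_body = re.sub(r"\\section\*?\{[^}]*\}", "", latex_body)
    latex_body = re.sub(r"\\subsection\*?\{[^}]*\}", "", latex_body)
    latex_body = re.sub(r"\\cite\{[^}]*\}", "", latex_body)
    latex_body = re.sub(r"\[(\d{1,3})\]", "", latex_body)
    return latex_body.strip()

print(sanitize_section_latex("\\section*{Intro} Evidence is strong \\cite{src1} [2]."))
```

Stripping `\cite` here is safe because this structured renderer does not emit a `\bibliography`; leaving the commands in would produce unresolved-citation warnings at compile time.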
193
+ @dataclass
194
+ class SlideFrame:
195
+ title: str
196
+ bullets: list[str]
197
+ note: str = ""
198
+
199
+
200
+ def render_beamer(topic: str, outline: list[str], bullets: list[str]) -> str:
201
+ section_blocks = [{"name": t, "latex": b} for t, b in zip(outline, bullets)]
202
+ if not section_blocks:
203
+ section_blocks = [{"name": "Summary", "latex": "Key findings and implications."}]
204
+ frames = build_slide_frames_from_sections(section_blocks, language="en")
205
+ frames = enforce_slide_density(frames, language="en")
206
+ return render_beamer_frames(topic, frames, language="en")
207
+
208
+
209
+ def render_beamer_from_report(topic: str, report_tex: str) -> str:
210
+ frames = build_slide_frames_from_report(report_tex, language="en")
211
+ frames = enforce_slide_density(frames, language="en")
212
+ return render_beamer_frames(topic, frames, language="en")
213
+
214
+
215
+ def _split_paragraph_to_bullets(text: str, language: str) -> list[str]:
216
+ lang = language.lower()
217
+ if lang == "zh":
218
+ parts = [x.strip() for x in re.split(r"[。!?]\s*", text) if x.strip()]
219
+ out: list[str] = []
220
+ for p in parts:
221
+ if len(p) < 6:
222
+ continue
223
+ out.append(_trim_chars(_clean_text_for_slide(p), 28))
224
+ return out
225
+
226
+ parts = [x.strip() for x in re.split(r"[.!?]\s+", text) if x.strip()]
227
+ out2: list[str] = []
228
+ for p in parts:
229
+ clean = _clean_text_for_slide(p)
230
+ if len(clean) < 14:
231
+ continue
232
+ out2.append(_trim_words(clean, 14))
233
+ return out2
234
+
235
+
236
+ def build_slide_frames_from_sections(
237
+ section_blocks: list[dict[str, str]],
238
+ language: str = "en",
239
+ ) -> list[SlideFrame]:
240
+ lang = language.lower()
241
+ frames: list[SlideFrame] = []
242
+ for block in section_blocks[:8]:
243
+ title = str(block.get("name", "Section")).strip() or ("章节" if lang == "zh" else "Section")
244
+ body = str(block.get("latex", ""))
245
+ body = re.sub(r"\\section\*?\{[^}]*\}", "", body)
246
+ body = re.sub(r"\\subsection\*?\{[^}]*\}", "", body)
247
+ body = re.sub(r"\\cite\{[^}]*\}", "", body)
248
+ body = re.sub(r"\[(\d{1,3})\]", "", body)
249
+ bullets = _split_paragraph_to_bullets(body, lang)
250
+ if not bullets:
251
+ continue
252
+
253
+ chunk = 4 if lang == "zh" else 4
254
+ for i in range(0, len(bullets), chunk):
255
+ part = bullets[i : i + chunk]
256
+ if not part:
257
+ continue
258
+ if i == 0:
259
+ frame_title = title
260
+ else:
261
+ frame_title = f"{title}(续)" if lang == "zh" else f"{title} (cont.)"
262
+ frames.append(SlideFrame(title=frame_title, bullets=part))
263
+
264
+ if not frames:
265
+ raise RuntimeError("insufficient readable section content for slides")
266
+ return frames
267
+
268
+
269
+ def enforce_slide_density(
270
+ frames: list[SlideFrame],
271
+ language: str = "en",
272
+ max_bullets_per_frame: int = 4,
273
+ max_chars_per_bullet_zh: int = 28,
274
+ max_words_per_bullet_en: int = 14,
275
+ ) -> list[SlideFrame]:
276
+ lang = language.lower()
277
+ out: list[SlideFrame] = []
278
+
279
+ for fr in frames:
280
+ normalized: list[str] = []
281
+ for b in fr.bullets:
282
+ clean = _clean_text_for_slide(b)
283
+ if not clean:
284
+ continue
285
+ if lang == "zh":
286
+ clean = _trim_chars(clean, max_chars_per_bullet_zh)
287
+ else:
288
+ clean = _trim_words(clean, max_words_per_bullet_en)
289
+ if clean:
290
+ normalized.append(clean)
291
+
292
+ if not normalized:
293
+ continue
294
+
295
+ for i in range(0, len(normalized), max_bullets_per_frame):
296
+ chunk = normalized[i : i + max_bullets_per_frame]
297
+ if not chunk:
298
+ continue
299
+ if i == 0:
300
+ title = fr.title
301
+ else:
302
+ title = f"{fr.title}(续)" if lang == "zh" else f"{fr.title} (cont.)"
303
+ out.append(SlideFrame(title=title, bullets=chunk, note=fr.note))
304
+
305
+ if not out:
306
+ raise RuntimeError("slide density guard removed all frames")
307
+ return out
308
+
309
+
310
+ def _trim_words(text: str, max_words: int) -> str:
311
+ words = text.split()
312
+ if len(words) <= max_words:
313
+ return text
314
+ return " ".join(words[:max_words]).rstrip(" ,.;") + "..."
315
+
316
+
317
+ def _trim_chars(text: str, max_chars: int) -> str:
318
+ t = text.strip()
319
+ if len(t) <= max_chars:
320
+ return t
321
+ return t[: max_chars - 1].rstrip(",。,. ") + "…"
322
+
323
+
324
+ def _clean_text_for_slide(text: str) -> str:
325
+ t = text.strip()
326
+ t = re.sub(r"\s+", " ", t)
327
+ t = re.sub(r"`([^`]+)`", r"\1", t)
328
+ t = re.sub(r"\*\*(.*?)\*\*", r"\1", t)
329
+ t = re.sub(r"\*(.*?)\*", r"\1", t)
330
+ return t
331
+
332
+
333
+ def build_slide_frames_from_report(report_tex: str, language: str = "en") -> list[SlideFrame]:
334
+ lang = language.lower()
335
+ sections = re.split(r"\\section\*\{([^}]+)\}", report_tex)
336
+ parsed: list[tuple[str, str]] = []
337
+ if len(sections) >= 3:
338
+ for i in range(1, len(sections), 2):
339
+ title = sections[i].strip()
340
+ body = sections[i + 1] if i + 1 < len(sections) else ""
341
+ parsed.append((title, body))
342
+
343
+ if not parsed:
344
+ raise RuntimeError("cannot derive slide frames from report structure")
345
+
346
+ frames: list[SlideFrame] = []
347
+ for title, body in parsed[:8]:
348
+ plain = re.sub(r"\\[a-zA-Z]+\*?(\[[^\]]*\])?(\{[^}]*\})?", " ", body)
349
+ chunks = [x.strip() for x in re.split(r"[。.!?]\s+", plain) if x.strip()]
350
+ bullets: list[str] = []
351
+ for c in chunks:
352
+ clean = _clean_text_for_slide(c)
353
+ if not clean:
354
+ continue
355
+ if lang == "zh":
356
+ if len(clean) < 8:
357
+ continue
358
+ bullets.append(_trim_chars(clean, 30))
359
+ else:
360
+ if len(clean) < 12:
361
+ continue
362
+ bullets.append(_trim_words(clean, 16))
363
+ if len(bullets) >= 5:
364
+ break
365
+ if not bullets:
366
+ raise RuntimeError(f"insufficient bullet content for slide '{title}'")
367
+ frames.append(SlideFrame(title=title, bullets=bullets))
368
+
369
+ return frames
370
+
371
+
372
+ def render_beamer_frames(topic: str, frames: list[SlideFrame], language: str = "en") -> str:
373
+ lang = language.lower()
374
+ topic_e = latex_escape(topic)
375
+ agenda_label = "目录" if lang == "zh" else "Agenda"
376
+ summary_title = "总结" if lang == "zh" else "Summary"
377
+
378
+ agenda_items = "\n".join([f"\\item {latex_escape(f.title)}" for f in frames[:8]])
379
+
380
+ frame_blocks: list[str] = []
381
+ for fr in frames[:10]:
382
+ b = "\n".join([f"\\item {latex_escape(x)}" for x in fr.bullets[:5]])
383
+ frame_blocks.append(
384
+ "\\begin{frame}[t]{"
385
+ + latex_escape(fr.title)
386
+ + "}\n"
387
+ + "\\begin{itemize}\n"
388
+ + b
389
+ + "\n\\end{itemize}\n"
390
+ + (f"\\vspace{{0.6em}}\\footnotesize {latex_escape(fr.note)}\n" if fr.note else "")
391
+ + "\\end{frame}\n"
392
+ )
393
+
394
+ summary_bullets: list[str] = []
395
+ for fr in frames[:5]:
396
+ if fr.bullets:
397
+ summary_bullets.append(fr.bullets[0])
398
+ if not summary_bullets:
399
+ summary_bullets = ["关键要点见前页。" if lang == "zh" else "Key points are summarized in previous slides."]
400
+ summary_items = "\n".join([f"\\item {latex_escape(x)}" for x in summary_bullets])
401
+
402
+ if lang == "zh":
403
+ return (
404
+ "\\documentclass[aspectratio=169]{ctexbeamer}\n"
405
+ "\\usetheme{Madrid}\n"
406
+ "\\usefonttheme{professionalfonts}\n"
407
+ "\\setbeamertemplate{navigation symbols}{}\n"
408
+ "\\usepackage{hyperref}\n"
409
+ "\\usepackage{booktabs}\n"
410
+ "\\definecolor{AccentBlue}{HTML}{1F4E79}\n"
411
+ "\\setbeamercolor{title}{fg=AccentBlue}\n"
412
+ "\\setbeamercolor{frametitle}{fg=AccentBlue}\n"
413
+ "\\setbeamerfont{title}{series=\\bfseries,size=\\Large}\n"
414
+ "\\setbeamerfont{frametitle}{series=\\bfseries,size=\\large}\n"
415
+ f"\\title{{{topic_e}}}\n"
416
+ "\\author{hydradeck}\n"
417
+ "\\date{\\today}\n"
418
+ "\\begin{document}\n"
419
+ "\\frame{\\titlepage}\n"
420
+ "\\begin{frame}{"
421
+ + latex_escape(agenda_label)
422
+ + "}\n"
423
+ "\\begin{itemize}\n"
424
+ + agenda_items
425
+ + "\n\\end{itemize}\n"
426
+ "\\end{frame}\n"
427
+ + "".join(frame_blocks)
428
+ + "\\begin{frame}{"
429
+ + latex_escape(summary_title)
430
+ + "}\n"
431
+ + "\\begin{itemize}\n"
432
+ + summary_items
433
+ + "\n\\end{itemize}\n"
434
+ + "\\end{frame}\n"
435
+ + "\\end{document}\n"
436
+ )
437
+
438
+ return (
439
+ "\\documentclass[aspectratio=169]{beamer}\n"
440
+ "\\usetheme{metropolis}\n"
441
+ "\\usefonttheme{professionalfonts}\n"
442
+ "\\setbeamertemplate{navigation symbols}{}\n"
443
+ "\\usepackage{hyperref}\n"
444
+ "\\usepackage{booktabs}\n"
445
+ "\\definecolor{AccentBlue}{HTML}{1F4E79}\n"
446
+ "\\setbeamercolor{title}{fg=AccentBlue}\n"
447
+ "\\setbeamercolor{frametitle}{fg=AccentBlue}\n"
448
+ "\\setbeamerfont{title}{series=\\bfseries,size=\\Large}\n"
449
+ "\\setbeamerfont{frametitle}{series=\\bfseries,size=\\large}\n"
450
+ f"\\title{{{topic_e}}}\n"
451
+ "\\author{hydradeck}\n"
452
+ "\\date{\\today}\n"
453
+ "\\begin{document}\n"
454
+ "\\frame{\\titlepage}\n"
455
+ "\\begin{frame}{"
456
+ + latex_escape(agenda_label)
457
+ + "}\n"
458
+ "\\begin{itemize}\n"
459
+ + agenda_items
460
+ + "\n\\end{itemize}\n"
461
+ "\\end{frame}\n"
462
+ + "".join(frame_blocks)
463
+ + "\\begin{frame}{"
464
+ + latex_escape(summary_title)
465
+ + "}\n"
466
+ + "\\begin{itemize}\n"
467
+ + summary_items
468
+ + "\n\\end{itemize}\n"
469
+ + "\\end{frame}\n"
470
+ + "\\end{document}\n"
471
+ )
hydradeck/resources_pack.py ADDED
@@ -0,0 +1,706 @@
+ from __future__ import annotations
+ 
+ import json
+ import re
+ import time
+ import urllib.parse
+ from dataclasses import asdict
+ from pathlib import Path
+ 
+ import requests
+ 
+ from hydradeck.agents.personas import PERSONAS
+ from hydradeck.clients import ChatMessage, GrokClient, GrokClientError
+ from hydradeck.core.types import RunConfig, Source
+ from hydradeck.packaging import finalize_output, stage_dir_for_out
+ from hydradeck.utils import Heartbeat, Progress
+ 
+ 
+ def _slugify(s: str) -> str:
+     t = s.strip().lower()
+     t = re.sub(r"[^a-z0-9]+", "-", t)
+     t = re.sub(r"-+", "-", t).strip("-")
+     return t or "source"
+ 
+ 
+ def _extract_sources(obj: dict[str, object], max_sources: int) -> list[Source]:
+     raw = obj.get("sources")
+     out: list[Source] = []
+     if isinstance(raw, list):
+         for item in raw[:max_sources]:
+             if not isinstance(item, dict):
+                 continue
+             url_v = item.get("url")
+             title_v = item.get("title")
+             snippet_v = item.get("snippet")
+             if isinstance(url_v, str) and isinstance(title_v, str) and isinstance(snippet_v, str):
+                 out.append(Source(url=url_v, title=title_v, snippet=snippet_v))
+     return out
+ 
+ 
+ def build_resources_pack(cfg: RunConfig) -> Path:
+     stage_dir = stage_dir_for_out(cfg.out)
+     stage_dir.mkdir(parents=True, exist_ok=True)
+ 
+     t0 = time.time()
+ 
+     def remaining_s() -> float:
+         return max(0.0, cfg.max_total_runtime_s - (time.time() - t0))
+ 
+     def budget_timeout() -> float:
+         return max(1.0, min(cfg.request_budget_s, remaining_s()))
+ 
+     def llm_timeout() -> float:
+         return max(1.0, min(cfg.llm_timeout_s, budget_timeout()))
+ 
+     progress = Progress(enabled=cfg.progress, total=6, label="resources")
+     progress.update("start", inc=0)
+ 
+     if cfg.use_mock:
+         from hydradeck.clients.grok_client import MockClient
+ 
+         client = MockClient()
+     else:
+         client = GrokClient(
+             base_url=cfg.base_url,
+             api_key=cfg.api_key,
+             model=cfg.model,
+             timeout_s=llm_timeout(),
+             heartbeat=cfg.verbose,
+         )
+ 
+     query_planner = next(p for p in PERSONAS if p.name == "QueryPlanner")
+     librarian = next(p for p in PERSONAS if p.name == "Librarian")
+ 
+     # Plan high-recall search queries for the topic.
+     qp_obj = client.chat_json(
+         [
+             ChatMessage(role="system", content=query_planner.system_prompt),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "Return JSON: {queries:[...]} with 6 high-recall queries for primary sources. "
+                     "Topic: "
+                     + cfg.topic
+                 ),
+             ),
+         ],
+         schema_hint='{ "queries": ["..."] }',
+         temperature=0.2,
+         timeout_s=llm_timeout() if not cfg.use_mock else None,
+     )
+     progress.update("queries")
+     raw_q = qp_obj.get("queries")
+     if isinstance(raw_q, list):
+         queries = [q for q in raw_q if isinstance(q, str) and q.strip()]
+     else:
+         queries = []
+     if not queries:
+         queries = [cfg.topic]
+ 
+     # Collect deduplicated sources across the first few queries.
+     seen: set[str] = set()
+     sources: list[Source] = []
+     for q in queries[: min(3, len(queries))]:
+         req = (
+             "Return JSON with key sources: list of {url,title,snippet}. "
+             "Give authoritative sources (prefer official docs, papers, repos). "
+             "Query: "
+             + q
+         )
+         try:
+             src_obj = client.chat_json(
+                 [
+                     ChatMessage(role="system", content=librarian.system_prompt),
+                     ChatMessage(role="user", content=req),
+                 ],
+                 schema_hint='{ "sources": [ {"url":"...","title":"...","snippet":"..."} ] }',
+                 temperature=0.2,
+                 timeout_s=llm_timeout() if not cfg.use_mock else None,
+             )
+         except GrokClientError:
+             continue
+         for s in _extract_sources(src_obj, cfg.module_sources):
+             if s.url in seen:
+                 continue
+             seen.add(s.url)
+             sources.append(s)
+             if len(sources) >= cfg.max_sources:
+                 break
+         if len(sources) >= cfg.max_sources:
+             break
+     progress.update("sources")
+ 
+     if not sources:
+         sources = [
+             Source(
+                 url="https://github.com/alibaba-damo-academy/RynnBrain",
+                 title="RynnBrain",
+                 snippet="",
+             )
+         ]
+         progress.update("sources")
+ 
+     resources_dir = stage_dir / "resources"
+     snaps_dir = resources_dir / "snapshots"
+     snaps_dir.mkdir(parents=True, exist_ok=True)
+ 
+     # Snapshot each source locally, within a global time budget.
+     snap_meta: list[dict[str, object]] = []
+     snap_start = time.time()
+     for i, s in enumerate(sources, start=1):
+         if (time.time() - snap_start) > cfg.snapshot_total_timeout_s:
+             break
+         target_base = snaps_dir / f"{i:02d}_{_slugify(s.title)}"
+         entry: dict[str, object] = {"url": s.url, "title": s.title}
+         if cfg.use_mock:
+             entry["ok"] = True
+             target = target_base.with_suffix(".txt")
+             entry["path"] = str(target)
+             target.write_text("mock snapshot", encoding="utf-8")
+             snap_meta.append(entry)
+             continue
+         try:
+             with Heartbeat(enabled=cfg.verbose, label=f"fetch {s.url}", interval_s=5.0):
+                 r = requests.get(
+                     s.url,
+                     timeout=min(cfg.snapshot_timeout_s, budget_timeout()),
+                     headers={"User-Agent": "hydradeck/0.1"},
+                 )
+             r.raise_for_status()
+             ctype = r.headers.get("content-type", "")
+             entry["content_type"] = ctype
+ 
+             is_pdf = "application/pdf" in ctype.lower() or s.url.lower().endswith(".pdf")
+             if is_pdf:
+                 data = r.content
+                 if len(data) > 5_000_000:
+                     data = data[:5_000_000]
+                 target = target_base.with_suffix(".pdf")
+                 entry["path"] = str(target)
+                 target.write_bytes(data)
+                 entry["binary"] = True
+             else:
+                 txt = r.text
+                 if len(txt) > 200_000:
+                     txt = txt[:200_000]
+                 target = target_base.with_suffix(".txt")
+                 entry["path"] = str(target)
+                 target.write_text(txt, encoding="utf-8")
+             entry["ok"] = True
+         except Exception as e:
+             entry["ok"] = False
+             entry["error"] = str(e)
+         snap_meta.append(entry)
+     progress.update("snapshots")
+ 
+     (resources_dir / "sources.json").write_text(
+         json.dumps({"sources": [asdict(s) for s in sources]}, ensure_ascii=False, indent=2),
+         encoding="utf-8",
+     )
+     (resources_dir / "snapshots.json").write_text(
+         json.dumps({"snapshots": snap_meta}, ensure_ascii=False, indent=2),
+         encoding="utf-8",
+     )
+     (stage_dir / "research.json").write_text(
+         json.dumps(
+             {
+                 "topic": cfg.topic,
+                 "mode": "resources",
+                 "sources": [asdict(s) for s in sources],
+                 "snapshots": snap_meta,
+             },
+             ensure_ascii=False,
+             indent=2,
+         ),
+         encoding="utf-8",
+     )
+ 
+     progress.update("package")
+ 
+     try:
+         paper_tex, slides_tex = _generate_pre_tex(cfg, client, sources)
+     except Exception as e:
+         (stage_dir / "pre_tex_error.txt").write_text(str(e) + "\n", encoding="utf-8")
+         paper_tex = _render_paper_tex(cfg.topic, sources)
+         slides_tex = _render_slides_tex(cfg.topic, sources)
+ 
+     (stage_dir / "pre_paper.tex").write_text(paper_tex, encoding="utf-8")
+     (stage_dir / "pre_slides.tex").write_text(slides_tex, encoding="utf-8")
+ 
+     pdf_dir = stage_dir / "pdf"
+     pdf_dir.mkdir(parents=True, exist_ok=True)
+     urls: list[str] = []
+     errors: list[str] = []
+ 
+     if cfg.use_mock:
+         (pdf_dir / "pre_paper.pdf").write_bytes(_dummy_pdf_bytes("paper"))
+         (pdf_dir / "pre_slides.pdf").write_bytes(_dummy_pdf_bytes("slides"))
+     else:
+         try:
+             paper_pdf, paper_meta = _compile_pdf(
+                 paper_tex,
+                 engine="xelatex",
+                 backend=cfg.pdf_compiler,
+             )
+             (pdf_dir / "pre_paper.pdf").write_bytes(paper_pdf)
+             urls.extend(paper_meta.get("urls", []))
+             errors.extend(paper_meta.get("errors", []))
+         except Exception as e:
+             errors.append("paper: " + str(e))
+ 
+         try:
+             slides_pdf, slides_meta = _compile_pdf(
+                 slides_tex,
+                 engine="xelatex",
+                 backend=cfg.pdf_compiler,
+             )
+             (pdf_dir / "pre_slides.pdf").write_bytes(slides_pdf)
+             urls.extend(slides_meta.get("urls", []))
+             errors.extend(slides_meta.get("errors", []))
+         except Exception as e:
+             errors.append("slides: " + str(e))
+ 
+         if not (pdf_dir / "pre_paper.pdf").exists():
+             errors.append("paper pdf missing")
+         if not (pdf_dir / "pre_slides.pdf").exists():
+             errors.append("slides pdf missing")
+ 
+     if urls:
+         (stage_dir / "latexonline_url.txt").write_text("\n".join(urls) + "\n", encoding="utf-8")
+     if errors:
+         (stage_dir / "latexonline_error.txt").write_text("\n".join(errors) + "\n", encoding="utf-8")
+ 
+     finalize_output(cfg.out, stage_dir, keep_stage=cfg.keep_stage)
+     progress.done("packaged")
+     return cfg.out
+ 
+ 
+ def _render_paper_tex(topic: str, sources: list[Source]) -> str:
+     def esc(s: str) -> str:
+         # Replace backslashes with a placeholder first so the braces of
+         # \textbackslash{} are not themselves re-escaped below.
+         s = s.replace("\\", "\u0000")
+         s = (
+             s.replace("{", r"\{")
+             .replace("}", r"\}")
+             .replace("%", r"\%")
+             .replace("_", r"\_")
+             .replace("&", r"\&")
+             .replace("#", r"\#")
+             .replace("$", r"\$")
+         )
+         return s.replace("\u0000", r"\textbackslash{}")
+ 
+     items = []
+     for s in sources:
+         items.append(
+             "\\item "
+             + esc(s.title)
+             + "\\\\\n"
+             + "\\small\\url{" + esc(s.url) + "}\\normalsize\\\\\n"
+             + "\\textit{" + esc(s.snippet[:240]) + "}"
+         )
+     body = "\n".join(items) if items else "\\item (暂无来源)"
+     return (
+         "\\documentclass[11pt]{article}\n"
+         "\\usepackage[UTF8]{ctex}\n"
+         "\\usepackage{hyperref}\n"
+         "\\usepackage{url}\n"
+         "\\usepackage{booktabs}\n"
+         "\\title{" + esc(topic) + "——资源预研报告(论文版)}\n"
+         "\\author{hydradeck}\n"
+         "\\date{\\today}\n"
+         "\\begin{document}\n"
+         "\\maketitle\n"
+         "\\section*{来源清单}\n"
+         "\\begin{enumerate}\n"
+         + body
+         + "\n\\end{enumerate}\n"
+         "\\end{document}\n"
+     )
+ 
+ 
+ def _render_slides_tex(topic: str, sources: list[Source]) -> str:
+     def esc(s: str) -> str:
+         # Same placeholder trick as in _render_paper_tex: escape backslashes
+         # last so the other replacements do not mangle \textbackslash{}.
+         s = s.replace("\\", "\u0000")
+         s = (
+             s.replace("{", r"\{")
+             .replace("}", r"\}")
+             .replace("%", r"\%")
+             .replace("_", r"\_")
+             .replace("&", r"\&")
+             .replace("#", r"\#")
+             .replace("$", r"\$")
+         )
+         return s.replace("\u0000", r"\textbackslash{}")
+ 
+     bullets: list[str] = []
+     for s in sources[:8]:
+         bullets.append(esc(s.title))
+ 
+     items = "\n".join(["\\item " + b for b in bullets]) or "\\item (暂无来源)"
+     return (
+         "\\documentclass{beamer}\n"
+         "\\usepackage[UTF8]{ctex}\n"
+         "\\usetheme{Madrid}\n"
+         "\\title{" + esc(topic) + "——资源预研简报(幻灯片)}\n"
+         "\\author{hydradeck}\n"
+         "\\date{\\today}\n"
+         "\\begin{document}\n"
+         "\\frame{\\titlepage}\n"
+         "\\begin{frame}{关键来源}\n"
+         "\\begin{itemize}\n"
+         + items
+         + "\n\\end{itemize}\n"
+         "\\end{frame}\n"
+         "\\end{document}\n"
+     )
+ 
+ 
+ def _latexonline_compile_url(tex: str, command: str) -> str:
+     q = urllib.parse.quote(tex, safe="")
+     return "https://latexonline.cc/compile?text=" + q + "&command=" + command + "&force=true"
+ 
+ 
+ def _compile_pdf(tex: str, engine: str, backend: str) -> tuple[bytes, dict[str, list[str]]]:
+     meta: dict[str, list[str]] = {"urls": [], "errors": []}
+     b = backend.strip().lower()
+     if b not in {"auto", "latexonline", "texlive"}:
+         b = "auto"
+ 
+     if b in {"auto", "latexonline"}:
+         try:
+             meta["urls"].append(_latexonline_compile_url(tex, command=engine))
+             data = _compile_latexonline(tex, command=engine)
+             _ensure_pdf_bytes(data, where="latexonline")
+             return data, meta
+         except Exception as e:
+             meta["errors"].append("latexonline: " + str(e))
+             if b == "latexonline":
+                 raise
+ 
+     try:
+         data = _compile_texlive_latexcgi(tex, engine=engine)
+         _ensure_pdf_bytes(data, where="texlive")
+         return data, meta
+     except Exception as e:
+         meta["errors"].append("texlive latexcgi: " + str(e))
+         raise
+ 
+ 
+ def _ensure_pdf_bytes(data: bytes, where: str) -> None:
+     if data.startswith(b"%PDF"):
+         return
+     head = data[:200].decode("utf-8", errors="replace")
+     raise RuntimeError(f"{where} did not return PDF. Head: {head}")
+ 
+ 
+ def _compile_latexonline(tex: str, command: str) -> bytes:
+     url = _latexonline_compile_url(tex, command=command)
+     r = requests.get(url, timeout=120.0)
+     if r.status_code >= 400:
+         raise RuntimeError(f"latexonline HTTP {r.status_code}: {r.text[:2000]}")
+     return r.content
+ 
+ 
+ def _compile_texlive_latexcgi(tex: str, engine: str) -> bytes:
+     url = "https://texlive.net/cgi-bin/latexcgi"
+     files = {
+         "filename[]": (None, "document.tex"),
+         "filecontents[]": (None, tex),
+         "engine": (None, engine),
+         "return": (None, "pdf"),
+     }
+     r = requests.post(url, files=files, timeout=120.0)
+     if r.status_code >= 400:
+         raise RuntimeError(f"texlive latexcgi HTTP {r.status_code}: {r.text[:2000]}")
+     return r.content
+ 
+ 
+ def _generate_pre_tex(cfg: RunConfig, client, sources: list[Source]) -> tuple[str, str]:
+     if cfg.use_mock:
+         return _render_paper_tex(cfg.topic, sources), _render_slides_tex(cfg.topic, sources)
+ 
+     template = cfg.template.strip().lower()
+     if template == "iclr2026":
+         return _generate_pre_tex_iclr2026(cfg, client, sources)
+     if template == "pretty":
+         return _generate_pre_tex_pretty(cfg, client, sources)
+ 
+     outline = _pre_outline(cfg.topic)
+     src_json = json.dumps([asdict(s) for s in sources], ensure_ascii=False)
+     feedback = ""
+     last_paper = _render_paper_tex(cfg.topic, sources)
+     last_slides = _render_slides_tex(cfg.topic, sources)
+     for _attempt in range(max(1, cfg.pre_tex_attempts)):
+         msgs = [
+             ChatMessage(
+                 role="system",
+                 content=(
+                     "你是严谨的 LaTeX 作者。"
+                     "必须输出可用 XeLaTeX 编译的高信息密度中文内容。"
+                     "不要输出 JSON。"
+                 ),
+             ),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "生成两个 LaTeX 文档(全部使用简体中文):\n"
+                     "(1) paper_tex:article 论文版预研报告,结构严格,信息密度高。\n"
+                     "(2) slides_tex:beamer 16:9,15 分钟汇报(8-10 页)。\n\n"
+                     "共同硬约束:\n"
+                     "- 使用 ctex + xelatex\n"
+                     "- 禁止空话;每节必须有可执行要点/表格\n"
+                     "- 必须包含“参考资源”并列出全部来源 URL\n\n"
+                     "paper 结构(标题可扩展但需覆盖以下要点):\n"
+                     + "\n".join(["- " + x for x in outline["paper"]])
+                     + "\n\nslides 结构(每项至少一页):\n"
+                     + "\n".join(["- " + x for x in outline["slides"]])
+                     + "\n\n来源 JSON:\n"
+                     + src_json
+                     + ("\n\n评审反馈:\n" + feedback if feedback else "")
+                     + "\n\n输出格式(必须严格使用):\n"
+                     + "<<<paper.tex>>>\n<latex>\n<<<end paper.tex>>>\n"
+                     + "<<<slides.tex>>>\n<latex>\n<<<end slides.tex>>>\n"
+                 ),
+             ),
+         ]
+         text = client.chat_text(msgs, temperature=0.2)
+         parsed = _parse_marked_tex(text)
+         paper = parsed.get("paper")
+         slides = parsed.get("slides")
+         if not isinstance(paper, str) or not isinstance(slides, str):
+             feedback = "Output must contain both <<<paper.tex>>> and <<<slides.tex>>> blocks."
+             continue
+ 
+         last_paper, last_slides = paper, slides
+         score, fb = _score_pre_tex(paper, slides, sources)
+         if not cfg.pre_tex_quality_gate or score >= cfg.pre_tex_min_score:
+             return paper, slides
+         feedback = fb
+ 
+     return last_paper, last_slides
+ 
+ 
+ def _generate_pre_tex_iclr2026(
+     cfg: RunConfig,
+     client,
+     sources: list[Source],
+ ) -> tuple[str, str]:
+     src_json = json.dumps([asdict(s) for s in sources], ensure_ascii=False)
+     feedback = ""
+     last_paper = ""
+     last_slides = ""
+     for _attempt in range(max(1, cfg.pre_tex_attempts)):
+         msgs = [
+             ChatMessage(
+                 role="system",
+                 content=(
+                     "你撰写严谨的 ICLR 风格预研文稿。"
+                     "paper 必须使用 \\usepackage{iclr2026_conference,times}。"
+                     "输出必须为简体中文,不要输出 JSON。"
+                 ),
+             ),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "任务:撰写 (1) paper.tex(ICLR 论文风格)和 (2) slides.tex(beamer)。\n"
+                     "场景:15 分钟预研汇报。\n"
+                     "要求高信息密度:至少 2 张表(证据计划、风险登记)。\n"
+                     "必须包含“参考资源”并列出所有来源 URL。\n\n"
+                     "paper.tex 要求:\n"
+                     "- Use: \\documentclass{article} and \\usepackage{iclr2026_conference,times}\n"
+                     "- 包含:标题、摘要(<=150 词)\n"
+                     "- 章节:目标、待验证主张、研究问题、范围/非范围\n"
+                     "  证据计划(表)、来源映射、风险(表)、时间线(表)\n"
+                     "  交付物、参考资源\n"
+                     "- 禁止空话,每个要点必须可执行。\n\n"
+                     "slides.tex 要求:\n"
+                     "- 16:9 beamer, 8-10 frames, 1 idea per slide\n"
+                     "- 至少 1 页证据矩阵,至少 1 页风险页\n\n"
+                     "来源 JSON:\n"
+                     + src_json
+                     + ("\n\n反馈:\n" + feedback if feedback else "")
+                     + "\n\n输出格式(必须严格):\n"
+                     + "<<<paper.tex>>>\n<latex>\n<<<end paper.tex>>>\n"
+                     + "<<<slides.tex>>>\n<latex>\n<<<end slides.tex>>>\n"
+                 ),
+             ),
+         ]
+         text = client.chat_text(msgs, temperature=0.2)
+         parsed = _parse_marked_tex(text)
+         paper = parsed.get("paper")
+         slides = parsed.get("slides")
+         if not isinstance(paper, str) or not isinstance(slides, str):
+             feedback = "Missing marked blocks."
+             continue
+         last_paper, last_slides = paper, slides
+         score, fb = _score_pre_tex(paper, slides, sources)
+         if not cfg.pre_tex_quality_gate or score >= cfg.pre_tex_min_score:
+             return paper, slides
+         feedback = fb
+ 
+     if last_paper and last_slides:
+         return last_paper, last_slides
+     return _render_paper_tex(cfg.topic, sources), _render_slides_tex(cfg.topic, sources)
+ 
+ 
+ def _generate_pre_tex_pretty(
+     cfg: RunConfig,
+     client,
+     sources: list[Source],
+ ) -> tuple[str, str]:
+     src_json = json.dumps([asdict(s) for s in sources], ensure_ascii=False)
+     feedback = ""
+     last_paper = _render_paper_tex(cfg.topic, sources)
+     last_slides = _render_slides_tex(cfg.topic, sources)
+ 
+     for _attempt in range(max(1, cfg.pre_tex_attempts)):
+         msgs = [
+             ChatMessage(
+                 role="system",
+                 content=(
+                     "你是严谨的 LaTeX 作者。"
+                     "请输出可直接编译、结构完整、信息密度高的中文 .tex 文件。"
+                     "不要输出 JSON。"
+                 ),
+             ),
+             ChatMessage(
+                 role="user",
+                 content=(
+                     "生成两个自包含 LaTeX 文件(简体中文):\n"
+                     "A) pre_paper.tex:article。\n"
+                     "B) pre_slides.tex:beamer 16:9。\n\n"
+                     "paper 要求:\n"
+                     "- 使用 xelatex + ctex\n"
+                     "- 版式整洁,信息密度高,无空话\n"
+                     "- 章节至少覆盖:背景、创新、架构、能力、应用、局限、结论、参考资源\n"
+                     "- 每条来源至少引用一次(\\cite{})\n\n"
+                     "slides 要求:\n"
+                     "- 8-10 页,一页一核心观点\n"
+                     "- 至少 1 页证据矩阵,至少 1 页风险页\n\n"
+                     "来源 JSON(以此为准):\n"
+                     + src_json
+                     + ("\n\n反馈:\n" + feedback if feedback else "")
+                     + "\n\n输出格式(必须严格):\n"
+                     + "<<<paper.tex>>>\n<latex>\n<<<end paper.tex>>>\n"
+                     + "<<<slides.tex>>>\n<latex>\n<<<end slides.tex>>>\n"
+                 ),
+             ),
+         ]
+         text = client.chat_text(msgs, temperature=0.2)
+         parsed = _parse_marked_tex(text)
+         paper = parsed.get("paper")
+         slides = parsed.get("slides")
+         if not isinstance(paper, str) or not isinstance(slides, str):
+             feedback = "Missing marked blocks."
+             continue
+         last_paper, last_slides = paper, slides
+ 
+         score, fb = _score_pre_tex(paper, slides, sources)
+         if "thebibliography" not in paper:
+             score *= 0.75
+         if not cfg.pre_tex_quality_gate or score >= cfg.pre_tex_min_score:
+             return paper, slides
+         feedback = fb
+ 
+     return last_paper, last_slides
+ 
+ 
+ def _pre_outline(topic: str) -> dict[str, list[str]]:
+     _ = topic
+     return {
+         "paper": [
+             "标题",
+             "1. 背景与问题定义",
+             "2. 技术创新点",
+             "3. 系统架构与关键机制",
+             "4. 能力与性能分析",
+             "5. 应用场景与价值",
+             "6. 局限与风险",
+             "7. 结论",
+             "8. 参考资源",
+         ],
+         "slides": [
+             "标题",
+             "背景与核心问题",
+             "技术创新点",
+             "系统架构",
+             "能力与性能",
+             "应用场景",
+             "局限与风险",
+             "结论",
+             "Q&A",
+         ],
+     }
+ 
+ 
+ def _score_pre_tex(paper: str, slides: str, sources: list[Source]) -> tuple[float, str]:
+     # Heuristic quality score in [0, 1]; multiplicative penalties per defect.
+     score = 1.0
+     must = [
+         "背景",
+         "创新",
+         "架构",
+         "应用",
+         "局限",
+         "结论",
+         "参考",
+     ]
+     for k in must:
+         if k not in paper:
+             score *= 0.85
+     if "\\documentclass" not in paper or "\\documentclass" not in slides:
+         score *= 0.5
+     if len(sources) >= 3 and paper.count("\\url{") < 3:
+         score *= 0.7
+     if "iclr2026_conference" in paper and "\\usepackage{iclr2026_conference" not in paper:
+         score *= 0.8
+     zh_chars = sum(1 for ch in (paper + slides) if "\u4e00" <= ch <= "\u9fff")
+     total_chars = max(1, len(paper + slides))
+     if zh_chars / total_chars < 0.15:
+         score *= 0.7
+     fb = "章节不足或资源映射偏弱" if score < 0.95 else "ok"
+     return max(0.0, min(1.0, score)), fb
+ 
+ 
+ def _parse_marked_tex(text: str) -> dict[str, str]:
+     def extract(name: str) -> str | None:
+         start = f"<<<{name}>>>"
+         end = f"<<<end {name}>>>"
+         a = text.find(start)
+         b = text.find(end)
+         if a == -1 or b == -1 or b <= a:
+             return None
+         inner = text[a + len(start) : b].strip()
+         inner = _strip_markdown_fences(inner).strip()
+         if inner.startswith("<latex>"):
+             inner = inner[len("<latex>") :].lstrip()
+         return inner + "\n"
+ 
+     out: dict[str, str] = {}
+     paper = extract("paper.tex")
+     slides = extract("slides.tex")
+     if paper is not None:
+         out["paper"] = paper
+     if slides is not None:
+         out["slides"] = slides
+     return out
+ 
+ 
+ def _strip_markdown_fences(s: str) -> str:
+     t = s.strip()
+     if t.startswith("```"):
+         lines = t.splitlines()
+         if len(lines) >= 2 and lines[-1].strip().startswith("```"):
+             inner = "\n".join(lines[1:-1]).strip()
+             return inner + "\n"
+     return s
+ 
+ 
+ def _dummy_pdf_bytes(label: str) -> bytes:
+     # Minimal single-page PDF used only in mock mode; the /Length value is
+     # approximate, which most viewers tolerate.
+     content = f"Dummy PDF ({label})".encode("ascii", errors="ignore")
+     return (
+         b"%PDF-1.1\n"
+         b"1 0 obj<<>>endobj\n"
+         b"2 0 obj<< /Length 44 >>stream\n"
+         b"BT /F1 12 Tf 72 720 Td ("
+         + content
+         + b") Tj ET\n"
+         b"endstream endobj\n"
+         b"3 0 obj<< /Type /Page /Parent 4 0 R /Contents 2 0 R >>endobj\n"
+         b"4 0 obj<< /Type /Pages /Kids [3 0 R] /Count 1 >>endobj\n"
+         b"5 0 obj<< /Type /Catalog /Pages 4 0 R >>endobj\n"
+         b"xref\n0 6\n0000000000 65535 f \n"
+         b"trailer<< /Root 5 0 R /Size 6 >>\nstartxref\n0\n%%EOF\n"
+     )
hydradeck/utils.py ADDED
@@ -0,0 +1,86 @@
+ from __future__ import annotations
+ 
+ import datetime
+ import sys
+ import threading
+ import time
+ 
+ 
+ def log(enabled: bool, msg: str) -> None:
+     if not enabled:
+         return
+     ts = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
+     print(f"[{ts}] {msg}")
+ 
+ 
+ JSON = dict[str, object]
+ 
+ 
+ class Heartbeat:
+     def __init__(self, enabled: bool, label: str, interval_s: float = 5.0) -> None:
+         self._enabled = enabled
+         self._label = label
+         self._interval_s = interval_s
+         self._stop = threading.Event()
+         self._t: threading.Thread | None = None
+ 
+     def __enter__(self) -> Heartbeat:
+         if not self._enabled:
+             return self
+ 
+         def run() -> None:
+             start = time.time()
+             while not self._stop.wait(self._interval_s):
+                 elapsed = int(time.time() - start)
+                 sys.stderr.write(f"[heartbeat] {self._label} ({elapsed}s)\n")
+                 sys.stderr.flush()
+ 
+         self._t = threading.Thread(target=run, daemon=True)
+         self._t.start()
+         return self
+ 
+     def __exit__(self, exc_type, exc, tb) -> None:
+         _ = (exc_type, exc, tb)
+         if not self._enabled:
+             return
+         self._stop.set()
+         if self._t is not None:
+             self._t.join(timeout=1.0)
+ 
+ 
+ class Progress:
+     def __init__(
+         self,
+         enabled: bool,
+         total: int,
+         label: str = "",
+         stream=None,
+     ) -> None:
+         self._enabled = enabled
+         self._total = max(int(total), 1)
+         self._label = label
+         self._stream = stream or sys.stderr
+         self._current = 0
+         self._last_len = 0
+ 
+     def update(self, step: str, inc: int = 1) -> None:
+         if not self._enabled:
+             return
+         self._current = min(self._total, self._current + max(int(inc), 0))
+         pct = int((self._current / self._total) * 100)
+         bar_len = 24
+         filled = int(bar_len * self._current / self._total)
+         bar = "#" * filled + "-" * (bar_len - filled)
+         msg = f"[progress] {self._label} [{bar}] {pct:3d}% {step}"
+         pad = " " * max(0, self._last_len - len(msg))
76
+ self._stream.write("\r" + msg + pad)
77
+ self._stream.flush()
78
+ self._last_len = len(msg)
79
+
80
+ def done(self, step: str = "done") -> None:
81
+ if not self._enabled:
82
+ return
83
+ self._current = self._total
84
+ self.update(step, inc=0)
85
+ self._stream.write("\n")
86
+ self._stream.flush()
pyproject.toml ADDED
@@ -0,0 +1,44 @@
+[build-system]
+requires = ["setuptools>=68", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "hydradeck"
+version = "0.1.0"
+description = "Grok-driven deep research pipeline that outputs detailed reports, speech scripts, and Beamer slides."
+readme = "README.md"
+requires-python = ">=3.9"
+license = { text = "MIT" }
+authors = [{ name = "hydradeck contributors" }]
+dependencies = [
+    "requests>=2.31.0",
+    "urllib3>=2,<3",
+    "gradio>=4.44.1,<5",
+    "huggingface_hub<1.0",
+]
+
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+    "ruff>=0.6.0",
+]
+
+[project.scripts]
+hydradeck = "hydradeck.cli:main"
+
+[tool.setuptools]
+package-dir = {"" = "."}
+
+[tool.setuptools.packages.find]
+where = ["."]
+include = ["hydradeck*"]
+
+[tool.setuptools.package-data]
+hydradeck = ["templates/**/*"]
+
+[tool.ruff]
+line-length = 100
+target-version = "py39"
+
+[tool.ruff.lint]
+select = ["E", "F", "I", "UP", "B"]
requirements.txt ADDED
@@ -0,0 +1,4 @@
+requests>=2.31.0
+urllib3>=2,<3
+gradio>=4.44.1,<5
+huggingface_hub<1.0
tests/test_app_agentic.py ADDED
@@ -0,0 +1,74 @@
+from __future__ import annotations
+
+from pathlib import Path
+
+import app
+
+
+def test_agentic_pipeline_mock_renders_online_pdfs(monkeypatch) -> None:
+    def fake_compile(tex_source: str, output_name: str) -> str:
+        p = Path("/tmp") / output_name
+        p.write_bytes(b"%PDF-1.5\n%mock\n")
+        return str(p)
+
+    monkeypatch.setattr(app, "_compile_latex_online", fake_compile)
+
+    (
+        status,
+        progress_log,
+        _scope_json,
+        section_plan_json,
+        paper_tex,
+        slides_tex,
+        rendered_pdfs,
+        paper_pdf,
+        slides_pdf,
+    ) = app._run_agentic_pipeline(
+        topic="Agentic flow test",
+        model="grok-3-mini",
+        base_url="https://api.example.com",
+        api_key="",
+        request_budget=20,
+        use_mock=True,
+    )
+
+    assert "done" in status.lower()
+    assert "ScopeScout" in progress_log
+    assert "sections" in section_plan_json
+    assert "documentclass" in paper_tex
+    assert "documentclass" in slides_tex
+    paths = [x.strip() for x in rendered_pdfs.splitlines() if x.strip()]
+    assert len(paths) == 2
+    for p in paths:
+        assert Path(p).exists()
+    assert Path(str(paper_pdf)).exists()
+    assert Path(str(slides_pdf)).exists()
+
+
+def test_agentic_stream_emits_progress_and_pdf_paths(monkeypatch) -> None:
+    def fake_compile(tex_source: str, output_name: str) -> str:
+        p = Path("/tmp") / output_name
+        p.write_bytes(b"%PDF-1.5\n%mock\n")
+        return str(p)
+
+    monkeypatch.setattr(app, "_compile_latex_online", fake_compile)
+
+    chunks = list(
+        app._run_agentic_pipeline_stream(
+            topic="Agentic stream test",
+            model="grok-3-mini",
+            base_url="https://api.example.com",
+            api_key="",
+            request_budget=20,
+            use_mock=True,
+        )
+    )
+    assert len(chunks) >= 3
+    assert chunks[0][-1] == 5
+    assert chunks[1][-1] == 30
+    assert chunks[-1][-1] == 100
+    assert "done" in str(chunks[-1][0]).lower()
+    assert Path(str(chunks[-1][7])).exists()
+    assert Path(str(chunks[-1][8])).exists()
tests/test_cli.py ADDED
@@ -0,0 +1,66 @@
+from __future__ import annotations
+
+import pytest
+
+from hydradeck import cli
+from hydradeck.core.types import RunConfig
+
+
+def test_run_command_accepts_snapshot_total_timeout_default(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    captured: dict[str, float] = {}
+
+    def fake_run(cfg: RunConfig) -> object:
+        captured["snapshot_total_timeout_s"] = cfg.snapshot_total_timeout_s
+        return object()
+
+    monkeypatch.setattr(cli, "run", fake_run)
+
+    code = cli.main(
+        [
+            "run",
+            "--topic",
+            "t",
+            "--out",
+            "out.zip",
+            "--base-url",
+            "https://example.invalid",
+            "--model",
+            "mock",
+            "--mock",
+        ]
+    )
+
+    assert code == 0
+    assert captured["snapshot_total_timeout_s"] == 60.0
+
+
+def test_run_command_passes_request_budget(monkeypatch: pytest.MonkeyPatch) -> None:
+    captured: dict[str, float] = {}
+
+    def fake_run(cfg: RunConfig) -> object:
+        captured["request_budget_s"] = cfg.request_budget_s
+        return object()
+
+    monkeypatch.setattr(cli, "run", fake_run)
+
+    code = cli.main(
+        [
+            "run",
+            "--topic",
+            "t",
+            "--out",
+            "out.zip",
+            "--base-url",
+            "https://example.invalid",
+            "--model",
+            "mock",
+            "--request-budget",
+            "90",
+            "--mock",
+        ]
+    )
+
+    assert code == 0
+    assert captured["request_budget_s"] == 90.0
tests/test_config.py ADDED
@@ -0,0 +1,44 @@
+from __future__ import annotations
+
+from pathlib import Path
+
+from hydradeck.config import UserConfig, load_config, load_merged_config, save_config
+
+
+def test_save_and_load_config(tmp_path: Path) -> None:
+    p = tmp_path / "cfg.json"
+    save_config(
+        UserConfig(
+            base_url="https://x",
+            api_key="k",
+            model="m",
+            pdf_compiler="auto",
+            template="iclr2026",
+        ),
+        path=p,
+    )
+    cfg = load_config(path=p)
+    assert cfg.base_url == "https://x"
+    assert cfg.api_key == "k"
+    assert cfg.model == "m"
+    assert cfg.pdf_compiler == "auto"
+    assert cfg.template == "iclr2026"
+
+
+def test_project_config_overrides_user(tmp_path: Path, monkeypatch) -> None:
+    user_p = tmp_path / "user.json"
+    save_config(UserConfig(base_url="https://u", api_key="u", model="u"), path=user_p)
+
+    proj_root = tmp_path / "proj"
+    (proj_root / ".hydradeck").mkdir(parents=True)
+    proj_p = proj_root / ".hydradeck" / "config.json"
+    save_config(UserConfig(model="p"), path=proj_p)
+
+    monkeypatch.chdir(proj_root)
+    from hydradeck import config as cfgmod
+
+    monkeypatch.setattr(cfgmod, "config_path", lambda: user_p)
+    merged = load_merged_config()
+    assert merged.base_url == "https://u"
+    assert merged.api_key == "u"
+    assert merged.model == "p"
tests/test_preset_pre.py ADDED
@@ -0,0 +1,17 @@
+from __future__ import annotations
+
+import zipfile
+from pathlib import Path
+
+
+def test_preset_rynnbrain_zip(tmp_path: Path) -> None:
+    from hydradeck.presets.rynnbrain import generate
+
+    out_zip = tmp_path / "rynnbrain_pre.zip"
+    generate(out=out_zip, keep_stage=False, fetch=False)
+    assert out_zip.exists()
+    with zipfile.ZipFile(out_zip, "r") as z:
+        names = set(z.namelist())
+    assert "pre_report.md" in names
+    assert "research.json" in names
+    assert "resources/sources.json" in names
tests/test_render.py ADDED
@@ -0,0 +1,189 @@
+from __future__ import annotations
+
+from hydradeck.core.types import ExtractedFact, Source
+from hydradeck.render import (
+    build_slide_frames_from_report,
+    build_slide_frames_from_sections,
+    enforce_slide_density,
+    render_beamer_frames,
+    render_paper,
+    render_report_structured,
+)
+
+
+def test_render_paper_converts_markdown_like_body() -> None:
+    sources = [Source(url="https://example.com", title="Example", snippet="snippet")]
+    facts = [
+        ExtractedFact(
+            claim="Claim A",
+            evidence="Evidence A",
+            url="https://example.com",
+            title="Example",
+        )
+    ]
+    body = """## Heading
+- bullet 1
+- bullet 2
+`inline`
+```python
+print('x')
+```
+"""
+
+    tex = render_paper(
+        topic="demo",
+        outline=["背景", "创新"],
+        body=body,
+        facts=facts,
+        sources=sources,
+    )
+
+    assert "```" not in tex
+    assert "## Heading" not in tex
+    assert "Heading" in tex
+    assert "\\begin{itemize}" in tex
+    assert "bullet 1" in tex
+    assert "\\section*{1. Introduction and Background}" in tex
+
+
+def test_render_templates_use_facts_not_generic_filler() -> None:
+    sources = [Source(url="https://example.com", title="Example", snippet="snippet")]
+    facts = [
+        ExtractedFact(
+            claim="RynnBrain released checkpoints on 2026-02-09",
+            evidence="project timeline from official repo",
+            url="https://example.com",
+            title="Example",
+        ),
+        ExtractedFact(
+            claim="Model introduces interleaved reasoning with spatial grounding",
+            evidence="technical report description",
+            url="https://example.com",
+            title="Example",
+        ),
+    ]
+    paper = render_paper(
+        topic="demo",
+        outline=["背景", "创新", "架构"],
+        body="结论段落 [1]",
+        facts=facts,
+        sources=sources,
+    )
+    section_blocks = [
+        {"name": "背景", "latex": facts[0].claim},
+        {"name": "创新", "latex": facts[1].claim},
+    ]
+    frames = build_slide_frames_from_sections(section_blocks, language="en")
+    slides = render_beamer_frames("demo", frames, language="en")
+
+    assert "released checkpoints" in paper
+    assert "interleaved reasoning" in paper
+    assert "released checkpoints" in slides
+    assert "interleaved reasoning" in slides
+    section_one = paper.split("\\section*{1. Introduction and Background}", 1)[1]
+    section_one = section_one.split("\\section*{2. Logical Outline}", 1)[0]
+    assert "\\begin{itemize}" not in section_one
+
+
+def test_render_beamer_from_report_derives_outline() -> None:
+    paper = (
+        "\\documentclass{article}\n"
+        "\\begin{document}\n"
+        "\\section*{Executive Summary}\nAlpha beta gamma.\n\n"
+        "\\section*{Methodology}\nMethod details.\n\n"
+        "\\section*{Results}\nResult details.\n"
+        "\\end{document}\n"
+    )
+    frames = build_slide_frames_from_report(paper, language="en")
+    slides = render_beamer_frames("demo", frames, language="en")
+    assert "\\begin{frame}{Agenda}" in slides
+    assert "Executive Summary" in slides
+    assert "Methodology" in slides
+
+
+def test_render_beamer_frames_limits_density() -> None:
+    report = (
+        "\\documentclass{article}\\begin{document}"
+        "\\section*{Overview} A long sentence about architecture and implementation details repeated."
+        " Another long sentence about evaluation metrics and reproducibility details."
+        "\\section*{Results} Multiple findings with evidence and quantitative metrics."
+        "\\end{document}"
+    )
+    frames = build_slide_frames_from_report(report, language="en")
+    assert len(frames) >= 2
+    tex = render_beamer_frames("demo", frames, language="en")
+    assert "\\begin{frame}{Agenda}" in tex
+    assert "\\begin{itemize}" in tex
+
+
+def test_render_report_structured_zh_uses_ctex() -> None:
+    section_blocks = [
+        {"name": "方法", "latex": "本节给出方法细节与参数说明。"},
+        {"name": "结果", "latex": "本节给出结果与证据。"},
+    ]
+    tex = render_report_structured("中文研究报告", section_blocks, language="zh")
+    assert "\\documentclass[11pt]{ctexart}" in tex
+    assert "本研究报告聚焦可追溯证据" not in tex
+    assert "建议定期刷新证据并进行复跑验证" not in tex
+    assert "\\section*{方法}" in tex
+
+
+def test_render_beamer_frames_zh_uses_ctexbeamer() -> None:
+    frames = [
+        build_slide_frames_from_report(
+            "\\documentclass{ctexart}\\begin{document}\\section*{结果}关键结果一。关键结果二。\\end{document}",
+            language="zh",
+        )[0]
+    ]
+    tex = render_beamer_frames("中文主题", frames, language="zh")
+    assert "\\documentclass[aspectratio=169]{ctexbeamer}" in tex
+    assert "\\begin{frame}{目录}" in tex
+
+
+def test_build_slide_frames_from_sections_splits_long_section() -> None:
+    section_blocks = [
+        {
+            "name": "Results",
+            "latex": (
+                "The first finding shows strong improvement in consistency and precision. "
+                "The second finding shows stronger robustness under distribution shift. "
+                "The third finding indicates cost-performance improvement. "
+                "The fourth finding confirms stability across runs. "
+                "The fifth finding highlights limitations and guardrails."
+            ),
+        }
+    ]
+    frames = build_slide_frames_from_sections(section_blocks, language="en")
+    assert len(frames) >= 2
+
+
+def test_render_report_structured_removes_bracket_refs() -> None:
+    section_blocks = [
+        {"name": "Evidence", "latex": "Claim [1] with support [2] and \\cite{src1}."}
+    ]
+    tex = render_report_structured("demo", section_blocks, language="en")
+    assert "[1]" not in tex
+    assert "[2]" not in tex
+    assert "\\cite{" not in tex
+
+
+def test_enforce_slide_density_splits_large_bullet_groups() -> None:
+    frames = [
+        {
+            "title": "Results",
+            "bullets": [
+                "point one with enough words to be valid",
+                "point two with enough words to be valid",
+                "point three with enough words to be valid",
+                "point four with enough words to be valid",
+                "point five with enough words to be valid",
+            ],
+        }
+    ]
+    from hydradeck.render import SlideFrame
+
+    fr = [SlideFrame(title=x["title"], bullets=x["bullets"]) for x in frames]
+    out = enforce_slide_density(fr, language="en", max_bullets_per_frame=4)
+    assert len(out) == 2
+    assert out[0].title == "Results"
+    assert "(cont.)" in out[1].title
tests/test_resources_pack_mock.py ADDED
@@ -0,0 +1,43 @@
+from __future__ import annotations
+
+import zipfile
+from pathlib import Path
+
+from hydradeck.core.types import RunConfig
+from hydradeck.resources_pack import build_resources_pack
+
+
+def test_resources_pack_mock(tmp_path: Path) -> None:
+    out_zip = tmp_path / "res.zip"
+    cfg = RunConfig(
+        topic="RynnBrain",
+        out=out_zip,
+        base_url="https://example.invalid",
+        api_key="",
+        model="mock",
+        use_mock=True,
+        verbose=False,
+        progress=False,
+        llm_timeout_s=5.0,
+        max_total_runtime_s=5.0,
+        request_budget_s=2.0,
+        snapshot_timeout_s=1.0,
+        keep_stage=False,
+        max_sources=3,
+        module_sources=2,
+    )
+    build_resources_pack(cfg)
+    assert out_zip.exists()
+    with zipfile.ZipFile(out_zip, "r") as z:
+        names = set(z.namelist())
+        pre_paper = z.read("pre_paper.tex").decode("utf-8")
+        pre_slides = z.read("pre_slides.tex").decode("utf-8")
+    assert "resources/sources.json" in names
+    assert "resources/snapshots.json" in names
+    assert "research.json" in names
+    assert "pre_paper.tex" in names
+    assert "pre_slides.tex" in names
+    assert "pdf/pre_paper.pdf" in names
+    assert "pdf/pre_slides.pdf" in names
+    assert "来源清单" in pre_paper
+    assert "关键来源" in pre_slides
tests/test_smoke_mock.py ADDED
@@ -0,0 +1,57 @@
+from __future__ import annotations
+
+import zipfile
+from pathlib import Path
+
+from hydradeck.core.types import RunConfig
+from hydradeck.pipeline import run
+
+
+def test_mock_run_creates_zip(tmp_path: Path) -> None:
+    out_zip = tmp_path / "demo.zip"
+    cfg = RunConfig(
+        topic="test topic",
+        out=out_zip,
+        base_url="https://example.invalid",
+        api_key="",
+        model="mock",
+        use_mock=True,
+        verbose=False,
+        iterations=2,
+        max_sources=3,
+        archive_snapshots=False,
+        auto=True,
+        auto_queries=True,
+        auto_models=True,
+    )
+    run(cfg)
+    assert out_zip.exists()
+    with zipfile.ZipFile(out_zip, "r") as z:
+        names = set(z.namelist())
+        compile_sh = z.read("compile.sh").decode("utf-8")
+        paper_tex = z.read("paper.tex").decode("utf-8")
+        slides_tex = z.read("slides.tex").decode("utf-8")
+    for required in [
+        "pre_report.md",
+        "report.md",
+        "speech.md",
+        "paper.tex",
+        "slides.tex",
+        "refs.bib",
+        "research.json",
+        "compile.sh",
+        "Makefile",
+        "resources/sources.json",
+    ]:
+        assert required in names
+
+    assert "xelatex -interaction=nonstopmode paper.tex" in compile_sh
+    assert "xelatex -interaction=nonstopmode slides.tex" in compile_sh
+    assert "\\section*{3. Evidence and Key Findings}" in paper_tex
+    assert "\\section*{1. Introduction and Background}" in paper_tex
+    assert "\\begin{frame}{Agenda}" in slides_tex
+    assert "\\usetheme{metropolis}" in slides_tex
+    assert "```" not in paper_tex
+    assert "```" not in slides_tex
+    assert "## " not in paper_tex
+    assert "## " not in slides_tex
tests/test_verbatim_mock.py ADDED
@@ -0,0 +1,43 @@
+from __future__ import annotations
+
+import zipfile
+from pathlib import Path
+
+from hydradeck.core.types import RunConfig
+from hydradeck.pipeline import run
+
+
+def test_mock_run_verbatim_creates_outputs(tmp_path: Path) -> None:
+    out_zip = tmp_path / "verbatim.zip"
+    cfg = RunConfig(
+        topic="RynnBrain",
+        out=out_zip,
+        base_url="https://example.invalid",
+        api_key="",
+        model="mock",
+        use_mock=True,
+        verbose=False,
+        iterations=1,
+        max_sources=3,
+        verbatim=True,
+        archive_prompts=True,
+        archive_snapshots=False,
+        quality_gate=True,
+        min_quality_score=0.85,
+        max_quality_attempts=2,
+    )
+    run(cfg)
+    with zipfile.ZipFile(out_zip, "r") as z:
+        names = set(z.namelist())
+    for required in [
+        "pre_report.md",
+        "report.md",
+        "speech.md",
+        "paper.tex",
+        "slides.tex",
+        "refs.bib",
+        "research.json",
+        "resources/sources.json",
+        "prompts.jsonl",
+    ]:
+        assert required in names