Spaces: somratpro/huggingFlow
somratpro Claude Sonnet 4.6 committed 24 days ago

Commit 5d8d23e, 0 parent(s):

init: HuggingDeer — DeerFlow on Hugging Face Spaces

Single-container Docker deployment of DeerFlow (frontend + backend + nginx).
Clones deer-flow source at build time, builds Next.js and Python backend,
runs all three services inside one container on port 7860.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (5)

Dockerfile +123 -0
README.md +75 -0
deer-sync.py +183 -0
nginx.conf +140 -0
start.sh +417 -0

Dockerfile ADDED

	@@ -0,0 +1,123 @@

+# ════════════════════════════════════════════════════════════════
+# HuggingDeer — DeerFlow Research Agent for Hugging Face Spaces
+# ════════════════════════════════════════════════════════════════
+#
+# Single-container deployment of DeerFlow (frontend + backend + nginx)
+# on port 7860 as required by HF Spaces Docker runtime.
+#
+# Build args:
+#   DEER_FLOW_REF  — git ref to clone (branch/tag/sha, default: main)
+#   UV_IMAGE       — uv tool image (default: ghcr.io/astral-sh/uv:0.7.20)
+#   NODE_MAJOR     — Node.js major version (default: 22)
+ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.7.20
+ARG DEER_FLOW_REF=main
+# ── uv source ────────────────────────────────────────────────────
+FROM ${UV_IMAGE} AS uv-source
+# ── Stage 1: Clone DeerFlow source ───────────────────────────────
+FROM alpine/git:latest AS source
+ARG DEER_FLOW_REF
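+# Fetch exactly DEER_FLOW_REF (branch, tag or sha) as a shallow checkout
+RUN git init -q /src && \
+    cd /src && \
+    git remote add origin https://github.com/bytedance/deer-flow.git && \
+    git fetch --depth=1 origin "${DEER_FLOW_REF}" && \
+    git checkout -q FETCH_HEAD && \
+    git log --oneline -1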
+# ── Stage 2: Build Next.js frontend ──────────────────────────────
+FROM node:22-alpine AS frontend-builder
+RUN corepack enable && corepack install -g pnpm@10.26.2
+WORKDIR /app
+COPY --from=source /src/frontend ./frontend
+# pnpm virtual store uses hard links — COPY in later stages works correctly
+RUN cd frontend && pnpm install --frozen-lockfile
+# SKIP_ENV_VALIDATION=1 bypasses t3-oss env checks (no secrets at build time)
+RUN cd frontend && SKIP_ENV_VALIDATION=1 pnpm build
+# ── Stage 3: Install Python backend dependencies ──────────────────
+FROM python:3.12-slim-bookworm AS backend-builder
+COPY --from=uv-source /uv /uvx /usr/local/bin/
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential ca-certificates curl git \
+    && rm -rf /var/lib/apt/lists/*
+WORKDIR /app
+COPY --from=source /src/backend ./backend
+# uv sync installs into backend/.venv (isolated from system python)
+RUN cd backend && uv sync
+# ── Stage 4: Runtime ─────────────────────────────────────────────
+FROM python:3.12-slim-bookworm
+ENV LANG=C.UTF-8 \
+    LC_ALL=C.UTF-8 \
+    PYTHONIOENCODING=utf-8 \
+    PYTHONUNBUFFERED=1
+ARG NODE_MAJOR=22
+# Install: Node.js (for Next.js runtime), nginx (reverse proxy), runtime tools
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    curl ca-certificates gnupg nginx jq \
+    && mkdir -p /etc/apt/keyrings \
+    && curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key \
+       | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg \
+    && echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] \
+       https://deb.nodesource.com/node_${NODE_MAJOR}.x nodistro main" \
+       > /etc/apt/sources.list.d/nodesource.list \
+    && apt-get update && apt-get install -y --no-install-recommends nodejs \
+    && pip3 install --no-cache-dir --break-system-packages huggingface_hub pyyaml \
+    && rm -rf /var/lib/apt/lists/*
+# pnpm for `pnpm start` in Next.js runtime
+RUN corepack enable && corepack install -g pnpm@10.26.2
+# uv for backend startup
+COPY --from=uv-source /uv /uvx /usr/local/bin/
+# ── Create non-root user UID=1000 (required by HF Spaces) ────────
+RUN useradd -m -u 1000 -s /bin/bash user && \
+    mkdir -p \
+        /app/backend \
+        /app/frontend \
+        /app/skills \
+        /app/data \
+        /tmp/nginx-tmp && \
+    chown -R 1000:1000 /app /tmp/nginx-tmp && \
+    # nginx non-root: redirect all temp/pid/log paths to writable dirs
+    chown -R 1000:1000 /var/log/nginx /var/lib/nginx 2>/dev/null || true
+# ── Copy built artifacts ──────────────────────────────────────────
+# Backend: Python source + pre-built .venv from uv sync
+COPY --from=backend-builder --chown=1000:1000 /app/backend /app/backend
+# Skills directory (read-only agent skills)
+COPY --from=source --chown=1000:1000 /src/skills /app/skills
+# Config template (used to generate config.yaml at startup)
+COPY --from=source --chown=1000:1000 /src/config.example.yaml /app/config.example.yaml
+# Frontend: built .next + node_modules (pnpm hard links — self-contained after COPY)
+COPY --from=frontend-builder --chown=1000:1000 /app/frontend /app/frontend
+# ── Copy HuggingDeer runtime scripts ─────────────────────────────
+COPY --chown=1000:1000 nginx.conf /etc/nginx/nginx.conf
+COPY --chown=1000:1000 start.sh   /app/start.sh
+COPY --chown=1000:1000 deer-sync.py /app/deer-sync.py
+RUN chmod +x /app/start.sh /app/deer-sync.py
+USER user
+WORKDIR /app
+EXPOSE 7860
+# 120s start period: cold start (state restore, config generation, backend and frontend boot)
+# takes ~60-90s; the frontend build and uv sync already ran at image build time
+HEALTHCHECK --interval=30s --timeout=10s --start-period=120s \
+    CMD curl -fsS http://localhost:7860/health || exit 1
+CMD ["/app/start.sh"]
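
Review note: the `HEALTHCHECK` above relies on `curl -f` treating HTTP error statuses as failures. A minimal Python sketch of roughly the same probe follows, for local debugging. The function name `is_healthy` is hypothetical and not part of this commit, and the redirect handling differs slightly from curl (urllib follows redirects, `curl -f` without `-L` does not).

```python
import urllib.error
import urllib.request

# Ignore any proxy environment variables: the probe targets a local port.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(url: str, timeout: float = 10.0) -> bool:
    """Return True if `url` answers with a non-error status within `timeout` seconds."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx) subclasses URLError; refused connections
        # and timeouts are OSError subclasses.
        return False
```

For example, `is_healthy("http://localhost:7860/health")` would be the equivalent of the container's own check, assuming nginx and the backend are both up.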

README.md ADDED

	@@ -0,0 +1,75 @@

+# 🦌 HuggingDeer
+**DeerFlow** research agent running as a self-hosted [Hugging Face Space](https://huggingface.co/spaces) (Docker).
+Single-container deployment — frontend (Next.js) + backend (FastAPI) + nginx all in one image. No Docker-in-Docker, no Kubernetes.
+## Required Secrets
+Set these in **Settings → Variables and Secrets** on your HF Space:
+| Secret | Required | Description |
+|--------|----------|-------------|
+| `LLM_MODEL` | ✅ | Model in `provider/name` format (see below) |
+| `LLM_API_KEY` | ✅ | API key for the chosen provider |
+| `SERPER_API_KEY` | recommended | Google Search via Serper (better than DuckDuckGo) |
+| `TAVILY_API_KEY` | optional | Alternative web search |
+| `JINA_API_KEY` | optional | Better web page fetching |
+| `AUTH_JWT_SECRET` | optional | JWT signing secret — auto-generated if not set (sessions reset on restart) |
+| `HF_TOKEN` | optional | Your HF token — enables dataset backup/restore of threads |
+| `BACKUP_DATASET_NAME` | optional | HF dataset repo for backup (default: `huggingdeer-backup`) |
+## LLM_MODEL format
+```
+provider/model-name
+```
+Examples:
+```
+openai/gpt-4o
+openai/gpt-4o-mini
+anthropic/claude-sonnet-4-5
+anthropic/claude-opus-4-5
+google/gemini-2.5-flash
+deepseek/deepseek-chat
+deepseek/deepseek-reasoner
+openrouter/anthropic/claude-3-5-sonnet
+mistral/mistral-large-latest
+groq/llama-3.3-70b-versatile
+```
+## Deploy to HF Spaces
+1. Duplicate this repo to your HF account as a **Docker Space**
+2. Add required secrets
+3. Space builds and starts (~5-10 min on first build)
+## Optional env vars
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `CUSTOM_BASE_URL` | — | OpenAI-compatible API base URL (for custom providers) |
+| `SYNC_INTERVAL` | `600` | Seconds between HF Dataset backups |
+| `BACKEND_READY_TIMEOUT` | `120` | Seconds to wait for backend startup |
+| `FRONTEND_READY_TIMEOUT` | `120` | Seconds to wait for frontend startup |
+| `SPACE_HOST` | auto | Set by HF Spaces automatically |
+## What runs inside
+| Process | Port | Role |
+|---------|------|------|
+| nginx | 7860 | Public reverse proxy (routes `/api/*` → backend, `/*` → frontend) |
+| uvicorn (FastAPI) | 8001 | DeerFlow gateway — agents, threads, auth |
+| Next.js | 3000 | DeerFlow UI |
+## Caveats
+- **No Docker sandbox**: DeerFlow's `bash` / code execution tool is disabled by default (`allow_host_bash: false`). File read/write and web search work fine.
+- **Ephemeral storage**: container resets on restart. Enable `HF_TOKEN` + `BACKUP_DATASET_NAME` to persist threads.
+- **Two workers**: backend runs 2 uvicorn workers. For heavy use, consider a dedicated server.
+## Source
+DeerFlow: https://github.com/bytedance/deer-flow
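
Review note: the README's `provider/model-name` format is split on the first slash only, which matters for OpenRouter IDs that contain a second slash. A minimal Python sketch of the split that `start.sh` performs with `cut` follows. The function name `split_llm_model` is hypothetical; the no-slash fallback mirrors how `cut` returns the whole line when the delimiter is absent.

```python
def split_llm_model(value: str) -> tuple[str, str]:
    """Split 'provider/model-name' on the first slash only."""
    provider, sep, model = value.partition("/")
    if not sep:
        # No slash: `cut -d/ -f1` and `cut -d/ -f2-` both return the whole string.
        return value, value
    return provider, model


print(split_llm_model("openai/gpt-4o"))
print(split_llm_model("openrouter/anthropic/claude-3-5-sonnet"))
```

A value without a slash therefore ends up as both provider and model name, and `start.sh` routes it through the default `openai|*` branch.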

deer-sync.py ADDED

	@@ -0,0 +1,183 @@

+#!/usr/bin/env python3
+"""
+HuggingDeer state sync — backup/restore DeerFlow runtime data to/from HF Dataset.
+Syncs:
+  - deerflow.db      (SQLite thread/session database)
+  - config.yaml      (generated config, may contain user edits)
+  - workspace/       (agent-created files in the sandbox workspace)
+Usage:
+  deer-sync.py restore    — restore from HF Dataset on startup
+  deer-sync.py sync-once  — push current state to HF Dataset
+  deer-sync.py loop       — sync-once on an interval (reads SYNC_INTERVAL env)
+"""
+import os
+import sys
+import time
+import shutil
+import tarfile
+import tempfile
+import logging
+from pathlib import Path
+logging.basicConfig(level=logging.INFO, format="%(message)s")
+log = logging.getLogger(__name__)
+HF_TOKEN          = os.environ.get("HF_TOKEN", "")
+BACKUP_REPO       = os.environ.get("BACKUP_DATASET_NAME", "huggingdeer-backup")
+HF_USERNAME       = os.environ.get("HF_USERNAME", "")
+DATA_DIR          = Path(os.environ.get("DEER_FLOW_HOME", "/app/data"))
+CONFIG_PATH       = Path(os.environ.get("DEER_FLOW_CONFIG_PATH", DATA_DIR / "config.yaml"))
+SYNC_INTERVAL     = int(os.environ.get("SYNC_INTERVAL", "600"))
+ARCHIVE_NAME = "deerflow-state.tar.gz"
+# Files/dirs to include in the backup archive
+BACKUP_TARGETS = [
+    DATA_DIR / "deerflow.db",
+    DATA_DIR / "workspace",
+    CONFIG_PATH,
+]
+def _get_api():
+    """Return authenticated HfApi or raise."""
+    if not HF_TOKEN:
+        raise RuntimeError("HF_TOKEN not set")
+    from huggingface_hub import HfApi
+    return HfApi(token=HF_TOKEN)
+def _resolve_repo_id(api) -> str:
+    """Resolve BACKUP_REPO to a full repo_id (username/repo-name)."""
+    if "/" in BACKUP_REPO:
+        return BACKUP_REPO
+    if HF_USERNAME:
+        return f"{HF_USERNAME}/{BACKUP_REPO}"
+    # Auto-detect from token
+    user = api.whoami()
+    return f"{user['name']}/{BACKUP_REPO}"
+def _ensure_repo(api, repo_id: str):
+    """Create the dataset repo if it doesn't exist."""
+    from huggingface_hub import create_repo
+    try:
+        create_repo(
+            repo_id=repo_id,
+            repo_type="dataset",
+            private=True,
+            token=HF_TOKEN,
+            exist_ok=True,
+        )
+    except Exception as exc:
+        log.warning("Could not ensure dataset repo: %s", exc)
+def _make_archive(dest: Path):
+    """Pack BACKUP_TARGETS into a .tar.gz archive."""
+    with tarfile.open(dest, "w:gz") as tar:
+        for target in BACKUP_TARGETS:
+            if target.exists():
+                arcname = target.relative_to(DATA_DIR.parent)
+                tar.add(target, arcname=str(arcname))
+                log.debug("  + %s", arcname)
+def _extract_archive(src: Path):
+    """Unpack archive into DATA_DIR.parent (restores original paths)."""
+    extract_root = DATA_DIR.parent
+    with tarfile.open(src, "r:gz") as tar:
+        # filter="data" blocks members that would be written outside extract_root,
+        # plus device files and unsafe links (available in the image's Python 3.12)
+        tar.extractall(path=extract_root, filter="data")
+    log.info("Extracted state to %s", extract_root)
+def restore():
+    """Download and unpack the latest state archive from HF Dataset."""
+    if not HF_TOKEN:
+        log.info("No HF_TOKEN — skipping restore.")
+        return
+    try:
+        api = _get_api()
+        repo_id = _resolve_repo_id(api)
+        _ensure_repo(api, repo_id)
+        from huggingface_hub import hf_hub_download
+        with tempfile.TemporaryDirectory() as tmp:
+            try:
+                local = hf_hub_download(
+                    repo_id=repo_id,
+                    filename=ARCHIVE_NAME,
+                    repo_type="dataset",
+                    token=HF_TOKEN,
+                    local_dir=tmp,
+                )
+                _extract_archive(Path(local))
+                log.info("State restored from %s", repo_id)
+            except Exception as exc:
+                if "404" in str(exc) or "not found" in str(exc).lower() or "does not exist" in str(exc).lower():
+                    log.info("No existing backup found in %s — starting fresh.", repo_id)
+                else:
+                    raise
+    except Exception as exc:
+        log.warning("Restore failed: %s", exc)
+        raise
+def sync_once():
+    """Pack current state and upload to HF Dataset."""
+    if not HF_TOKEN:
+        return
+    try:
+        api = _get_api()
+        repo_id = _resolve_repo_id(api)
+        _ensure_repo(api, repo_id)
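+        # A gzip-compressed tar is never 0 bytes, even when empty, so check the
+        # targets themselves instead of the archive size.
+        if not any(target.exists() for target in BACKUP_TARGETS):
+            log.info("Nothing to back up, skipping upload.")
+            return
+        with tempfile.TemporaryDirectory() as tmp:
+            archive = Path(tmp) / ARCHIVE_NAME
+            _make_archive(archive)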
+            api.upload_file(
+                path_or_fileobj=str(archive),
+                path_in_repo=ARCHIVE_NAME,
+                repo_id=repo_id,
+                repo_type="dataset",
+                token=HF_TOKEN,
+            )
+            size_kb = archive.stat().st_size // 1024
+            log.info("State synced to %s (%d KB)", repo_id, size_kb)
+    except Exception as exc:
+        log.warning("Sync failed: %s", exc)
+def loop():
+    """Run sync_once every SYNC_INTERVAL seconds."""
+    log.info("Starting periodic sync (interval: %ds)", SYNC_INTERVAL)
+    while True:
+        time.sleep(SYNC_INTERVAL)
+        try:
+            sync_once()
+        except Exception as exc:
+            log.warning("Periodic sync error: %s", exc)
+if __name__ == "__main__":
+    cmd = sys.argv[1] if len(sys.argv) > 1 else "help"
+    if cmd == "restore":
+        restore()
+    elif cmd == "sync-once":
+        sync_once()
+    elif cmd == "loop":
+        loop()
+    else:
+        print(__doc__)
+        sys.exit(1)
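
Review note: the archive stores paths relative to the parent of the data directory, so extracting into that same parent restores the original layout. A minimal, self-contained sketch of that round trip follows. The helper names `pack` and `unpack` are hypothetical stand-ins for `_make_archive` and `_extract_archive`, and the `data` extraction filter is used only where the running Python provides it.

```python
import tarfile
import tempfile
from pathlib import Path


def pack(targets: list[Path], root: Path, dest: Path) -> int:
    """Archive existing targets with names relative to `root`; return how many were added."""
    added = 0
    with tarfile.open(dest, "w:gz") as tar:
        for target in targets:
            if target.exists():
                tar.add(target, arcname=str(target.relative_to(root)))
                added += 1
    return added


def unpack(src: Path, root: Path) -> None:
    """Extract under `root`, using the 'data' filter when this Python supports it."""
    with tarfile.open(src, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(path=root, filter="data")
        else:
            tar.extractall(path=root)


with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as out_dir:
    data = Path(src_dir) / "data"
    (data / "workspace").mkdir(parents=True)
    (data / "deerflow.db").write_bytes(b"db")
    archive = Path(out_dir) / "state.tar.gz"
    count = pack([data / "deerflow.db", data / "workspace"], Path(src_dir), archive)
    unpack(archive, Path(out_dir))
    print(count, (Path(out_dir) / "data" / "deerflow.db").exists())
```

Returning the number of members added gives the caller a reliable "nothing to back up" signal, which the archive's byte size cannot provide.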

nginx.conf ADDED

	@@ -0,0 +1,140 @@

+events {
+    worker_connections 1024;
+}
+# Non-root nginx: all paths redirected to /tmp
+pid /tmp/nginx.pid;
+error_log /tmp/nginx-error.log warn;
+http {
+    # Non-root temp dirs
+    client_body_temp_path /tmp/nginx-tmp/client;
+    proxy_temp_path        /tmp/nginx-tmp/proxy;
+    fastcgi_temp_path      /tmp/nginx-tmp/fastcgi;
+    uwsgi_temp_path        /tmp/nginx-tmp/uwsgi;
+    scgi_temp_path         /tmp/nginx-tmp/scgi;
+    access_log /dev/stdout;
+    error_log  /dev/stderr warn;
+    sendfile       on;
+    tcp_nopush     on;
+    tcp_nodelay    on;
+    keepalive_timeout 65;
+    # ── DeerFlow on HF Spaces ─────────────────────────────────────
+    server {
+        listen 7860 default_server;
+        server_name _;
+        # Allow 100 MB uploads (thread file attachments)
+        client_max_body_size 100M;
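+        # HF Spaces embeds the app in an iframe, so framing must be allowed.
+        # X-Frame-Options has no "allow all" value (only DENY and SAMEORIGIN are
+        # standard), so framing is permitted through CSP frame-ancestors alone.
+        add_header Content-Security-Policy "frame-ancestors *" always;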
+        # CORS: strip upstream headers to avoid duplicates, then re-add
+        proxy_hide_header Access-Control-Allow-Origin;
+        proxy_hide_header Access-Control-Allow-Methods;
+        proxy_hide_header Access-Control-Allow-Headers;
+        proxy_hide_header Access-Control-Allow-Credentials;
+        add_header Access-Control-Allow-Origin  "*" always;
+        add_header Access-Control-Allow-Methods "GET, POST, PUT, DELETE, PATCH, OPTIONS" always;
+        add_header Access-Control-Allow-Headers "*" always;
+        # CORS preflight
+        if ($request_method = OPTIONS) {
+            return 204;
+        }
+        # ── LangGraph-compatible API (rewrites /api/langgraph/* → /api/*) ──
+        # The backend exposes /api/* natively; the /api/langgraph/ prefix is a
+        # public-facing alias used by the Next.js client and LangGraph SDK.
+        location /api/langgraph/ {
+            rewrite ^/api/langgraph/(.*) /api/$1 break;
+            proxy_pass         http://127.0.0.1:8001;
+            proxy_http_version 1.1;
+            proxy_set_header Host              $host;
+            proxy_set_header X-Real-IP         $remote_addr;
+            proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+            proxy_set_header Connection        "";
+            # SSE / streaming (agent responses are streamed as server-sent events)
+            proxy_buffering         off;
+            proxy_cache             off;
+            proxy_set_header        X-Accel-Buffering no;
+            chunked_transfer_encoding on;
+            proxy_connect_timeout 600s;
+            proxy_send_timeout    600s;
+            proxy_read_timeout    600s;
+        }
+        # ── Health check ──────────────────────────────────────────
+        location = /health {
+            proxy_pass         http://127.0.0.1:8001/health;
+            proxy_http_version 1.1;
+            proxy_set_header Host $host;
+        }
+        # ── API docs (Swagger / ReDoc / OpenAPI) ──────────────────
+        location ~ ^/(docs|redoc|openapi\.json)$ {
+            proxy_pass         http://127.0.0.1:8001;
+            proxy_http_version 1.1;
+            proxy_set_header Host $host;
+        }
+        # ── Thread file uploads (large body, no buffering) ────────
+        location ~ ^/api/threads/[^/]+/uploads {
+            proxy_pass              http://127.0.0.1:8001;
+            proxy_http_version      1.1;
+            proxy_set_header Host   $host;
+            proxy_set_header X-Real-IP         $remote_addr;
+            proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+            proxy_request_buffering off;
+            client_max_body_size    100M;
+        }
+        # ── All remaining /api/* routes → backend ─────────────────
+        location /api/ {
+            proxy_pass         http://127.0.0.1:8001;
+            proxy_http_version 1.1;
+            proxy_set_header Host              $host;
+            proxy_set_header X-Real-IP         $remote_addr;
+            proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+            proxy_set_header Connection        "";
+            # SSE support for all streaming API routes
+            proxy_buffering         off;
+            proxy_cache             off;
+            proxy_set_header        X-Accel-Buffering no;
+            chunked_transfer_encoding on;
+            proxy_connect_timeout 600s;
+            proxy_send_timeout    600s;
+            proxy_read_timeout    600s;
+        }
+        # ── All other requests → Next.js frontend ─────────────────
+        location / {
+            proxy_pass         http://127.0.0.1:3000;
+            proxy_http_version 1.1;
+            proxy_set_header Host              $host;
+            proxy_set_header X-Real-IP         $remote_addr;
+            proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+            proxy_set_header Upgrade           $http_upgrade;
+            proxy_set_header Connection        "upgrade";
+            proxy_cache_bypass                 $http_upgrade;
+            proxy_connect_timeout 600s;
+            proxy_send_timeout    600s;
+            proxy_read_timeout    600s;
+        }
+    }
+}
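
Review note: the `location` blocks above are resolved by nginx in a fixed order: the exact match first, then regex locations in file order, then the longest matching prefix. A minimal Python model of that resolution for this particular config follows. The function `route` and the location labels are hypothetical, and the sketch assumes the path starts with `/` and carries no query string.

```python
import re

BACKEND = "http://127.0.0.1:8001"
FRONTEND = "http://127.0.0.1:3000"

# Regex locations, in the order they appear in nginx.conf.
REGEX_LOCATIONS = [
    (re.compile(r"^/(docs|redoc|openapi\.json)$"), "docs"),
    (re.compile(r"^/api/threads/[^/]+/uploads"), "uploads"),
]
PREFIX_LOCATIONS = ["/api/langgraph/", "/api/", "/"]


def route(path: str) -> tuple[str, str, str]:
    """Return (location label, upstream, path sent upstream) for a request path."""
    if path == "/health":  # `location = /health` wins outright
        return "health", BACKEND, "/health"
    for pattern, name in REGEX_LOCATIONS:  # first matching regex wins
        if pattern.search(path):
            return name, BACKEND, path
    prefix = max((p for p in PREFIX_LOCATIONS if path.startswith(p)), key=len)
    if prefix == "/api/langgraph/":
        # rewrite ^/api/langgraph/(.*) /api/$1 break;
        return "langgraph", BACKEND, "/api/" + path[len(prefix):]
    if prefix == "/api/":
        return "api", BACKEND, path
    return "frontend", FRONTEND, path


print(route("/api/langgraph/threads/1/runs"))
```

One consequence worth noting: an upload sent through the `/api/langgraph/` alias is handled by the langgraph location, not the uploads location, so `proxy_request_buffering off` does not apply to it.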

start.sh ADDED

	@@ -0,0 +1,417 @@

+#!/bin/bash
+set -euo pipefail
+umask 0077
+# ════════════════════════════════════════════════════════════════
+# HuggingDeer — DeerFlow on Hugging Face Spaces
+# ════════════════════════════════════════════════════════════════
+APP_DIR="/app"
+DATA_DIR="${DEER_FLOW_HOME:-/app/data}"
+CONFIG_PATH="${DEER_FLOW_CONFIG_PATH:-$DATA_DIR/config.yaml}"
+BACKEND_PORT="${BACKEND_PORT:-8001}"
+FRONTEND_PORT="${FRONTEND_PORT:-3000}"
+PUBLIC_PORT="${PORT:-7860}"
+SYNC_INTERVAL="${SYNC_INTERVAL:-600}"
+BACKEND_READY_TIMEOUT="${BACKEND_READY_TIMEOUT:-120}"
+FRONTEND_READY_TIMEOUT="${FRONTEND_READY_TIMEOUT:-120}"
+# Export shell vars so inline Python scripts can read them via os.environ
+export DATA_DIR CONFIG_PATH BACKUP_DATASET_NAME SYNC_INTERVAL
+export DEER_FLOW_HOME="$DATA_DIR"
+export DEER_FLOW_CONFIG_PATH="$CONFIG_PATH"
+export DEER_FLOW_SKILLS_PATH="/app/skills"
+echo ""
+echo "  ╔══════════════════════════════════════════╗"
+echo "  ║        🦌 HuggingDeer — DeerFlow         ║"
+echo "  ╚══════════════════════════════════════════╝"
+echo ""
+# ── Required env validation ───────────────────────────────────────
+ERRORS=""
+if [ -z "${LLM_MODEL:-}" ]; then
+  ERRORS="${ERRORS}  - LLM_MODEL is not set (e.g. openai/gpt-4o, anthropic/claude-sonnet-4-5)\n"
+fi
+if [ -z "${LLM_API_KEY:-}" ]; then
+  ERRORS="${ERRORS}  - LLM_API_KEY is not set\n"
+fi
+if [ -n "$ERRORS" ]; then
+  echo "Missing required secrets:"
+  printf "%b" "$ERRORS"
+  echo ""
+  echo "Add them in HF Spaces → Settings → Secrets"
+  exit 1
+fi
+# ── Setup runtime directories ─────────────────────────────────────
+mkdir -p \
+  "$DATA_DIR" \
+  "$DATA_DIR/threads" \
+  "$DATA_DIR/uploads" \
+  "$DATA_DIR/workspace" \
+  "$DATA_DIR/logs" \
+  /tmp/nginx-tmp/client \
+  /tmp/nginx-tmp/proxy \
+  /tmp/nginx-tmp/fastcgi \
+  /tmp/nginx-tmp/uwsgi \
+  /tmp/nginx-tmp/scgi
+# ── Provider → env var + langchain class mapping ──────────────────
+# Parse LLM_MODEL in format "provider/model-name" (e.g. "openai/gpt-4o")
+LLM_PROVIDER=$(echo "$LLM_MODEL" | cut -d'/' -f1)
+LLM_MODEL_NAME=$(echo "$LLM_MODEL" | cut -d'/' -f2-)
+# Resolve provider-specific settings
+LANGCHAIN_CLASS="langchain_openai:ChatOpenAI"
+API_KEY_FIELD="api_key"
+MODEL_BASE_URL=""
+SUPPORTS_THINKING="false"
+case "$LLM_PROVIDER" in
+  anthropic)
+    export ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY:-$LLM_API_KEY}"
+    LANGCHAIN_CLASS="langchain_anthropic:ChatAnthropic"
+    API_KEY_FIELD="api_key"
+    SUPPORTS_THINKING="true"
+    ;;
+  google|gemini)
+    export GEMINI_API_KEY="${GEMINI_API_KEY:-$LLM_API_KEY}"
+    export GOOGLE_API_KEY="${GOOGLE_API_KEY:-$LLM_API_KEY}"
+    LANGCHAIN_CLASS="langchain_google_genai:ChatGoogleGenerativeAI"
+    API_KEY_FIELD="gemini_api_key"
+    LLM_MODEL_NAME="${LLM_MODEL_NAME:-$LLM_PROVIDER}"
+    SUPPORTS_THINKING="true"
+    ;;
+  deepseek)
+    export DEEPSEEK_API_KEY="${DEEPSEEK_API_KEY:-$LLM_API_KEY}"
+    LANGCHAIN_CLASS="deerflow.models.patched_deepseek:PatchedChatDeepSeek"
+    API_KEY_FIELD="api_key"
+    MODEL_BASE_URL="https://api.deepseek.com/v1"
+    SUPPORTS_THINKING="true"
+    ;;
+  openrouter)
+    export OPENROUTER_API_KEY="${OPENROUTER_API_KEY:-$LLM_API_KEY}"
+    export OPENAI_API_KEY="${OPENAI_API_KEY:-$LLM_API_KEY}"
+    LANGCHAIN_CLASS="langchain_openai:ChatOpenAI"
+    API_KEY_FIELD="api_key"
+    MODEL_BASE_URL="https://openrouter.ai/api/v1"
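+    # OpenRouter model IDs keep the upstream vendor prefix (e.g. anthropic/claude-3-5-sonnet),
+    # so strip only the leading "openrouter/" rather than reusing the full LLM_MODEL value.
+    LLM_MODEL_NAME="${LLM_MODEL#openrouter/}"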
+    ;;
+  qwen|dashscope|alibaba)
+    export DASHSCOPE_API_KEY="${DASHSCOPE_API_KEY:-$LLM_API_KEY}"
+    LANGCHAIN_CLASS="langchain_openai:ChatOpenAI"
+    API_KEY_FIELD="api_key"
+    MODEL_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
+    ;;
+  moonshot|kimi)
+    export MOONSHOT_API_KEY="${MOONSHOT_API_KEY:-$LLM_API_KEY}"
+    LANGCHAIN_CLASS="langchain_openai:ChatOpenAI"
+    API_KEY_FIELD="api_key"
+    MODEL_BASE_URL="https://api.moonshot.cn/v1"
+    ;;
+  mistral)
+    export MISTRAL_API_KEY="${MISTRAL_API_KEY:-$LLM_API_KEY}"
+    LANGCHAIN_CLASS="langchain_openai:ChatOpenAI"
+    API_KEY_FIELD="api_key"
+    MODEL_BASE_URL="https://api.mistral.ai/v1"
+    ;;
+  xai|grok)
+    export XAI_API_KEY="${XAI_API_KEY:-$LLM_API_KEY}"
+    LANGCHAIN_CLASS="langchain_openai:ChatOpenAI"
+    API_KEY_FIELD="api_key"
+    MODEL_BASE_URL="https://api.x.ai/v1"
+    ;;
+  groq)
+    export GROQ_API_KEY="${GROQ_API_KEY:-$LLM_API_KEY}"
+    LANGCHAIN_CLASS="langchain_openai:ChatOpenAI"
+    API_KEY_FIELD="api_key"
+    MODEL_BASE_URL="https://api.groq.com/openai/v1"
+    ;;
+  openai|*)
+    export OPENAI_API_KEY="${OPENAI_API_KEY:-$LLM_API_KEY}"
+    LANGCHAIN_CLASS="langchain_openai:ChatOpenAI"
+    API_KEY_FIELD="api_key"
+    ;;
+esac
+# Custom OpenAI-compatible provider override
+if [ -n "${CUSTOM_BASE_URL:-}" ]; then
+  export OPENAI_API_KEY="${OPENAI_API_KEY:-$LLM_API_KEY}"
+  LANGCHAIN_CLASS="langchain_openai:ChatOpenAI"
+  API_KEY_FIELD="api_key"
+  MODEL_BASE_URL="$CUSTOM_BASE_URL"
+fi
+export LLM_PROVIDER LLM_MODEL_NAME LANGCHAIN_CLASS API_KEY_FIELD MODEL_BASE_URL SUPPORTS_THINKING
+export SERPER_API_KEY="${SERPER_API_KEY:-}"
+export TAVILY_API_KEY="${TAVILY_API_KEY:-}"
+export JINA_API_KEY="${JINA_API_KEY:-}"
+# ── Restore from HF Dataset (if configured) ───────────────────────
+if [ -n "${HF_TOKEN:-}" ]; then
+  echo "Restoring state from HF Dataset..."
+  python3 "$APP_DIR/deer-sync.py" restore || echo "Warning: restore failed, starting fresh."
+else
+  echo "HF_TOKEN not set — running without dataset persistence."
+fi
+# ── Generate config.yaml ──────────────────────────────────────────
+echo "Generating config.yaml..."
+python3 - <<'PYEOF'
+import os, yaml
+from pathlib import Path
+data_dir = Path(os.environ["DATA_DIR"])
+config_path = Path(os.environ["CONFIG_PATH"])
+# Load example config as base if no user config exists
+if not config_path.exists():
+    example = Path("/app/config.example.yaml")
+    if example.exists():
+        base = yaml.safe_load(example.read_text()) or {}
+    else:
+        base = {}
+else:
+    base = yaml.safe_load(config_path.read_text()) or {}
+model_name   = os.environ["LLM_MODEL_NAME"]
+lc_class     = os.environ["LANGCHAIN_CLASS"]
+api_key_field = os.environ["API_KEY_FIELD"]
+base_url      = os.environ.get("MODEL_BASE_URL", "")
+llm_api_key   = os.environ.get("LLM_API_KEY", "")
+thinking      = os.environ.get("SUPPORTS_THINKING", "false").lower() == "true"
+# Build model entry
+model_entry = {
+    "name": model_name,
+    "display_name": model_name,
+    "use": lc_class,
+    "model": model_name,
+    api_key_field: llm_api_key,
+    "request_timeout": 600.0,
+    "max_retries": 2,
+    "max_tokens": 8192,
+}
+if base_url:
+    model_entry["base_url"] = base_url
+if thinking:
+    model_entry["supports_thinking"] = True
+# Override models section with our single configured model
+base["models"] = [model_entry]
+# Sandbox: local (no Docker on HF Spaces)
+base.setdefault("sandbox", {})
+base["sandbox"]["use"] = "deerflow.sandbox.local:LocalSandboxProvider"
+base["sandbox"]["allow_host_bash"] = False
+# Search tools: prefer Serper > Tavily > DuckDuckGo (default)
+serper_key  = os.environ.get("SERPER_API_KEY", "")
+tavily_key  = os.environ.get("TAVILY_API_KEY", "")
+if serper_key:
+    web_search_tool = {
+        "name": "web_search", "group": "web",
+        "use": "deerflow.community.serper.tools:web_search_tool",
+        "max_results": 5, "api_key": serper_key,
+    }
+elif tavily_key:
+    web_search_tool = {
+        "name": "web_search", "group": "web",
+        "use": "deerflow.community.tavily.tools:web_search_tool",
+        "max_results": 5, "api_key": tavily_key,
+    }
+else:
+    web_search_tool = {
+        "name": "web_search", "group": "web",
+        "use": "deerflow.community.ddg_search.tools:web_search_tool",
+        "max_results": 5,
+    }
+# Preserve existing tool list, replacing web_search entry
+existing_tools = base.get("tools", [])
+other_tools = [t for t in existing_tools if t.get("name") != "web_search"]
+base["tools"] = [web_search_tool] + other_tools
+# Jina AI web_fetch (no key needed for basic usage)
+jina_key = os.environ.get("JINA_API_KEY", "")
+has_web_fetch = any(t.get("name") == "web_fetch" for t in base["tools"])
+if not has_web_fetch:
+    web_fetch_entry = {
+        "name": "web_fetch", "group": "web",
+        "use": "deerflow.community.jina_ai.tools:web_fetch_tool",
+        "timeout": 15,
+    }
+    if jina_key:
+        web_fetch_entry["api_key"] = jina_key
+    base["tools"].append(web_fetch_entry)
+# Persistence: SQLite in data dir
+base.setdefault("database", {})
+base["database"].setdefault("backend", "sqlite")
+# Database file lives in DATA_DIR (persisted via HF Dataset sync)
+db_path = str(data_dir / "deerflow.db")
+base["database"].setdefault("url", f"sqlite+aiosqlite:///{db_path}")
+# Skills path
+base.setdefault("skills", {})
+base["skills"]["path"] = "/app/skills"
+# CORS origins are exported by the shell once this script exits: setting
+# os.environ here would only change this child process, not the backend's environment.
+config_path.parent.mkdir(parents=True, exist_ok=True)
+config_path.write_text(yaml.safe_dump(base, sort_keys=False, allow_unicode=True))
+config_path.chmod(0o600)
+print(f"Config written to {config_path}")
+PYEOF
+# ── CORS origins env for backend ─────────────────────────────────
+SPACE_HOST="${SPACE_HOST:-}"
+if [ -n "$SPACE_HOST" ]; then
+  export CORS_ORIGINS="${CORS_ORIGINS:-http://localhost:3000,http://localhost:7860,https://$SPACE_HOST}"
+else
+  export CORS_ORIGINS="${CORS_ORIGINS:-http://localhost:3000,http://localhost:7860}"
+fi
+# ── Startup summary ───────────────────────────────────────────────
+echo ""
+echo "Model     : $LLM_MODEL"
+echo "Provider  : $LLM_PROVIDER"
+echo "Data dir  : $DATA_DIR"
+if [ -n "${SERPER_API_KEY:-}" ]; then
+  echo "Search    : Serper (Google)"
+elif [ -n "${TAVILY_API_KEY:-}" ]; then
+  echo "Search    : Tavily"
+else
+  echo "Search    : DuckDuckGo (no API key)"
+fi
+if [ -n "${HF_TOKEN:-}" ]; then
+  echo "Backup    : ${BACKUP_DATASET_NAME:-huggingdeer-backup} (every ${SYNC_INTERVAL}s)"
+else
+  echo "Backup    : disabled"
+fi
+if [ -n "$SPACE_HOST" ]; then
+  echo "URL       : https://$SPACE_HOST"
+fi
+echo ""
+# ── Graceful shutdown ─────────────────────────────────────────────
+graceful_shutdown() {
+  echo "Shutting down HuggingDeer..."
+  if [ -n "${HF_TOKEN:-}" ]; then
+    echo "Saving state to HF Dataset..."
+    python3 "$APP_DIR/deer-sync.py" sync-once || echo "Warning: shutdown sync failed."
+  fi
+  # Stop nginx daemon (nginx -s quit = graceful drain)
+  nginx -s quit 2>/dev/null || true
+  # Stop background shell jobs (backend, frontend, sync loop)
+  kill $(jobs -p) 2>/dev/null || true
+  sleep 2
+  exit 0
+}
+trap graceful_shutdown SIGTERM SIGINT
+# ── Start nginx ───────────────────────────────────────────────────
+echo "Starting nginx on port $PUBLIC_PORT..."
+# Validate config first
+nginx -t 2>/dev/null && nginx || {
+  echo "nginx config error:"
+  nginx -t
+  exit 1
+}
+# ── Start backend (uvicorn) ───────────────────────────────────────
+echo "Starting DeerFlow backend on port $BACKEND_PORT..."
+(
+  cd "$APP_DIR/backend" && \
+  PYTHONPATH=. \
+  uv run --no-sync \
+    uvicorn app.gateway.app:app \
+      --host 127.0.0.1 \
+      --port "$BACKEND_PORT" \
+      --workers 2 \
+    2>&1 | tee -a "$DATA_DIR/logs/backend.log"
+) &
+BACKEND_PID=$!
+# Wait for backend to be ready
+echo "Waiting for backend..."
+ready=false
+for ((i=0; i<BACKEND_READY_TIMEOUT; i++)); do
+  if (echo > "/dev/tcp/127.0.0.1/$BACKEND_PORT") 2>/dev/null; then
+    ready=true
+    break
+  fi
+  if ! kill -0 "$BACKEND_PID" 2>/dev/null; then
+    echo "Backend process died. Last 30 log lines:"
+    echo "────────────────────────────────────────"
+    tail -30 "$DATA_DIR/logs/backend.log" || true
+    exit 1
+  fi
+  sleep 1
+done
+if [ "$ready" != "true" ]; then
+  echo "Backend failed to start within ${BACKEND_READY_TIMEOUT}s. Last 30 log lines:"
+  tail -30 "$DATA_DIR/logs/backend.log" || true
+  exit 1
+fi
+echo "Backend ready."
+# ── Start frontend (Next.js) ──────────────────────────────────────
+echo "Starting Next.js frontend on port $FRONTEND_PORT..."
+(
+  cd "$APP_DIR/frontend" && \
+  DEER_FLOW_INTERNAL_GATEWAY_BASE_URL="http://127.0.0.1:$BACKEND_PORT" \
+  PORT="$FRONTEND_PORT" \
+  node node_modules/next/dist/bin/next start -p "$FRONTEND_PORT" \
+    2>&1 | tee -a "$DATA_DIR/logs/frontend.log"
+) &
+FRONTEND_PID=$!
+# Wait for frontend
+echo "Waiting for frontend..."
+ready=false
+for ((i=0; i<FRONTEND_READY_TIMEOUT; i++)); do
+  if (echo > "/dev/tcp/127.0.0.1/$FRONTEND_PORT") 2>/dev/null; then
+    ready=true
+    break
+  fi
+  if ! kill -0 "$FRONTEND_PID" 2>/dev/null; then
+    echo "Frontend process died. Last 30 log lines:"
+    echo "────────────────────────────────────────"
+    tail -30 "$DATA_DIR/logs/frontend.log" || true
+    exit 1
+  fi
+  sleep 1
+done
+if [ "$ready" != "true" ]; then
+  echo "Frontend failed to start within ${FRONTEND_READY_TIMEOUT}s. Last 30 log lines:"
+  tail -30 "$DATA_DIR/logs/frontend.log" || true
+  exit 1
+fi
+echo "Frontend ready."
+echo ""
+echo "HuggingDeer is up ✓  →  http://localhost:$PUBLIC_PORT"
+echo ""
+# ── Periodic HF Dataset sync ──────────────────────────────────────
+if [ -n "${HF_TOKEN:-}" ]; then
+  (
+    while true; do
+      sleep "$SYNC_INTERVAL"
+      python3 "$APP_DIR/deer-sync.py" sync-once || true
+    done
+  ) &
+fi
+# ── Wait for backend (primary process) ───────────────────────────
+wait "$BACKEND_PID"
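
Review note: both readiness loops above probe a TCP port once per second until it accepts a connection or the timeout expires. A minimal Python sketch of that polling pattern follows. The function name `wait_for_port` is hypothetical, and unlike the shell loops it does not check whether the child process has died in the meantime.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float, interval: float = 1.0) -> bool:
    """Poll until host:port accepts a TCP connection; return False once `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and try again.
            time.sleep(interval)
    return False
```

With the defaults from `start.sh`, the backend wait would correspond to `wait_for_port("127.0.0.1", 8001, timeout=120)`. An open port only shows that the process is listening, not that the application has finished initialising, which is why the container `HEALTHCHECK` probes `/health` separately.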