
Deploy Guide

End-to-end deployment of this project to a Hugging Face Space. Covers both the free CPU path (used for this deploy) and the ZeroGPU path (for when an HF PRO subscription is available).

Prerequisites

  • A Hugging Face account with an access token (User Settings → Access Tokens; needs write scope).
  • The token in your local .env as HF_TOKEN=hf_….
  • The Python deps installed (uv sync in the project root).
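Before running anything else, it can help to confirm that the token is actually present in .env. The sketch below is a stdlib-only check. `read_env_value` and `looks_like_hf_token` are hypothetical helpers, not part of this project (whose own settings loader lives in src.config), and the `hf_` prefix test simply mirrors the HF_TOKEN=hf_… form shown above.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_path: str, key: str) -> Optional[str]:
    """Return the value for `key` from a simple KEY=VALUE .env file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes.
            return value.strip().strip("'\"")
    return None


def looks_like_hf_token(value: Optional[str]) -> bool:
    """Cheap local sanity check; it does not prove the token is valid."""
    return bool(value) and value.startswith("hf_") and len(value) > len("hf_")


if __name__ == "__main__":
    token = read_env_value(".env", "HF_TOKEN")
    if looks_like_hf_token(token):
        print("HF_TOKEN looks OK")
    else:
        print("HF_TOKEN missing or malformed")
```

This only checks the shape of the value. For an online check, calling `HfApi(token=token).whoami()` from huggingface_hub should fail if the token is rejected.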

One-shot deploy via huggingface_hub Python API

Everything below (creating the Space, setting secrets, uploading files) can be done in one Python script. This is what was actually run to produce the live demo.

from huggingface_hub import HfApi
from src.config import settings

REPO_ID = "<your-username>/oss-vs-frontier-assistant"
api = HfApi(token=settings.hf_token)

# 1) Create the Space (idempotent; safe to re-run).
api.create_repo(
    repo_id=REPO_ID,
    repo_type="space",
    space_sdk="gradio",
    space_hardware="cpu-basic",     # or "zero-a10g" if you have HF PRO
    exist_ok=True,
    private=False,
)

# 2) Set every required secret. These show up as env vars in the Space runtime.
for k, v in {
    "ANTHROPIC_API_KEY":   settings.anthropic_api_key,
    "HF_TOKEN":            settings.hf_token,
    "TAVILY_API_KEY":      settings.tavily_api_key,
    "LANGFUSE_PUBLIC_KEY": settings.langfuse_public_key,
    "LANGFUSE_SECRET_KEY": settings.langfuse_secret_key,
    "LANGFUSE_HOST":       settings.langfuse_host,
}.items():
    if v:
        api.add_space_secret(repo_id=REPO_ID, key=k, value=v)

# 3) Upload the project, excluding local-only / sensitive files.
api.upload_folder(
    repo_id=REPO_ID,
    repo_type="space",
    folder_path=".",
    commit_message="deploy",
    ignore_patterns=[
        ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
        "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
        "*.pyc", ".gitignore", "uv.lock",
    ],
)
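To see what the upload step would skip before pushing anything, the ignore patterns can be previewed locally. This is only an approximation: it assumes fnmatch-style globbing (where * also matches /), which is my understanding of how huggingface_hub filters paths, and `is_ignored` and `preview_upload` are hypothetical helper names.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Same patterns as the upload_folder call above.
IGNORE_PATTERNS = [
    ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
    "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
    "*.pyc", ".gitignore", "uv.lock",
]


def is_ignored(rel_path: str, patterns=IGNORE_PATTERNS) -> bool:
    """True if the POSIX-style relative path matches any ignore pattern."""
    # fnmatchcase avoids OS-specific case and separator normalisation.
    return any(fnmatchcase(rel_path, pattern) for pattern in patterns)


def preview_upload(folder: str = ".", patterns=IGNORE_PATTERNS):
    """Split the files under `folder` into (kept, skipped) relative paths."""
    root = Path(folder)
    kept, skipped = [], []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            (skipped if is_ignored(rel, patterns) else kept).append(rel)
    return kept, skipped


if __name__ == "__main__":
    kept, skipped = preview_upload(".")
    print(f"{len(kept)} files would be uploaded, {len(skipped)} skipped")
```

Scanning the kept list for anything sensitive is a cheap safeguard before a public upload.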

Why this approach over the web UI:

  • No browser steps. Reproducible from any machine with the token.
  • Secrets travel safely. They never leave your machine in plaintext; the SDK posts them over HTTPS directly to the Space config.
  • Re-runnable. exist_ok=True + upload_folder overwrite makes re-deploys trivial.
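One consequence of the `if v:` guard in the secrets loop is that an unset key is skipped silently and only surfaces later as a runtime error on the Space. A small pre-flight check along these lines could flag it earlier. This is a sketch: `REQUIRED_SECRETS` and `missing_secrets` are hypothetical names, and treating all six keys as required is an assumption taken from the loop in the deploy script.

```python
import os

# Assumption: every key set by the deploy script is required at runtime.
REQUIRED_SECRETS = (
    "ANTHROPIC_API_KEY",
    "HF_TOKEN",
    "TAVILY_API_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_HOST",
)


def missing_secrets(values, required=REQUIRED_SECRETS):
    """Return the required keys whose value is absent, empty, or whitespace."""
    return [key for key in required if not str(values.get(key) or "").strip()]


if __name__ == "__main__":
    # Checks the current environment; pass any mapping of key -> value instead.
    missing = missing_secrets(os.environ)
    if missing:
        print("Missing secrets:", ", ".join(missing))
    else:
        print("All required secrets are set")
```

The same function can run at the top of the deploy script (against the settings values) or at app startup on the Space (against os.environ).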

What HF Spaces reads

| File | Role on Spaces |
| --- | --- |
| README.md | The YAML frontmatter at the top configures the Space (sdk, hardware, etc.). |
| requirements.txt | Installed at build time. Must be kept in sync with pyproject.toml. |
| app.py | Entry point (app_file: app.py in the YAML); HF imports it and finds demo. |
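Since requirements.txt has to track pyproject.toml by hand, a quick drift check can catch a forgotten dependency before the Space build does. The sketch below compares package names only, not versions. The helper names are hypothetical, and it assumes dependencies are listed under [project].dependencies in pyproject.toml, which may not match this project's layout.

```python
import re


def requirement_name(spec: str) -> str:
    """Reduce a requirement such as 'Torch[cuda]>=2.2' to a normalised name."""
    name = re.split(r"[\s\[<>=!~;@]", spec.strip(), maxsplit=1)[0]
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirements(text: str) -> set:
    """Collect package names from requirements.txt content."""
    names = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line and not line.startswith("-"):  # skip options like -r / -e
            names.add(requirement_name(line))
    return names


def out_of_sync(pyproject_deps, requirements_text):
    """Return (missing_from_requirements, only_in_requirements), both sorted."""
    declared = {requirement_name(dep) for dep in pyproject_deps}
    pinned = parse_requirements(requirements_text)
    return sorted(declared - pinned), sorted(pinned - declared)


if __name__ == "__main__":
    from pathlib import Path

    pyproject, reqs = Path("pyproject.toml"), Path("requirements.txt")
    if pyproject.is_file() and reqs.is_file():
        import tomllib  # standard library from Python 3.11

        data = tomllib.loads(pyproject.read_text(encoding="utf-8"))
        deps = data.get("project", {}).get("dependencies", [])
        missing, extra = out_of_sync(deps, reqs.read_text(encoding="utf-8"))
        print("missing from requirements.txt:", missing)
        print("only in requirements.txt:", extra)
```

Entries that appear only in requirements.txt are not necessarily a problem, since the file may pin transitive dependencies. If requirements.txt is generated rather than hand-edited, something like `uv pip compile pyproject.toml -o requirements.txt` may be a simpler way to keep the two aligned.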

The YAML frontmatter currently used:

---
title: OSS vs Frontier Assistant
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: "3.11"
app_file: app.py
hardware: cpu-basic
pinned: false
---
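A malformed frontmatter block is an easy mistake to make when editing README.md, so a local check before uploading can be worthwhile. The sketch below is a deliberately tiny parser that handles only flat key: value pairs like the ones shown above; a real YAML parser would be more robust. `read_frontmatter`, `missing_keys`, and the choice of `EXPECTED_KEYS` are all assumptions for illustration.

```python
from pathlib import Path

# Assumption: the keys this guide relies on for a Gradio Space.
EXPECTED_KEYS = ("sdk", "sdk_version", "python_version", "app_file")


def read_frontmatter(readme_text: str) -> dict:
    """Parse flat `key: value` pairs between the leading '---' fences."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    config = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return config  # closing fence reached
        key, sep, value = line.partition(":")
        if sep and key.strip():
            config[key.strip()] = value.strip().strip("'\"")
    return {}  # no closing fence, so treat the block as invalid


def missing_keys(config: dict, expected=EXPECTED_KEYS) -> list:
    """Return the expected keys that are absent or empty."""
    return [key for key in expected if not config.get(key)]


if __name__ == "__main__":
    readme = Path("README.md")
    if readme.is_file():
        config = read_frontmatter(readme.read_text(encoding="utf-8"))
        print("frontmatter:", config)
        print("missing keys:", missing_keys(config))
```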

Switching to ZeroGPU later

  1. Subscribe to HF PRO ($9/mo).
  2. In the Space's Settings → Hardware, switch to Nvidia A10G - Zero (or rerun the deploy script with space_hardware="zero-a10g").
  3. Update the YAML in README.md to hardware: zero-a10g.
  4. Re-upload. The @spaces.GPU(duration=120) decorator already on QwenChatModel._generate will start allocating real GPU time, and Qwen latency drops from ~30-60s to ~3-8s per reply.
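For readers who have not used the decorator mentioned in step 4, the pattern looks roughly like the sketch below. `EchoModel` is a hypothetical stand-in, not the project's QwenChatModel, and the ImportError fallback is an assumption about keeping the same code runnable on machines where the spaces package is not installed.

```python
try:
    import spaces  # available on Hugging Face Spaces

    gpu = spaces.GPU(duration=120)  # request up to 120 s of GPU per call
except ImportError:

    def gpu(fn):
        """No-op stand-in so the same code runs where `spaces` is absent."""
        return fn


class EchoModel:
    """Hypothetical model class; only the decorator placement matters here."""

    @gpu
    def _generate(self, prompt: str) -> str:
        # On ZeroGPU hardware, this body would run inside the GPU allocation.
        return prompt.upper()


if __name__ == "__main__":
    print(EchoModel()._generate("hello"))
```

Keeping the decorated function narrow (generation only, not model loading or UI code) limits how much of each request counts against the GPU allocation.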

Re-deploy after code changes

# Bump requirements.txt if pyproject.toml changed, then:
uv run python - <<'PY'
from huggingface_hub import HfApi
from src.config import settings
HfApi(token=settings.hf_token).upload_folder(
    repo_id="<your-username>/oss-vs-frontier-assistant",
    repo_type="space",
    folder_path=".",
    commit_message="update",
    ignore_patterns=[".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
                     "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
                     "*.pyc", ".gitignore", "uv.lock"],
)
PY

HF triggers a new build automatically when files change.
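Because the build runs asynchronously, a deploy script may want to wait until the Space has either come up or failed. The sketch below polls a stage string until it reaches a terminal value. `wait_for_stage` and `TERMINAL_STAGES` are hypothetical names, and the stage names are my understanding of the values huggingface_hub reports from `get_space_runtime`, so treat them as assumptions to verify.

```python
import time

# Assumption: stages after which polling can stop.
TERMINAL_STAGES = {
    "RUNNING", "BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR",
    "NO_APP_FILE", "PAUSED", "STOPPED",
}


def wait_for_stage(get_stage, timeout_s=900.0, interval_s=10.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll `get_stage()` until it returns a terminal stage or time runs out."""
    deadline = clock() + timeout_s
    while True:
        stage = get_stage()
        if stage in TERMINAL_STAGES:
            return stage
        if clock() >= deadline:
            raise TimeoutError(f"Space still in stage {stage!r} after {timeout_s}s")
        sleep(interval_s)


# Example wiring (needs huggingface_hub and network access, so left as a comment):
#   from huggingface_hub import HfApi
#   from src.config import settings
#   api = HfApi(token=settings.hf_token)
#   stage = wait_for_stage(
#       lambda: api.get_space_runtime(repo_id="<your-username>/oss-vs-frontier-assistant").stage
#   )
#   print("final stage:", stage)
```

Passing the stage lookup in as a callable keeps the polling logic testable without touching the network.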

Troubleshooting

| Symptom on Spaces | Likely cause / fix |
| --- | --- |
| Build fails on torch/transformers install | Mismatch between the requirements.txt pin and the HF base image; check python_version. |
| ANTHROPIC_API_KEY is not set at runtime | Secret not added in Space settings, or empty. Re-run the secrets loop above. |
| 403 on create_repo mentioning ZeroGPU | ZeroGPU is gated behind HF PRO; use space_hardware="cpu-basic" instead. |
| Qwen replies very slowly (30-60s) | Expected on cpu-basic. Switch to ZeroGPU per the section above. |
| Tracing missing from Langfuse | Network timeout between the Space and Langfuse when sending traces. Non-fatal; bump LANGFUSE_TIMEOUT=30. |