Deploy Guide

This guide covers end-to-end deployment of this project to a Hugging Face Space, on both the free CPU path (used for the current deploy) and the ZeroGPU path (which requires an HF PRO subscription).

Prerequisites

- A Hugging Face account with an access token that has write scope (User Settings → Access Tokens).
- The token stored in your local `.env` as `HF_TOKEN=hf_…`.
- The Python dependencies installed (`uv sync` in the project root).
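
A quick preflight check can catch a missing or malformed token before any API call is made. The sketch below uses only the standard library; the helper names `read_env_value` and `token_looks_valid` are hypothetical, and it assumes `.env` contains plain `KEY=VALUE` lines (the project's own `src.config.settings` may load the file differently). It checks only the shape of the token, not its scope.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_text: str, key: str) -> Optional[str]:
    """Return the value for `key` from KEY=VALUE text, or None if absent."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes.
            return value.strip().strip("'\"")
    return None


def token_looks_valid(token: Optional[str]) -> bool:
    """Cheap shape check only: non-empty and starts with the `hf_` prefix."""
    return bool(token) and token.startswith("hf_")


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text() if env_file.exists() else ""
    print("HF_TOKEN ok:", token_looks_valid(read_env_value(text, "HF_TOKEN")))
```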

One-shot deploy via `huggingface_hub` Python API

Everything below (creating the Space, setting secrets, uploading files) can be done in one Python script. This is the script that was run to produce the live demo.

from huggingface_hub import HfApi
from src.config import settings

REPO_ID = "<your-username>/oss-vs-frontier-assistant"
api = HfApi(token=settings.hf_token)

# 1) Create the Space (idempotent; safe to re-run).
api.create_repo(
    repo_id=REPO_ID,
    repo_type="space",
    space_sdk="gradio",
    space_hardware="cpu-basic",     # or "zero-a10g" if you have HF PRO
    exist_ok=True,
    private=False,
)

# 2) Set every required secret. These show up as env vars in the Space runtime.
for k, v in {
    "ANTHROPIC_API_KEY":   settings.anthropic_api_key,
    "HF_TOKEN":            settings.hf_token,
    "TAVILY_API_KEY":      settings.tavily_api_key,
    "LANGFUSE_PUBLIC_KEY": settings.langfuse_public_key,
    "LANGFUSE_SECRET_KEY": settings.langfuse_secret_key,
    "LANGFUSE_HOST":       settings.langfuse_host,
}.items():
    if v:
        api.add_space_secret(repo_id=REPO_ID, key=k, value=v)

# 3) Upload the project, excluding local-only / sensitive files.
api.upload_folder(
    repo_id=REPO_ID,
    repo_type="space",
    folder_path=".",
    commit_message="deploy",
    ignore_patterns=[
        ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
        "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
        "*.pyc", ".gitignore", "uv.lock",
    ],
)
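
Before uploading, it can help to preview which files would survive the ignore patterns. The following is a rough local approximation, assuming `ignore_patterns` uses `fnmatch`-style matching on POSIX-style relative paths (worth verifying against your installed `huggingface_hub` version). The helper names `is_ignored` and `files_to_upload` are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import Path

# Same patterns as in the deploy script above.
IGNORE_PATTERNS = [
    ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
    "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
    "*.pyc", ".gitignore", "uv.lock",
]


def is_ignored(rel_path: str, patterns: list) -> bool:
    """True if the POSIX-style relative path matches any ignore pattern."""
    return any(fnmatch(rel_path, pattern) for pattern in patterns)


def files_to_upload(root: str, patterns: list) -> list:
    """List files under `root` that are not matched by any ignore pattern."""
    base = Path(root)
    rel_paths = (
        path.relative_to(base).as_posix()
        for path in base.rglob("*")
        if path.is_file()
    )
    return sorted(path for path in rel_paths if not is_ignored(path, patterns))
```

Calling `files_to_upload(".", IGNORE_PATTERNS)` from the project root prints nothing by itself; iterate over the result to inspect it. A stray `.env` in that list is a signal to stop before deploying.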

Why this approach over the web UI:

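- No browser steps. The deploy is reproducible from any machine that has the token.
- Secrets stay out of the repo. The SDK sends them over HTTPS straight to the Space's secret settings, and `.env` is excluded from the upload, so they are never committed as files.
- Re-runnable. `exist_ok=True` makes `create_repo` safe to call on an existing Space, and `upload_folder` overwrites changed files, so re-deploys are trivial. Files deleted locally stay on the Space unless you also pass `delete_patterns`.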
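
Because the secrets loop in the script skips empty values silently, a missing key only surfaces later at runtime. A minimal sketch of the same loop that reports what was set and what was skipped, by name only and never by value, might look like this. `push_secrets` is a hypothetical helper, and `api` is assumed to be an `HfApi` instance.

```python
def push_secrets(api, repo_id: str, secrets: dict) -> tuple:
    """Set non-empty secrets on the Space; return (set_names, skipped_names)."""
    set_names, skipped_names = [], []
    for key, value in secrets.items():
        if value:
            # Same call as in the deploy script.
            api.add_space_secret(repo_id=repo_id, key=key, value=value)
            set_names.append(key)
        else:
            # Empty or None: record the name so the gap is visible.
            skipped_names.append(key)
    return set_names, skipped_names
```

Printing `skipped_names` after a deploy makes it obvious when, for example, `TAVILY_API_KEY` was absent from the local `.env`.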

What HF Spaces reads

| File | Role on Spaces |
| --- | --- |
| `README.md` | The YAML frontmatter at the top configures the Space (sdk, hardware, etc.). |
| `requirements.txt` | Installed at build time. Must be kept in sync with `pyproject.toml`. |
| `app.py` | Entry point (`app_file: app.py` in the YAML); HF imports it and finds `demo`. |

The YAML frontmatter currently used:

---
title: OSS vs Frontier Assistant
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: "3.11"
app_file: app.py
hardware: cpu-basic
pinned: false
---
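
A malformed frontmatter block is easy to miss until the build fails. The sketch below is a deliberately simple reader for flat `key: value` frontmatter like the one above; it is not a YAML parser, and the choice of `sdk` and `app_file` as the keys to check is an assumption based on what this guide relies on. For nested values, a real YAML library would be the safer choice.

```python
def read_front_matter(readme_text: str) -> dict:
    """Parse flat `key: value` pairs between the leading `---` markers."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        # Ignore indented lines and comments; only top-level keys are read.
        if sep and key.strip() and not key.startswith((" ", "#")):
            fields[key.strip()] = value.strip().strip("'\"")
    return {}  # no closing marker: treat the frontmatter as absent


def missing_keys(fields: dict, required=("sdk", "app_file")) -> list:
    """Return the required keys that the frontmatter does not define."""
    return [key for key in required if key not in fields]
```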

Switching to ZeroGPU later

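1. Subscribe to HF PRO ($9/mo).
2. In the Space's Settings → Hardware, switch to Nvidia A10G - Zero. To do this from code, call `api.request_space_hardware(repo_id=REPO_ID, hardware="zero-a10g")`; rerunning the deploy script alone is not enough, because `create_repo(..., exist_ok=True)` leaves an existing Space's hardware unchanged.
3. Update the YAML in `README.md` to `hardware: zero-a10g`.
4. Re-upload. The `@spaces.GPU(duration=120)` decorator already on `QwenChatModel._generate` will then allocate real GPU time, and Qwen latency drops from ~30-60s to ~3-8s per reply.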
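
For code that has to run both on Spaces and on a local machine where the `spaces` package is not installed, one common pattern is to fall back to a no-op decorator. The sketch below illustrates that pattern only; `QwenChatModelSketch` is a hypothetical stand-in, not the project's actual `QwenChatModel`, and its method body is a placeholder.

```python
try:
    import spaces  # available in the Hugging Face Spaces runtime

    gpu = spaces.GPU(duration=120)
except ImportError:
    def gpu(fn):
        """No-op stand-in so the same code runs where `spaces` is absent."""
        return fn


class QwenChatModelSketch:
    """Hypothetical stand-in for the project's QwenChatModel."""

    @gpu
    def _generate(self, prompt: str) -> str:
        # Placeholder for the real model call.
        return f"echo: {prompt}"
```

With this shape, the decorated method keeps the same signature in both environments, so switching hardware needs no code change.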

Re-deploy after code changes

# Bump requirements.txt if pyproject.toml changed, then:
uv run python - <<'PY'
from huggingface_hub import HfApi
from src.config import settings
HfApi(token=settings.hf_token).upload_folder(
    repo_id="<your-username>/oss-vs-frontier-assistant",
    repo_type="space",
    folder_path=".",
    commit_message="update",
    ignore_patterns=[".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
                     "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
                     "*.pyc", ".gitignore", "uv.lock"],
)
PY

HF triggers a new build automatically when files change.
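
To confirm that a rebuild finished, the Space runtime can be polled through `HfApi.get_space_runtime`. The sketch below is one way to do that, with a hypothetical helper name and assumed timeout values; `api` is expected to be an `HfApi` instance, and the set of failure stages is my reading of the `SpaceStage` values rather than an exhaustive list.

```python
import time

# Stages treated as terminal failures (assumed subset of SpaceStage values).
FAILED_STAGES = {"BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR", "NO_APP_FILE"}


def wait_until_running(api, repo_id, timeout_s=600.0, poll_s=10.0, sleep=time.sleep):
    """Poll until the Space is RUNNING; raise on a failed stage or on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        raw = api.get_space_runtime(repo_id=repo_id).stage
        stage = getattr(raw, "value", raw)  # enum member or plain string
        if stage == "RUNNING":
            return stage
        if stage in FAILED_STAGES:
            raise RuntimeError(f"Space entered stage {stage}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Space still in stage {stage} after {timeout_s}s")
        sleep(poll_s)
```

Right after an upload the Space may briefly still report the previous build as running, so a short delay before the first poll is a reasonable precaution.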

Troubleshooting

| Symptom on Spaces | Likely cause / fix |
| --- | --- |
| Build fails on `torch`/`transformers` install | Mismatch between the `requirements.txt` pins and the HF base image; check `python_version`. |
| `ANTHROPIC_API_KEY is not set` at runtime | Secret missing from the Space settings, or empty. Re-run the secrets loop above. |
| 403 on `create_repo` mentioning ZeroGPU | ZeroGPU is gated behind HF PRO; use `space_hardware="cpu-basic"` instead. |
| Qwen replies very slowly (30-60s) | Expected on `cpu-basic`. Switch to ZeroGPU per the section above. |
| Traces missing from Langfuse | Network timeout when the Space sends traces to Langfuse. Non-fatal; raise the timeout, e.g. `LANGFUSE_TIMEOUT=30`. |
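
Several of these symptoms come down to an environment variable that is missing or blank in the Space runtime. A startup check along the following lines can surface that early. This is only a sketch: the helper name is hypothetical, and treating all six names from the secrets loop as required is an assumption (the Langfuse ones may be optional in practice).

```python
import os

# Names taken from the secrets loop in the deploy script.
EXPECTED_VARS = (
    "ANTHROPIC_API_KEY", "HF_TOKEN", "TAVILY_API_KEY",
    "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST",
)


def missing_env_vars(required=EXPECTED_VARS, environ=None) -> list:
    """Return the names that are unset or blank in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in required if not (env.get(name) or "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        # Print names only; never echo secret values into the logs.
        print("Missing environment variables:", ", ".join(missing))
```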