Deploy Guide
End-to-end deployment of this project to a Hugging Face Space. Covers both the free CPU path (used for this deploy) and the ZeroGPU path (for when an HF PRO subscription is available).
Prerequisites
- A Hugging Face account with an access token (User Settings → Access Tokens; needs write scope).
- The token in your local .env as HF_TOKEN=hf_….
- The Python deps installed (uv sync in the project root).
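A quick pre-flight check can catch a missing or malformed token entry before any upload is attempted. The sketch below is not part of the project: `read_hf_token` is a hypothetical helper, and it assumes the .env file uses plain KEY=VALUE lines and that tokens start with `hf_`.

```python
def read_hf_token(env_text: str) -> str:
    """Return the HF_TOKEN value from .env-style text, or raise ValueError."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "HF_TOKEN":
            # Drop optional surrounding quotes.
            token = value.strip().strip('"').strip("'")
            if not token.startswith("hf_"):
                raise ValueError("HF_TOKEN does not look like a Hugging Face token")
            return token
    raise ValueError("HF_TOKEN not found")
```

It could be called as `read_hf_token(Path(".env").read_text())`. This only checks the format of the entry; whether the token really has write scope can only be confirmed against the Hub.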
One-shot deploy via huggingface_hub Python API
Everything below (creating the Space, setting secrets, uploading files) can be done in one Python script. This is what was actually run to produce the live demo.
from huggingface_hub import HfApi
from src.config import settings
REPO_ID = "<your-username>/oss-vs-frontier-assistant"
api = HfApi(token=settings.hf_token)
# 1) Create the Space (idempotent; safe to re-run).
api.create_repo(
    repo_id=REPO_ID,
    repo_type="space",
    space_sdk="gradio",
    space_hardware="cpu-basic",  # or "zero-a10g" if you have HF PRO
    exist_ok=True,
    private=False,
)
# 2) Set every required secret. These show up as env vars in the Space runtime.
for k, v in {
    "ANTHROPIC_API_KEY": settings.anthropic_api_key,
    "HF_TOKEN": settings.hf_token,
    "TAVILY_API_KEY": settings.tavily_api_key,
    "LANGFUSE_PUBLIC_KEY": settings.langfuse_public_key,
    "LANGFUSE_SECRET_KEY": settings.langfuse_secret_key,
    "LANGFUSE_HOST": settings.langfuse_host,
}.items():
    if v:  # skip secrets that are unset locally
        api.add_space_secret(repo_id=REPO_ID, key=k, value=v)
# 3) Upload the project, excluding local-only / sensitive files.
api.upload_folder(
    repo_id=REPO_ID,
    repo_type="space",
    folder_path=".",
    commit_message="deploy",
    ignore_patterns=[
        ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
        "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
        "*.pyc", ".gitignore", "uv.lock",
    ],
)
Why this approach over the web UI:
- No browser steps. Reproducible from any machine with the token.
- Secrets travel safely. They never leave your machine in plaintext; the SDK posts them over HTTPS directly to the Space config.
- Re-runnable. exist_ok=True plus upload_folder overwriting existing files makes re-deploys trivial.
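Before uploading, it can help to preview which files the ignore_patterns from step 3 would skip. The sketch below uses only the standard library and assumes the patterns are matched fnmatch-style against forward-slash relative paths, which is my understanding of how upload_folder filters files; `would_upload` and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase

IGNORE_PATTERNS = [
    ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
    "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
    "*.pyc", ".gitignore", "uv.lock",
]


def would_upload(paths, patterns=IGNORE_PATTERNS):
    """Return the paths that match none of the ignore patterns."""
    return [p for p in paths if not any(fnmatchcase(p, pat) for pat in patterns)]


# Illustrative paths only; not a listing of the real project.
sample = [
    "app.py", "README.md", ".env", ".git/config",
    "src/__pycache__/config.cpython-311.pyc", "data/raw.csv", "src/config.py",
]
print(would_upload(sample))  # ['app.py', 'README.md', 'src/config.py']
```

Note that in fnmatch a `*` also matches `/`, so a pattern such as `*.pyc` catches compiled files at any depth.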
What HF Spaces reads
| File | Role on Spaces |
|---|---|
| README.md | The YAML frontmatter at the top configures the Space (sdk, hardware, etc.). |
| requirements.txt | Installed at build time. Must be kept in sync with pyproject.toml. |
| app.py | Entry point (app_file: app.py in the YAML); HF imports it and finds demo. |
The YAML frontmatter currently used:
---
title: OSS vs Frontier Assistant
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: "3.11"
app_file: app.py
hardware: cpu-basic
pinned: false
---
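A lightweight sanity check on this frontmatter can be run locally before uploading. The sketch below is an assumption-laden illustration: `parse_frontmatter` is a hypothetical helper that handles only flat `key: value` lines like the ones above; a real YAML parser would be the safer choice for anything nested.

```python
def parse_frontmatter(readme_text: str) -> dict[str, str]:
    """Parse flat 'key: value' pairs between the leading '---' markers."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    config: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return config  # closing marker reached
        key, sep, value = line.partition(":")
        if sep:
            # Strip whitespace and optional quotes around the value.
            config[key.strip()] = value.strip().strip('"').strip("'")
    return {}  # closing marker never found, so treat as no frontmatter


example = '---\nsdk: gradio\npython_version: "3.11"\napp_file: app.py\n---\n# Body\n'
config = parse_frontmatter(example)
```

With the result in hand, a deploy script could, for instance, refuse to upload when `config.get("app_file")` does not name an existing file.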
Switching to ZeroGPU later
- Subscribe to HF PRO ($9/mo).
- In the Space's Settings → Hardware, switch to Nvidia A10G - Zero (or rerun the deploy script with space_hardware="zero-a10g").
- Update the YAML in README.md to hardware: zero-a10g.
- Re-upload: the @spaces.GPU(duration=120) decorator already on QwenChatModel._generate will start allocating real GPU time; Qwen latency drops from ~30-60s to ~3-8s per reply.
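For readers who have not used the decorator mentioned in the last step, the pattern looks roughly like the sketch below. It is not the project's QwenChatModel code: `gpu` and `generate_reply` are hypothetical names, and the fallback assumes the `spaces` package may be absent on a local machine.

```python
try:
    import spaces  # available on HF Spaces; may be missing locally

    gpu = spaces.GPU(duration=120)  # request up to 120 s of GPU time per call
except ImportError:
    def gpu(fn):
        """No-op stand-in so the same code runs without the spaces package."""
        return fn


@gpu
def generate_reply(prompt: str) -> str:
    # Placeholder body: the real method would run the Qwen model here.
    return f"echo: {prompt}"
```

The fallback keeps the same module importable on a laptop and on cpu-basic hardware, so switching hardware later should not require code changes.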
Re-deploy after code changes
# Bump requirements.txt if pyproject.toml changed, then:
uv run python - <<'PY'
from huggingface_hub import HfApi
from src.config import settings
HfApi(token=settings.hf_token).upload_folder(
    repo_id="<your-username>/oss-vs-frontier-assistant",
    repo_type="space",
    folder_path=".",
    commit_message="update",
    ignore_patterns=[".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
                     "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
                     "*.pyc", ".gitignore", "uv.lock"],
)
PY
HF triggers a new build automatically when files change.
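To find out from a script whether that build succeeded, one option is to poll the Space's runtime stage. The sketch below keeps the polling logic independent of the network: `wait_for_space` is a hypothetical helper that takes any callable returning the current stage, and the stage names listed are the ones I believe huggingface_hub reports, so treat them as assumptions.

```python
import time
from typing import Callable

# Stages after which further waiting is pointless (assumed names).
TERMINAL_STAGES = {
    "RUNNING", "BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR",
    "NO_APP_FILE", "PAUSED", "STOPPED",
}


def wait_for_space(get_stage: Callable[[], str],
                   timeout_s: float = 900.0, poll_s: float = 10.0) -> str:
    """Poll get_stage() until a terminal stage is reached or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        raw = get_stage()
        stage = getattr(raw, "value", raw)  # accept enum members or plain strings
        if stage in TERMINAL_STAGES:
            return stage
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Space still in stage {stage!r} after {timeout_s}s")
        time.sleep(poll_s)
```

With the deploy script's objects in scope, it might be called as `wait_for_space(lambda: api.get_space_runtime(REPO_ID).stage)`; any result other than `"RUNNING"` suggests checking the build logs.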
Troubleshooting
| Symptom on Spaces | Likely cause / fix |
|---|---|
| Build fails on torch/transformers install | Mismatch between the requirements.txt pin and the HF base image; check python_version. |
| ANTHROPIC_API_KEY is not set at runtime | Secret not added in Space settings, or empty. Re-run the secrets loop above. |
| 403 on create_repo mentioning ZeroGPU | ZeroGPU is gated behind HF PRO; use space_hardware="cpu-basic" instead. |
| Qwen replies very slowly (30-60s) | Expected on cpu-basic. Switch to ZeroGPU per the section above. |
| Tracing missing from Langfuse | Network timeout when the Space sends traces. Non-fatal; raise the limit by setting LANGFUSE_TIMEOUT=30. |
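For the missing-secret case in particular, a startup check inside the app can make the failure obvious in the Space logs. This is a minimal sketch under the assumption that all six secrets set by the deploy script are required at runtime; `missing_secrets` is a hypothetical helper, not an existing project function.

```python
import os

REQUIRED_SECRETS = [
    "ANTHROPIC_API_KEY", "HF_TOKEN", "TAVILY_API_KEY",
    "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST",
]


def missing_secrets(env=None, required=REQUIRED_SECRETS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        print("Missing secrets:", ", ".join(missing))
```

Calling it once at import time in app.py and logging the result would show at a glance which secret the secrets loop skipped.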