# Deploy Guide
End-to-end deployment of this project to a Hugging Face Space. Covers both the
**free CPU** path (used for this deploy) and the **ZeroGPU** path (for when an
HF PRO subscription is available).
## Prerequisites
- A Hugging Face account with an access token (User Settings → Access Tokens; needs **write** scope).
- The token in your local `.env` as `HF_TOKEN=hf_…`.
- The Python deps installed (`uv sync` in the project root).
## One-shot deploy via `huggingface_hub` Python API
Everything below (create the Space, set secrets, upload files) can be done in one Python script. This is what was actually run to produce the live demo.
```python
from huggingface_hub import HfApi

from src.config import settings

REPO_ID = "<your-username>/oss-vs-frontier-assistant"
api = HfApi(token=settings.hf_token)

# 1) Create the Space (idempotent; safe to re-run).
api.create_repo(
    repo_id=REPO_ID,
    repo_type="space",
    space_sdk="gradio",
    space_hardware="cpu-basic",  # or "zero-a10g" if you have HF PRO
    exist_ok=True,
    private=False,
)

# 2) Set every required secret. These show up as env vars in the Space runtime.
for k, v in {
    "ANTHROPIC_API_KEY": settings.anthropic_api_key,
    "HF_TOKEN": settings.hf_token,
    "TAVILY_API_KEY": settings.tavily_api_key,
    "LANGFUSE_PUBLIC_KEY": settings.langfuse_public_key,
    "LANGFUSE_SECRET_KEY": settings.langfuse_secret_key,
    "LANGFUSE_HOST": settings.langfuse_host,
}.items():
    if v:  # skip unset values so empty secrets are never created
        api.add_space_secret(repo_id=REPO_ID, key=k, value=v)

# 3) Upload the project, excluding local-only / sensitive files.
api.upload_folder(
    repo_id=REPO_ID,
    repo_type="space",
    folder_path=".",
    commit_message="deploy",
    ignore_patterns=[
        ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
        "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
        "*.pyc", ".gitignore", "uv.lock",
    ],
)
```
Why this approach over the web UI:
- **No browser steps.** Reproducible from any machine with the token.
- **Secrets travel safely.** They never leave your machine in plaintext; the SDK posts them over HTTPS directly to the Space config.
- **Re-runnable.** `exist_ok=True` + `upload_folder` overwrite makes re-deploys trivial.
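Before pushing, it can be useful to preview which files the `ignore_patterns` list would let through, mainly to confirm that `.env` stays local. The sketch below approximates the filter with the standard library's `fnmatch`, on the assumption that `huggingface_hub` matches patterns in the same fnmatch style. `preview_upload` and `list_project_files` are hypothetical helpers, not part of the library.

```python
from fnmatch import fnmatch
from pathlib import Path

# Same patterns as the deploy script above.
IGNORE_PATTERNS = [
    ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
    "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
    "*.pyc", ".gitignore", "uv.lock",
]


def preview_upload(paths, ignore_patterns=IGNORE_PATTERNS):
    """Return the relative paths that no ignore pattern matches (hypothetical helper)."""
    return [
        p for p in paths
        if not any(fnmatch(p, pattern) for pattern in ignore_patterns)
    ]


def list_project_files(root="."):
    """Collect every file under root as a POSIX-style relative path."""
    base = Path(root)
    return sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    # Print what an upload from the project root would include.
    for path in preview_upload(list_project_files(".")):
        print(path)
```

Note that `.env` and `data/**` are anchored at the project root in this approximation, so a nested `src/.env` would not be filtered; add `**/.env` to the list if that matters for your layout.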
## What HF Spaces reads
| File | Role on Spaces |
|--------------|--------------------------------------------------------------------------------|
| `README.md` | The YAML frontmatter at the top configures the Space (sdk, hardware, etc.). |
| `requirements.txt` | Installed at build time. **Must be kept in sync with `pyproject.toml`.** |
| `app.py` | Entry point (`app_file: app.py` in the YAML); HF imports it and finds `demo`. |
The YAML frontmatter currently used:
```yaml
---
title: OSS vs Frontier Assistant
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: "3.11"
app_file: app.py
hardware: cpu-basic
pinned: false
---
```
## Switching to ZeroGPU later
1. Subscribe to HF PRO ($9/mo).
2. In the Space's **Settings → Hardware**, switch to `Nvidia A10G - Zero` (or rerun the deploy script with `space_hardware="zero-a10g"`).
3. Update the YAML in `README.md` to `hardware: zero-a10g`.
4. Re-upload. The `@spaces.GPU(duration=120)` decorator already on `QwenChatModel._generate` will start allocating real GPU time, and Qwen latency drops from ~30-60s to ~3-8s per reply.
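Step 3 can be scripted so the frontmatter and the Space settings do not drift apart. This is a rough sketch that edits only the first `---` delimited block of the README text; `set_frontmatter_hardware` is a hypothetical helper and assumes the frontmatter sits at the very top of the file.

```python
import re


def set_frontmatter_hardware(readme_text, hardware):
    """Return README text with the frontmatter `hardware:` value replaced.

    Only the leading YAML block delimited by `---` lines is touched;
    the key is appended if the block does not define it yet.
    """
    match = re.match(r"(---\n)(.*?\n)(---\n?)", readme_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("README has no YAML frontmatter")
    head, body, tail = match.groups()
    new_line = f"hardware: {hardware}"
    new_body, count = re.subn(
        r"^hardware:.*$", lambda _: new_line, body, flags=re.MULTILINE
    )
    if count == 0:
        new_body = body + new_line + "\n"
    return head + new_body + tail + readme_text[match.end():]


if __name__ == "__main__":
    with open("README.md", encoding="utf-8") as f:
        text = f.read()
    with open("README.md", "w", encoding="utf-8") as f:
        f.write(set_frontmatter_hardware(text, "zero-a10g"))
```

Any `hardware:` text in the README body below the frontmatter is left alone, which keeps prose and examples intact.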
## Re-deploy after code changes
```bash
# Bump requirements.txt if pyproject.toml changed, then:
uv run python - <<'PY'
from huggingface_hub import HfApi
from src.config import settings
HfApi(token=settings.hf_token).upload_folder(
    repo_id="<your-username>/oss-vs-frontier-assistant",
    repo_type="space",
    folder_path=".",
    commit_message="update",
    ignore_patterns=[".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
                     "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
                     "*.pyc", ".gitignore", "uv.lock"],
)
PY
```
HF triggers a new build automatically when files change.
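Since the rebuild starts on its own, a script may want to wait until the Space settles before running smoke tests. The sketch below polls a stage string until it reaches a terminal value. The stage names are assumed to match what the Hub reports, and `wait_for_space` and `hub_stage_fetcher` are hypothetical helpers built on `HfApi.get_space_runtime`.

```python
import time

# Stages treated as "done polling"; the names are an assumption about Hub output.
TERMINAL_STAGES = {
    "RUNNING", "BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR",
    "NO_APP_FILE", "STOPPED", "PAUSED",
}


def wait_for_space(fetch_stage, timeout_s=900, poll_s=10, sleep=time.sleep):
    """Poll fetch_stage() until it returns a terminal stage or the timeout elapses."""
    waited = 0
    while True:
        stage = fetch_stage()
        if stage in TERMINAL_STAGES:
            return stage
        if waited >= timeout_s:
            raise TimeoutError(f"Space still in stage {stage!r} after {waited}s")
        sleep(poll_s)
        waited += poll_s


def hub_stage_fetcher(repo_id, token=None):
    """Build a fetch_stage callable backed by HfApi.get_space_runtime."""
    # Imported lazily so the poller itself has no hard dependency on the SDK.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    return lambda: api.get_space_runtime(repo_id=repo_id).stage
```

A call such as `wait_for_space(hub_stage_fetcher("<your-username>/oss-vs-frontier-assistant", token=settings.hf_token))` returns the final stage, so the caller can treat anything other than `"RUNNING"` as a failed deploy.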
## Troubleshooting
| Symptom on Spaces | Likely cause / fix |
|---------------------------------------------|-------------------------------------------------------------------------------|
| Build fails on `torch`/`transformers` install | Mismatch between `requirements.txt` pin and HF base image; check `python_version`. |
| `ANTHROPIC_API_KEY is not set` at runtime | Secret not added in Space settings, or empty. Re-run the secrets loop above. |
| 403 on `create_repo` mentioning ZeroGPU | ZeroGPU is gated behind HF PRO; use `space_hardware="cpu-basic"` instead. |
| Qwen replies very slowly (30-60s) | Expected on `cpu-basic`. Switch to ZeroGPU per the section above. |
| Tracing missing from Langfuse               | Network timeout when the Space sends traces. Non-fatal; bump `LANGFUSE_TIMEOUT=30`. |
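Several of these symptoms come down to a secret that is missing or empty at runtime. A startup check along the following lines would surface that early. It is a sketch only: `missing_secrets` is a hypothetical helper, and treating all six names from the deploy script as equally required is an assumption (the Langfuse ones are non-fatal per the table).

```python
import os

# Names taken from the secrets loop in the deploy script.
SECRET_NAMES = (
    "ANTHROPIC_API_KEY",
    "HF_TOKEN",
    "TAVILY_API_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_HOST",
)


def missing_secrets(env=None):
    """Return the secret names that are unset or blank in the given environment."""
    env = os.environ if env is None else env
    return [name for name in SECRET_NAMES if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        print("Missing or empty secrets:", ", ".join(missing))
```

Logging the result at the top of `app.py` makes the cause visible in the Space's build and runtime logs without printing any secret values.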