# Deploy Guide

End-to-end deployment of this project to a Hugging Face Space. Covers both the
**free CPU** path (used for this deploy) and the **ZeroGPU** path (for when an
HF PRO subscription is available).

## Prerequisites

- A Hugging Face account with an access token (User Settings → Access Tokens; needs **write** scope).
- The token in your local `.env` as `HF_TOKEN=hf_…`.
- The Python deps installed (`uv sync` in the project root).

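Before deploying, it can help to confirm that `.env` really contains a plausible token. The sketch below is a minimal, standard-library-only check; `parse_env` and `token_problem` are hypothetical helper names, not part of this project, and the check only looks at the `hf_` prefix. It cannot verify that the token has write scope.

```python
from __future__ import annotations

from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def token_problem(env: dict[str, str]) -> str | None:
    """Return a description of what is wrong with HF_TOKEN, or None if it looks fine."""
    token = env.get("HF_TOKEN", "")
    if not token:
        return "HF_TOKEN is missing or empty"
    if not token.startswith("hf_"):
        return "HF_TOKEN does not start with 'hf_'"
    return None


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    print(token_problem(env) or "HF_TOKEN looks plausible")
```

Run it from the project root; a real `.env` loader may handle quoting and `export` prefixes differently than this simplified parser.
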
## One-shot deploy via `huggingface_hub` Python API

Everything below (creating the Space, setting secrets, uploading files) can be done in one Python script. This is the script that was run to produce the live demo.

```python
from huggingface_hub import HfApi
from src.config import settings

REPO_ID = "<your-username>/oss-vs-frontier-assistant"
api = HfApi(token=settings.hf_token)

# 1) Create the Space (idempotent; safe to re-run).
api.create_repo(
    repo_id=REPO_ID,
    repo_type="space",
    space_sdk="gradio",
    space_hardware="cpu-basic",  # or "zero-a10g" if you have HF PRO
    exist_ok=True,
    private=False,
)

# 2) Set every required secret. These show up as env vars in the Space runtime.
for k, v in {
    "ANTHROPIC_API_KEY": settings.anthropic_api_key,
    "HF_TOKEN": settings.hf_token,
    "TAVILY_API_KEY": settings.tavily_api_key,
    "LANGFUSE_PUBLIC_KEY": settings.langfuse_public_key,
    "LANGFUSE_SECRET_KEY": settings.langfuse_secret_key,
    "LANGFUSE_HOST": settings.langfuse_host,
}.items():
    if v:  # skip secrets that are unset locally
        api.add_space_secret(repo_id=REPO_ID, key=k, value=v)

# 3) Upload the project, excluding local-only / sensitive files.
api.upload_folder(
    repo_id=REPO_ID,
    repo_type="space",
    folder_path=".",
    commit_message="deploy",
    ignore_patterns=[
        ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
        "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
        "*.pyc", ".gitignore", "uv.lock",
    ],
)
```

Why this approach over the web UI:

- **No browser steps.** Reproducible from any machine with the token.
- **Secrets travel safely.** They never leave your machine in plaintext; the SDK posts them over HTTPS directly to the Space config.
- **Re-runnable.** `exist_ok=True` + `upload_folder` overwrite makes re-deploys trivial.

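Since the upload relies on `ignore_patterns` to keep `.env` and other local files out of the Space, a dry-run preview can be reassuring. The sketch below assumes the patterns behave like `fnmatch`-style globs applied to POSIX-style relative paths; the library's own matching may differ in edge cases, so treat the result as an approximation. `is_ignored` and `preview` are hypothetical helper names.

```python
from fnmatch import fnmatch

# Same list as in the deploy script.
IGNORE_PATTERNS = [
    ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
    "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
    "*.pyc", ".gitignore", "uv.lock",
]


def is_ignored(rel_path, patterns=IGNORE_PATTERNS):
    """True if a relative path matches any ignore pattern."""
    return any(fnmatch(rel_path, pattern) for pattern in patterns)


def preview(paths):
    """Split paths into (would upload, would skip), preserving order."""
    kept = [p for p in paths if not is_ignored(p)]
    skipped = [p for p in paths if is_ignored(p)]
    return kept, skipped


if __name__ == "__main__":
    kept, skipped = preview(["app.py", ".env", "src/config.py", "data/raw.csv"])
    print("upload:", kept)
    print("skip:", skipped)
```

Note that `.env` only matches a file at the project root; a nested `some_dir/.env` would need its own pattern.
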
## What HF Spaces reads

| File               | Role on Spaces                                                                |
|--------------------|-------------------------------------------------------------------------------|
| `README.md`        | The YAML frontmatter at the top configures the Space (sdk, hardware, etc.).   |
| `requirements.txt` | Installed at build time. **Must be kept in sync with `pyproject.toml`.**      |
| `app.py`           | Entry point (`app_file: app.py` in the YAML); HF imports it and finds `demo`. |

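Because `requirements.txt` has to track `pyproject.toml` by hand, a small drift check can catch a forgotten package before the build does. This is a sketch under assumptions: dependencies are read from the standard `[project].dependencies` table, only package names are compared (not version pins), and `package_name`, `requirement_lines`, and `drift` are hypothetical helpers.

```python
import re
from pathlib import Path


def package_name(requirement: str) -> str:
    """Normalized distribution name, ignoring extras, pins, and markers."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return re.sub(r"[-_.]+", "-", match.group(0)).lower() if match else ""


def requirement_lines(requirements_text: str) -> list[str]:
    """Requirement lines with comments, blanks, and pip options removed."""
    lines = (raw.split("#", 1)[0].strip() for raw in requirements_text.splitlines())
    return [line for line in lines if line and not line.startswith("-")]


def drift(dependencies: list[str], requirements_text: str) -> tuple[list[str], list[str]]:
    """Return (names missing from requirements.txt, names only in requirements.txt)."""
    want = {package_name(d) for d in dependencies}
    have = {package_name(line) for line in requirement_lines(requirements_text)}
    return sorted(want - have), sorted(have - want)


if __name__ == "__main__" and Path("pyproject.toml").exists() and Path("requirements.txt").exists():
    import tomllib  # standard library from Python 3.11

    project = tomllib.loads(Path("pyproject.toml").read_text()).get("project", {})
    missing, extra = drift(project.get("dependencies", []), Path("requirements.txt").read_text())
    print("missing from requirements.txt:", missing)
    print("only in requirements.txt:", extra)
```

Names that appear only in `requirements.txt` are not necessarily a problem, since Spaces-specific packages may legitimately live there.
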
The YAML frontmatter currently used:

```yaml
---
title: OSS vs Frontier Assistant
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: "3.11"
app_file: app.py
hardware: cpu-basic
pinned: false
---
```

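The frontmatter is a flat list of `key: value` pairs, so it can be inspected or patched from a script, for example to change the `hardware` value without hand-editing `README.md`. The following is a minimal sketch that assumes exactly that flat shape (no nested YAML, no multi-line values); `read_frontmatter` and `set_frontmatter` are hypothetical helpers, and a real YAML parser would be more robust.

```python
def read_frontmatter(readme: str) -> dict[str, str]:
    """Return the flat key/value pairs between the leading '---' delimiters."""
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("'\"")
    return fields


def set_frontmatter(readme: str, key: str, value: str) -> str:
    """Replace `key` in the frontmatter, or add it before the closing '---'."""
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README does not start with YAML frontmatter")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            lines.insert(i, f"{key}: {value}")  # key was absent
            break
        if lines[i].partition(":")[0].strip() == key:
            lines[i] = f"{key}: {value}"
            break
    else:
        raise ValueError("frontmatter is not closed with '---'")
    return "\n".join(lines) + "\n"
```

For instance, `set_frontmatter(text, "hardware", "zero-a10g")` returns the README text with only that one line changed.
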
## Switching to ZeroGPU later

1. Subscribe to HF PRO ($9/mo).
2. In the Space's **Settings → Hardware**, switch to `Nvidia A10G - Zero` (or rerun the deploy script with `space_hardware="zero-a10g"`).
3. Update the YAML in `README.md` to `hardware: zero-a10g`.
4. Re-upload. The `@spaces.GPU(duration=120)` decorator already on `QwenChatModel._generate` will then start allocating real GPU time, and Qwen latency drops from ~30-60s to ~3-8s per reply.

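For readers who have not seen the decorator from step 4, the sketch below shows one common way to apply it so the same module still imports on machines where the `spaces` package is not installed. The class body is a hypothetical stand-in, not the project's actual `QwenChatModel`; only the decorator placement mirrors what the guide describes.

```python
try:
    import spaces  # available on Hugging Face Spaces

    # Request a GPU slot for up to 120 seconds per call.
    gpu = spaces.GPU(duration=120)
except ImportError:
    def gpu(fn):
        """No-op stand-in so local runs work without the `spaces` package."""
        return fn


class QwenChatModel:
    """Hypothetical stand-in; the real class runs Qwen inference."""

    @gpu
    def _generate(self, prompt: str) -> str:
        # Real code would run the model here; this just echoes the prompt.
        return f"echo: {prompt}"


if __name__ == "__main__":
    print(QwenChatModel()._generate("hello"))
```

With this pattern the decorated method behaves like a plain method locally and only requests GPU time when running on ZeroGPU hardware.
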
## Re-deploy after code changes

```bash
# Update requirements.txt if pyproject.toml changed, then:
uv run python - <<'PY'
from huggingface_hub import HfApi
from src.config import settings

HfApi(token=settings.hf_token).upload_folder(
    repo_id="<your-username>/oss-vs-frontier-assistant",
    repo_type="space",
    folder_path=".",
    commit_message="update",
    ignore_patterns=[
        ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
        "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
        "*.pyc", ".gitignore", "uv.lock",
    ],
)
PY
```

HF triggers a new build automatically when files change.

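To know when that automatic rebuild has finished, a script can poll the Space's stage until it settles. The sketch below keeps the polling loop independent of the network by taking a `get_stage` callable; `wait_for_stage` and the set of terminal stages are assumptions made for illustration. In a real run the callable could wrap `HfApi.get_space_runtime(...)`, as the trailing comment suggests.

```python
import time
from typing import Callable

# Stages treated as "done" here; this selection is an assumption.
TERMINAL_STAGES = {"RUNNING", "BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR", "NO_APP_FILE"}


def wait_for_stage(
    get_stage: Callable[[], str],
    timeout_s: float = 900,
    poll_s: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_stage() until it returns a terminal stage or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        stage = get_stage()
        if stage in TERMINAL_STAGES:
            return stage
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Space still in stage {stage!r} after {timeout_s}s")
        sleep(poll_s)


# Possible wiring (needs network access and a valid token):
#   from huggingface_hub import HfApi
#   api = HfApi(token=...)
#   wait_for_stage(lambda: api.get_space_runtime(REPO_ID).stage)
```

Injecting `sleep` as a parameter keeps the loop easy to exercise in tests without real waiting.
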
## Troubleshooting

| Symptom on Spaces                             | Likely cause / fix                                                                                                   |
|-----------------------------------------------|----------------------------------------------------------------------------------------------------------------------|
| Build fails on `torch`/`transformers` install | Mismatch between the `requirements.txt` pin and the HF base image; check `python_version`.                           |
| `ANTHROPIC_API_KEY is not set` at runtime     | Secret not added in Space settings, or empty. Re-run the secrets loop above.                                         |
| 403 on `create_repo` mentioning ZeroGPU       | ZeroGPU is gated behind HF PRO; use `space_hardware="cpu-basic"` instead.                                            |
| Qwen replies very slowly (30-60s)             | Expected on `cpu-basic`. Switch to ZeroGPU per the section above.                                                    |
| Traces missing from Langfuse                  | Network timeout between the Space and Langfuse when sending traces. Non-fatal; raise it with `LANGFUSE_TIMEOUT=30`. |

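For the missing-secret symptom in particular, a startup check can report every unset secret at once rather than failing on the first one. This is a minimal sketch: `missing_secrets` is a hypothetical helper, and the list of names is copied from the secrets loop in the deploy script rather than from the application code, which may treat some of them as optional.

```python
import os
from typing import Mapping

# Names taken from the secrets loop in the deploy script.
EXPECTED_SECRETS = (
    "ANTHROPIC_API_KEY", "HF_TOKEN", "TAVILY_API_KEY",
    "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST",
)


def missing_secrets(environ: Mapping[str, str] = os.environ, names=EXPECTED_SECRETS) -> list[str]:
    """Return the expected names that are unset or blank in the environment."""
    return [name for name in names if not environ.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_secrets()
    print("missing: " + ", ".join(missing) if missing else "all expected secrets are set")
```

Calling it near the top of `app.py` and logging the result would make the Space's build log show which secrets still need to be added.
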