# Deploy Guide

End-to-end deployment of this project to a Hugging Face Space. Covers both the **free CPU** path (used for this deploy) and the **ZeroGPU** path (for when an HF PRO subscription is available).

## Prerequisites

- A Hugging Face account with an access token (User Settings → Access Tokens; needs **write** scope).
- The token in your local `.env` as `HF_TOKEN=hf_…`.
- The Python dependencies installed (`uv sync` in the project root).

## One-shot deploy via `huggingface_hub` Python API

Everything below (creating the Space, setting secrets, uploading files) can be done in one Python script. This is what was actually run to produce the live demo.

```python
from huggingface_hub import HfApi

from src.config import settings

# Space repo id, in the form "<namespace>/<space-name>".
REPO_ID = "/oss-vs-frontier-assistant"

api = HfApi(token=settings.hf_token)

# 1) Create the Space (idempotent; safe to re-run).
api.create_repo(
    repo_id=REPO_ID,
    repo_type="space",
    space_sdk="gradio",
    space_hardware="cpu-basic",  # or "zero-a10g" if you have HF PRO
    exist_ok=True,
    private=False,
)

# 2) Set every required secret. These show up as env vars in the Space runtime.
for k, v in {
    "ANTHROPIC_API_KEY": settings.anthropic_api_key,
    "HF_TOKEN": settings.hf_token,
    "TAVILY_API_KEY": settings.tavily_api_key,
    "LANGFUSE_PUBLIC_KEY": settings.langfuse_public_key,
    "LANGFUSE_SECRET_KEY": settings.langfuse_secret_key,
    "LANGFUSE_HOST": settings.langfuse_host,
}.items():
    if v:  # empty or missing values are skipped, not uploaded
        api.add_space_secret(repo_id=REPO_ID, key=k, value=v)

# 3) Upload the project, excluding local-only / sensitive files.
api.upload_folder(
    repo_id=REPO_ID,
    repo_type="space",
    folder_path=".",
    commit_message="deploy",
    ignore_patterns=[
        ".env",
        ".git/**",
        ".venv/**",
        ".pytest_cache/**",
        ".claude/**",
        "data/**",
        "results/**",
        "__pycache__/**",
        "**/__pycache__/**",
        "*.pyc",
        ".gitignore",
        "uv.lock",
    ],
)
```

Why this approach over the web UI:

- **No browser steps.** Reproducible from any machine with the token.
- **Secrets stay out of the repo.** They are read from your local `.env` and sent over HTTPS straight to the Space settings; `.env` itself is in `ignore_patterns`, so it is never uploaded.
- **Re-runnable.** `exist_ok=True` plus `upload_folder` overwriting existing files makes re-deploys trivial. Files deleted locally stay on the Space unless you also pass `delete_patterns`.

## What HF Spaces reads

| File               | Role on Spaces                                                                |
|--------------------|-------------------------------------------------------------------------------|
| `README.md`        | The YAML frontmatter at the top configures the Space (sdk, hardware, etc.).   |
| `requirements.txt` | Installed at build time. **Must be kept in sync with `pyproject.toml`.**      |
| `app.py`           | Entry point (`app_file: app.py` in the YAML); HF imports it and finds `demo`. |

The YAML frontmatter currently used:

```yaml
---
title: OSS vs Frontier Assistant
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: "3.11"
app_file: app.py
hardware: cpu-basic
pinned: false
---
```

## Switching to ZeroGPU later

1. Subscribe to HF PRO ($9/mo).
2. In the Space's **Settings → Hardware**, switch to `Nvidia A10G - Zero` (or call `api.request_space_hardware(REPO_ID, "zero-a10g")`; simply re-running the deploy script is not enough, because `create_repo(..., exist_ok=True)` leaves an existing Space's hardware unchanged).
3. Update the YAML in `README.md` to `hardware: zero-a10g`.
4. Re-upload. The `@spaces.GPU(duration=120)` decorator already on `QwenChatModel._generate` then starts allocating real GPU time, and Qwen latency drops from ~30-60 s to ~3-8 s per reply.

## Re-deploy after code changes

```bash
# Update requirements.txt first if pyproject.toml changed, then:
uv run python - <<'PY'
from huggingface_hub import HfApi

from src.config import settings

HfApi(token=settings.hf_token).upload_folder(
    repo_id="/oss-vs-frontier-assistant",
    repo_type="space",
    folder_path=".",
    commit_message="update",
    ignore_patterns=[
        ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
        "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
        "*.pyc", ".gitignore", "uv.lock",
    ],
)
PY
```

HF triggers a new build automatically when files change.
## Troubleshooting

| Symptom on Spaces                             | Likely cause / fix                                                                                                                   |
|-----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
| Build fails on `torch`/`transformers` install | A `requirements.txt` pin is incompatible with the HF base image; check `python_version` in the YAML.                                |
| `ANTHROPIC_API_KEY is not set` at runtime     | Secret missing or empty in the Space settings. The deploy loop skips empty local values, so set it in `.env` and re-run the secrets loop above. |
| 403 on `create_repo` mentioning ZeroGPU       | ZeroGPU is gated behind HF PRO; use `space_hardware="cpu-basic"` instead.                                                           |
| Qwen replies very slowly (30-60 s)            | Expected on `cpu-basic`. Switch to ZeroGPU per the section above.                                                                   |
| Tracing missing from Langfuse                 | Network timeout while sending traces from the Space to Langfuse. Non-fatal; raise the timeout with `LANGFUSE_TIMEOUT=30`.           |