# Deploy Guide

End-to-end deployment of this project to a Hugging Face Space. Covers both the
free CPU path (used for this deploy) and the ZeroGPU path (for when an
HF PRO subscription is available).

## Prerequisites

- A Hugging Face account with an access token (User Settings → Access Tokens; needs write scope).
- The token in your local `.env` as `HF_TOKEN=hf_…`.
- The Python deps installed (`uv sync` in the project root).

## One-shot deploy via `huggingface_hub` Python API

Everything below (create the Space, set secrets, upload files) can be done in one Python script. This is what was actually run to produce the live demo.

```python
from huggingface_hub import HfApi
from src.config import settings

REPO_ID = "<your-username>/oss-vs-frontier-assistant"
api = HfApi(token=settings.hf_token)

# 1) Create the Space. exist_ok=True makes re-runs safe; an existing Space is left unchanged.
api.create_repo(
    repo_id=REPO_ID,
    repo_type="space",
    space_sdk="gradio",
    space_hardware="cpu-basic",  # or "zero-a10g" if you have HF PRO
    exist_ok=True,
    private=False,
)

# 2) Set every required secret. These show up as env vars in the Space runtime.
for k, v in {
    "ANTHROPIC_API_KEY": settings.anthropic_api_key,
    "HF_TOKEN": settings.hf_token,
    "TAVILY_API_KEY": settings.tavily_api_key,
    "LANGFUSE_PUBLIC_KEY": settings.langfuse_public_key,
    "LANGFUSE_SECRET_KEY": settings.langfuse_secret_key,
    "LANGFUSE_HOST": settings.langfuse_host,
}.items():
    if v:  # skip unset/empty values instead of storing blank secrets
        api.add_space_secret(repo_id=REPO_ID, key=k, value=v)

# 3) Upload the project, excluding local-only / sensitive files.
# Paths are matched relative to folder_path, so nested caches need "**/".
api.upload_folder(
    repo_id=REPO_ID,
    repo_type="space",
    folder_path=".",
    commit_message="deploy",
    ignore_patterns=[
        ".env", ".git/", ".venv/", ".pytest_cache/", ".claude/",
        "data/", "results/", "__pycache__/", "**/__pycache__/**",
        "*.pyc", ".gitignore", "uv.lock",
    ],
)
```
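To check the ignore list before pushing, you can replay it locally. This sketch assumes `upload_folder` matches each relative path against `fnmatch`-style patterns and treats a trailing `/` as "everything under this directory"; that is how `huggingface_hub` filters files as far as we know, but treat the result as an approximation. `is_ignored` and `preview_upload` are hypothetical helper names. Note the `**/__pycache__/**` entry: upload paths are relative (`src/__pycache__/…`), so a pattern starting with `/` never matches anything.

```python
import fnmatch
from pathlib import Path

# Same ignore list as the deploy script (assumed fnmatch semantics, see above).
IGNORE_PATTERNS = [
    ".env", ".git/", ".venv/", ".pytest_cache/", ".claude/",
    "data/", "results/", "__pycache__/", "**/__pycache__/**",
    "*.pyc", ".gitignore", "uv.lock",
]


def _expand(pattern: str) -> str:
    # A trailing "/" means "everything below this directory".
    return pattern + "*" if pattern.endswith("/") else pattern


def is_ignored(rel_path: str, patterns: list[str] = IGNORE_PATTERNS) -> bool:
    """True if a POSIX-style relative path matches any ignore pattern."""
    return any(fnmatch.fnmatch(rel_path, _expand(p)) for p in patterns)


def preview_upload(folder: str = ".") -> list[str]:
    """Sorted relative paths that would be uploaded from `folder`."""
    root = Path(folder)
    files = (p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
    return sorted(f for f in files if not is_ignored(f))
```

Running `print("\n".join(preview_upload(".")))` from the project root lists what would be pushed; `.env` should never appear in it.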

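Why this approach over the web UI:

- No browser steps. Reproducible from any machine with the token.
- Secrets stay out of the repo. The SDK sends them over HTTPS directly to the Space settings; they are never written to a committed file.
- Re-runnable. `exist_ok=True` makes `create_repo` a no-op for an existing Space, and `upload_folder` overwrites changed files, so re-deploys are trivial. (It does not delete remote files you removed locally unless you also pass `delete_patterns`.)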

## What HF Spaces reads

| File               | Role on Spaces                                                                  |
|--------------------|---------------------------------------------------------------------------------|
| `README.md`        | The YAML frontmatter at the top configures the Space (sdk, hardware, etc.).     |
| `requirements.txt` | Installed at build time. Must be kept in sync with `pyproject.toml`.            |
| `app.py`           | Entry point (`app_file: app.py` in the YAML); HF imports it and finds `demo`.   |

The YAML frontmatter currently used:

```yaml
---
title: OSS vs Frontier Assistant
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: "3.11"
app_file: app.py
hardware: cpu-basic
pinned: false
---
```
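A quick pre-deploy check on this frontmatter can catch a missing `sdk` or a mistyped `app_file` before HF reports a build error. The sketch below is a deliberately tiny `key: value` reader, not a YAML parser (no nesting, lists, or multi-line values), and `read_frontmatter` / `check_space_config` are hypothetical helpers that only check the two keys this guide depends on.

```python
from pathlib import Path

# Keys this guide relies on; not the full Spaces config schema.
REQUIRED_KEYS = ("sdk", "app_file")


def read_frontmatter(text: str) -> dict[str, str]:
    """Read flat `key: value` pairs between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README.md must start with a '---' frontmatter block")
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    raise ValueError("frontmatter block is never closed with '---'")


def check_space_config(readme_text: str, root: Path = Path(".")) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    meta = read_frontmatter(readme_text)
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in meta]
    app_file = meta.get("app_file")
    if app_file and not (root / app_file).is_file():
        problems.append(f"app_file {app_file!r} not found under {root}")
    return problems
```

Usage: `check_space_config(Path("README.md").read_text(encoding="utf-8"))` from the project root.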

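## Switching to ZeroGPU later

1. Subscribe to HF PRO ($9/mo).
2. In the Space's Settings → Hardware, switch to `Nvidia A10G - Zero`, or call `api.request_space_hardware(repo_id=REPO_ID, hardware="zero-a10g")`. Re-running the deploy script with `space_hardware="zero-a10g"` is not enough for an existing Space, because `create_repo(..., exist_ok=True)` leaves it unchanged.
3. Update the YAML in `README.md` to `hardware: zero-a10g` so the frontmatter matches.
4. Re-upload the project. The `@spaces.GPU(duration=120)` decorator already on `QwenChatModel._generate` then starts allocating real GPU time, and Qwen latency drops from ~30-60 s to ~3-8 s per reply.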
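The guide doesn't show how `QwenChatModel._generate` is decorated. One common way to keep the same code runnable off Spaces, where the `spaces` package may not be installed, is a no-op fallback for the decorator. The sketch below assumes that pattern; `gpu` and the stripped-down `QwenChatModel` are hypothetical stand-ins, not the project's actual code.

```python
# Hypothetical sketch: use spaces.GPU on HF Spaces, a no-op decorator elsewhere.
try:
    import spaces  # available on HF Spaces; provides the ZeroGPU decorator

    gpu = spaces.GPU
except ImportError:

    def gpu(func=None, **_kwargs):
        """No-op stand-in for spaces.GPU; supports @gpu and @gpu(duration=120)."""
        if func is None:
            return lambda f: f  # called with arguments: return a decorator
        return func  # used bare: return the function unchanged


class QwenChatModel:  # hypothetical stand-in, not the project's class
    @gpu(duration=120)
    def _generate(self, prompt: str) -> str:
        return f"echo: {prompt}"
```

With this shape the decorator stays in place on `cpu-basic` and locally, and only allocates GPU time once the Space runs on ZeroGPU hardware.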

## Re-deploy after code changes

```bash
# Regenerate requirements.txt first if pyproject.toml changed, then:
uv run python - <<'PY'
from huggingface_hub import HfApi
from src.config import settings

HfApi(token=settings.hf_token).upload_folder(
    repo_id="<your-username>/oss-vs-frontier-assistant",
    repo_type="space",
    folder_path=".",
    commit_message="update",
    ignore_patterns=[
        ".env", ".git/", ".venv/", ".pytest_cache/", ".claude/",
        "data/", "results/", "__pycache__/", "**/__pycache__/**",
        "*.pyc", ".gitignore", "uv.lock",
    ],
)
PY
```

Each upload is a commit to the Space repo, and HF starts a new build automatically whenever the files change.
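To block until the new build is live (handy in scripts), you can poll the Space's runtime stage. `HfApi.get_space_runtime(repo_id).stage` reports values such as `BUILDING`, `RUNNING` and `BUILD_ERROR`. The poller below is a hypothetical helper that takes the stage getter as a callable so it can run without network access; the set of failure stages is our assumption.

```python
import time
from collections.abc import Callable

# Stage names as reported by the Hub; the failure set is an assumption.
DONE_STAGES = {"RUNNING"}
FAILED_STAGES = {"BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR", "NO_APP_FILE"}


def wait_for_space(
    get_stage: Callable[[], object],
    interval: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the Space is running; raise on a failed build or after max_polls."""
    for _ in range(max_polls):
        stage = get_stage()
        stage = str(getattr(stage, "value", stage))  # enum member -> plain string
        if stage in DONE_STAGES:
            return stage
        if stage in FAILED_STAGES:
            raise RuntimeError(f"Space build failed: {stage}")
        sleep(interval)
    raise TimeoutError(f"Space not running after {max_polls} polls")
```

Usage: `wait_for_space(lambda: HfApi(token=settings.hf_token).get_space_runtime(REPO_ID).stage)`. Right after an upload the stage may still report `RUNNING` from the previous build for a moment, so a short delay before the first poll is a reasonable precaution.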

## Troubleshooting

| Symptom on Spaces                              | Likely cause / fix                                                                                   |
|------------------------------------------------|------------------------------------------------------------------------------------------------------|
| Build fails on `torch`/`transformers` install  | A pin in `requirements.txt` doesn't match the HF base image; check `python_version` in the YAML.      |
| `ANTHROPIC_API_KEY is not set` at runtime      | Secret not added in the Space settings, or empty. Re-run the secrets loop above.                      |
| 403 on `create_repo` mentioning ZeroGPU        | ZeroGPU is gated behind HF PRO; use `space_hardware="cpu-basic"` instead.                             |
| Qwen replies very slowly (30-60 s)             | Expected on `cpu-basic`. Switch to ZeroGPU per the section above.                                     |
| Traces missing from Langfuse                   | Network timeout between the Space and the Langfuse host. Non-fatal; raise it with `LANGFUSE_TIMEOUT=30`. |
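For the `... is not set` row, failing at startup with the full list of missing secrets is quicker than discovering them one at a time. This sketch assumes the Anthropic, HF and Tavily keys are required and the Langfuse ones optional (missing traces are non-fatal per the table); adjust the tuples to whatever `src.config` actually enforces. `missing_secrets` and `assert_required_secrets` are hypothetical helpers.

```python
import os
from collections.abc import Mapping

# Assumed split; mirror whatever src.config actually requires.
REQUIRED = ("ANTHROPIC_API_KEY", "HF_TOKEN", "TAVILY_API_KEY")
OPTIONAL = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST")


def missing_secrets(env: Mapping[str, str] = os.environ) -> tuple[list[str], list[str]]:
    """Return (missing required, missing optional); blank values count as missing."""

    def unset(key: str) -> bool:
        return not env.get(key, "").strip()

    return [k for k in REQUIRED if unset(k)], [k for k in OPTIONAL if unset(k)]


def assert_required_secrets(env: Mapping[str, str] = os.environ) -> None:
    """Raise once, naming every missing required secret."""
    required, _ = missing_secrets(env)
    if required:
        raise RuntimeError("Missing Space secrets: " + ", ".join(required))
```

Calling `assert_required_secrets()` near the top of `app.py` turns a half-configured Space into one clear runtime error in the Space logs.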