# Deploy Guide

End-to-end deployment of this project to a Hugging Face Space. Covers both the
**free CPU** path (used for this deploy) and the **ZeroGPU** path (for when a
HF PRO subscription is available).

## Prerequisites

- A Hugging Face account with an access token (User Settings → Access Tokens; needs **write** scope).
- The token in your local `.env` as `HF_TOKEN=hf_…`.
- The Python deps installed (`uv sync` in the project root).

## One-shot deploy via `huggingface_hub` Python API

Everything below — create the Space, set secrets, upload files — can be done in one Python script. This is what was actually run to produce the live demo.

```python
from huggingface_hub import HfApi
from src.config import settings

REPO_ID = "<your-username>/oss-vs-frontier-assistant"
api = HfApi(token=settings.hf_token)

# 1) Create the Space (idempotent; safe to re-run).
api.create_repo(
    repo_id=REPO_ID,
    repo_type="space",
    space_sdk="gradio",
    space_hardware="cpu-basic",     # or "zero-a10g" if you have HF PRO
    exist_ok=True,
    private=False,
)

# 2) Set every required secret. These show up as env vars in the Space runtime.
for k, v in {
    "ANTHROPIC_API_KEY":   settings.anthropic_api_key,
    "HF_TOKEN":            settings.hf_token,
    "TAVILY_API_KEY":      settings.tavily_api_key,
    "LANGFUSE_PUBLIC_KEY": settings.langfuse_public_key,
    "LANGFUSE_SECRET_KEY": settings.langfuse_secret_key,
    "LANGFUSE_HOST":       settings.langfuse_host,
}.items():
    if v:
        api.add_space_secret(repo_id=REPO_ID, key=k, value=v)

# 3) Upload the project, excluding local-only / sensitive files.
api.upload_folder(
    repo_id=REPO_ID,
    repo_type="space",
    folder_path=".",
    commit_message="deploy",
    ignore_patterns=[
        ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
        "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
        "*.pyc", ".gitignore", "uv.lock",
    ],
)
```

Why this approach over the web UI:

- **No browser steps.** Reproducible from any machine with the token.
- **Secrets travel safely.** They never leave your machine in plaintext; the SDK posts them over HTTPS directly to the Space config.
- **Re-runnable.** `exist_ok=True` + `upload_folder` overwrite makes re-deploys trivial.
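
One caveat with step 2 of the script: the `if v:` guard skips empty values without saying so, and the Space then fails at runtime with a missing-key error. A small pre-flight check can report the gaps before anything is uploaded. The sketch below is illustrative only; `missing_secrets` and `REQUIRED_SECRETS` are hypothetical names, and treating all six keys as mandatory is an assumption taken from the "every required secret" comment in the script.

```python
from __future__ import annotations

from typing import Mapping

# Keys copied from the secrets loop in the deploy script.
# Assumption: all six are mandatory for the Space to work.
REQUIRED_SECRETS = (
    "ANTHROPIC_API_KEY",
    "HF_TOKEN",
    "TAVILY_API_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_HOST",
)


def missing_secrets(
    values: Mapping[str, str | None],
    required: tuple[str, ...] = REQUIRED_SECRETS,
) -> list[str]:
    """Return the required keys whose value is absent, None, or blank."""
    return [key for key in required if not (values.get(key) or "").strip()]
```

Calling `missing_secrets(os.environ)` after loading `.env`, and aborting when the returned list is non-empty, would surface the problem locally rather than in the Space logs.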

## What HF Spaces reads

| File               | Role on Spaces                                                                |
|--------------------|-------------------------------------------------------------------------------|
| `README.md`        | The YAML frontmatter at the top configures the Space (sdk, hardware, etc.).   |
| `requirements.txt` | Installed at build time. **Must be kept in sync with `pyproject.toml`.**      |
| `app.py`           | Entry point (`app_file: app.py` in the YAML); HF imports it and finds `demo`. |
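
Because `requirements.txt` and `pyproject.toml` are maintained separately, they can drift apart. The following is a rough sketch of a name-level comparison, not part of the project: `sync_report`, `requirement_names`, and `load_pyproject_deps` are hypothetical helpers, the check ignores version specifiers entirely, and it assumes dependencies live under `[project].dependencies`.

```python
from __future__ import annotations

import re

# Leading distribution name of a requirement line, e.g. "torch" in "torch==2.3".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(lines: list[str]) -> set[str]:
    """Collect normalized package names, skipping comments and pip options."""
    names = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = _NAME.match(line)
        if match:
            names.add(normalize(match.group(1)))
    return names


def sync_report(pyproject_deps: list[str], requirements_text: str) -> dict[str, list[str]]:
    """Compare package names only; version pins are not checked."""
    declared = requirement_names(pyproject_deps)
    pinned = requirement_names(requirements_text.splitlines())
    return {
        "missing_from_requirements": sorted(declared - pinned),
        "extra_in_requirements": sorted(pinned - declared),
    }


def load_pyproject_deps(path: str = "pyproject.toml") -> list[str]:
    """Read [project].dependencies; tomllib is in the stdlib from Python 3.11."""
    import tomllib

    with open(path, "rb") as handle:
        return tomllib.load(handle)["project"]["dependencies"]
```

A typical call would be `sync_report(load_pyproject_deps(), open("requirements.txt").read())`. An "extra" entry is not necessarily a mistake, since `requirements.txt` may list Space-only packages on purpose.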

The YAML frontmatter currently used:

```yaml
---
title: OSS vs Frontier Assistant
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: "3.11"
app_file: app.py
hardware: cpu-basic
pinned: false
---
```

## Switching to ZeroGPU later

1. Subscribe to HF PRO ($9/mo).
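2. In the Space's **Settings → Hardware**, switch to `Nvidia A10G - Zero`. Re-running the deploy script with `space_hardware="zero-a10g"` is unlikely to have the same effect, because `create_repo(..., exist_ok=True)` returns early when the Space already exists; to switch from code, `HfApi.request_space_hardware(repo_id=REPO_ID, hardware="zero-a10g")` is the call meant for this.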
3. Update the YAML in `README.md` to `hardware: zero-a10g`.
4. Re-upload: the `@spaces.GPU(duration=120)` decorator already on `QwenChatModel._generate` will start allocating real GPU time — Qwen latency drops from ~30-60s to ~3-8s per reply.
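
Step 3 above is a one-line edit, but it can be scripted so that the README and the deploy stay consistent. The sketch below is a minimal text-level approach under stated assumptions: `set_frontmatter_hardware` is a hypothetical helper, it expects the frontmatter to be delimited by `---` lines at the very top of `README.md`, and it edits the `hardware:` line with a regex rather than a YAML parser.

```python
import re


def set_frontmatter_hardware(readme_text: str, hardware: str) -> str:
    """Return README text with `hardware:` in the YAML frontmatter set to `hardware`."""
    # Expected shape: "---\n<frontmatter>---\n<body>"
    parts = readme_text.split("---\n", 2)
    if len(parts) != 3 or parts[0].strip():
        raise ValueError("README.md does not start with YAML frontmatter")
    frontmatter, body = parts[1], parts[2]

    new_line = f"hardware: {hardware}"
    # Replace an existing top-level `hardware:` line, or append one if absent.
    frontmatter, replaced = re.subn(
        r"^hardware:.*$", lambda _m: new_line, frontmatter, count=1, flags=re.M
    )
    if replaced == 0:
        frontmatter += new_line + "\n"
    return f"---\n{frontmatter}---\n{body}"
```

Reading `README.md`, passing its text through this function with `"zero-a10g"`, and writing the result back before the re-upload in step 4 would cover the manual edit.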

## Re-deploy after code changes

```bash
# Bump requirements.txt if pyproject.toml changed, then:
uv run python - <<'PY'
from huggingface_hub import HfApi
from src.config import settings
HfApi(token=settings.hf_token).upload_folder(
    repo_id="<your-username>/oss-vs-frontier-assistant",
    repo_type="space",
    folder_path=".",
    commit_message="update",
    ignore_patterns=[".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
                     "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
                     "*.pyc", ".gitignore", "uv.lock"],
)
PY
```

HF triggers a new build automatically when files change.
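
Since every upload sends the whole folder minus the ignore list, it can help to preview what would be included before pushing. The sketch below is a local dry run, not a `huggingface_hub` feature: `would_upload` and `list_files` are hypothetical helpers, and matching with `fnmatch` is assumed to approximate how `upload_folder` applies `ignore_patterns`, so treat the output as a sanity check rather than a guarantee.

```python
from fnmatch import fnmatch
from pathlib import Path

# Same patterns as in the deploy and re-deploy snippets.
IGNORE_PATTERNS = [
    ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
    "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
    "*.pyc", ".gitignore", "uv.lock",
]


def would_upload(paths, ignore_patterns=IGNORE_PATTERNS):
    """Keep the relative POSIX paths that match none of the ignore patterns."""
    return [
        path
        for path in paths
        if not any(fnmatch(path, pattern) for pattern in ignore_patterns)
    ]


def list_files(root="."):
    """List files under `root` as sorted, root-relative POSIX paths."""
    base = Path(root)
    return sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
    )
```

Running `print("\n".join(would_upload(list_files("."))))` from the project root lists the candidate files; if `.env` or anything under `data/` shows up, the patterns need another look before deploying.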

## Troubleshooting

| Symptom on Spaces                             | Likely cause / fix                                                                              |
|-----------------------------------------------|-------------------------------------------------------------------------------------------------|
| Build fails on `torch`/`transformers` install | Mismatch between the `requirements.txt` pins and the HF base image; check `python_version`.     |
| `ANTHROPIC_API_KEY is not set` at runtime     | Secret not added in Space settings, or empty. Re-run the secrets loop above.                    |
| 403 on `create_repo` mentioning ZeroGPU       | ZeroGPU is gated behind HF PRO; use `space_hardware="cpu-basic"` instead.                       |
| Qwen replies very slowly (30-60s)             | Expected on `cpu-basic`. Switch to ZeroGPU per the section above.                               |
| Traces missing from Langfuse                  | Network timeout when the Space sends traces to Langfuse. Non-fatal; raise it with `LANGFUSE_TIMEOUT=30`. |