# Deploy Guide

End-to-end deployment of this project to a Hugging Face Space. Covers both the
**free CPU** path (used for this deploy) and the **ZeroGPU** path (for when an
HF PRO subscription is available).

## Prerequisites

- A Hugging Face account with an access token (User Settings → Access Tokens; needs **write** scope).
- The token in your local `.env` as `HF_TOKEN=hf_…`.
- The Python deps installed (`uv sync` in the project root).

## One-shot deploy via `huggingface_hub` Python API

Everything below (creating the Space, setting secrets, uploading files) can be done in one Python script. This is what was actually run to produce the live demo.

```python
from huggingface_hub import HfApi
from src.config import settings

REPO_ID = "<your-username>/oss-vs-frontier-assistant"
api = HfApi(token=settings.hf_token)

# 1) Create the Space (idempotent; safe to re-run).
api.create_repo(
    repo_id=REPO_ID,
    repo_type="space",
    space_sdk="gradio",
    space_hardware="cpu-basic",     # or "zero-a10g" if you have HF PRO
    exist_ok=True,
    private=False,
)

# 2) Set every required secret. These show up as env vars in the Space runtime.
for k, v in {
    "ANTHROPIC_API_KEY":   settings.anthropic_api_key,
    "HF_TOKEN":            settings.hf_token,
    "TAVILY_API_KEY":      settings.tavily_api_key,
    "LANGFUSE_PUBLIC_KEY": settings.langfuse_public_key,
    "LANGFUSE_SECRET_KEY": settings.langfuse_secret_key,
    "LANGFUSE_HOST":       settings.langfuse_host,
}.items():
    if v:
        api.add_space_secret(repo_id=REPO_ID, key=k, value=v)

# 3) Upload the project, excluding local-only / sensitive files.
api.upload_folder(
    repo_id=REPO_ID,
    repo_type="space",
    folder_path=".",
    commit_message="deploy",
    ignore_patterns=[
        ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
        "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
        "*.pyc", ".gitignore", "uv.lock",
    ],
)
```

Why this approach over the web UI:

- **No browser steps.** Reproducible from any machine with the token.
- **Secrets travel safely.** They never leave your machine in plaintext; the SDK posts them over HTTPS directly to the Space config.
- **Re-runnable.** `exist_ok=True` + `upload_folder` overwrite makes re-deploys trivial.
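
Before uploading, it can help to preview which files survive the `ignore_patterns` list. The sketch below is not part of the project; `files_to_upload` is a hypothetical helper, and it assumes the patterns are matched fnmatch-style against each file's full relative path, which is my understanding of how `huggingface_hub` treats them.

```python
from fnmatch import fnmatch

# Same patterns as the deploy script above.
IGNORE_PATTERNS = [
    ".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
    "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
    "*.pyc", ".gitignore", "uv.lock",
]


def files_to_upload(paths, ignore_patterns=IGNORE_PATTERNS):
    """Return the relative POSIX paths that match none of the ignore patterns.

    Assumption: fnmatch semantics, where `*` also matches `/`.
    """
    return [
        path for path in paths
        if not any(fnmatch(path, pattern) for pattern in ignore_patterns)
    ]


# Hypothetical file listing, for illustration only.
sample = ["app.py", ".env", ".git/config", "src/config.py", "data/raw.csv"]
print(files_to_upload(sample))
```

To run it against the real tree, pass in paths collected with `pathlib.Path(".").rglob("*")`, converted with `relative_to(".").as_posix()`. Note that `.env` is matched literally, so a file such as `.env.example` would still be uploaded.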

## What HF Spaces reads

| File               | Role on Spaces                                                                |
|--------------------|-------------------------------------------------------------------------------|
| `README.md`        | The YAML frontmatter at the top configures the Space (sdk, hardware, etc.).   |
| `requirements.txt` | Installed at build time. **Must be kept in sync with `pyproject.toml`.**      |
| `app.py`           | Entry point (`app_file: app.py` in the YAML); HF imports it and finds `demo`. |

The YAML frontmatter currently used:

```yaml
---
title: OSS vs Frontier Assistant
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
python_version: "3.11"
app_file: app.py
hardware: cpu-basic
pinned: false
---
```

## Switching to ZeroGPU later

1. Subscribe to HF PRO ($9/mo).
2. In the Space's **Settings → Hardware**, switch to `Nvidia A10G - Zero` (or rerun the deploy script with `space_hardware="zero-a10g"`).
3. Update the YAML in `README.md` to `hardware: zero-a10g`.
4. Re-upload: the `@spaces.GPU(duration=120)` decorator already on `QwenChatModel._generate` will start allocating real GPU time, and Qwen latency drops from ~30-60s to ~3-8s per reply.
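
For reference, one way a `@spaces.GPU(duration=120)` decorator can sit on a method while the code still imports on a machine without the `spaces` package is sketched below. `DemoModel` is a hypothetical stand-in, not the project's `QwenChatModel`, and the no-op fallback is an assumption about how one might structure this, not a description of the actual code.

```python
try:
    import spaces  # available on Hugging Face Spaces; may be absent locally

    gpu = spaces.GPU(duration=120)
except ImportError:
    def gpu(fn):
        """Local fallback: return the function unchanged."""
        return fn


class DemoModel:  # hypothetical stand-in for QwenChatModel
    @gpu
    def _generate(self, prompt: str) -> str:
        # Real code would run the model's forward pass here.
        return f"echo: {prompt}"


print(DemoModel()._generate("hello"))
```

With this shape the same file works on `cpu-basic` and on ZeroGPU; only the hardware setting changes between the two paths.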

## Re-deploy after code changes

```bash
# Bump requirements.txt if pyproject.toml changed, then:
uv run python - <<'PY'
from huggingface_hub import HfApi
from src.config import settings
HfApi(token=settings.hf_token).upload_folder(
    repo_id="<your-username>/oss-vs-frontier-assistant",
    repo_type="space",
    folder_path=".",
    commit_message="update",
    ignore_patterns=[".env", ".git/**", ".venv/**", ".pytest_cache/**", ".claude/**",
                     "data/**", "results/**", "__pycache__/**", "**/__pycache__/**",
                     "*.pyc", ".gitignore", "uv.lock"],
)
PY
```

HF triggers a new build automatically when files change.

## Troubleshooting

| Symptom on Spaces                           | Likely cause / fix                                                            |
|---------------------------------------------|-------------------------------------------------------------------------------|
| Build fails on `torch`/`transformers` install | Mismatch between `requirements.txt` pin and HF base image; check `python_version`. |
| `ANTHROPIC_API_KEY is not set` at runtime    | Secret not added in Space settings, or empty. Re-run the secrets loop above.  |
| 403 on `create_repo` mentioning ZeroGPU      | ZeroGPU is gated behind HF PRO; use `space_hardware="cpu-basic"` instead.     |
| Qwen replies very slowly (30-60s)            | Expected on `cpu-basic`. Switch to ZeroGPU per the section above.             |
| Tracing missing from Langfuse                | Network timeout when the Space sends traces to Langfuse. Non-fatal; raise the timeout with `LANGFUSE_TIMEOUT=30`. |
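
For the missing-secret row above, a startup check can report every unset variable at once instead of failing on the first one. This is a minimal sketch: `missing_secrets` is a hypothetical helper, and the name list simply mirrors the secrets loop in the deploy script. Which of these are strictly required is an assumption.

```python
import os

# Names copied from the secrets loop in the deploy script.
SECRET_NAMES = (
    "ANTHROPIC_API_KEY", "HF_TOKEN", "TAVILY_API_KEY",
    "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST",
)


def missing_secrets(env=None, names=SECRET_NAMES):
    """Return the names that are unset or blank in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not (env.get(name) or "").strip()]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        print("Missing secrets:", ", ".join(missing))
```

Calling this near the top of `app.py` and logging the result makes the Space's build logs show which secret to re-add.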