# Deploying to Hugging Face Spaces

The repo ships everything HF Spaces needs: a `Dockerfile`, `requirements.txt`,
a `README.md` with the required Space front-matter, and the scripts that
build the Chroma vector index at image-build time.

## Prerequisites

- Free HF account at https://huggingface.co/join
- An OpenAI API key (for the LLM rerank step)
- *Optional*: Langfuse keys for tracing

---

## Step-by-step

### 1. Create the Space

1. Go to https://huggingface.co/new-space
2. **Name**: e.g. `shl-recommender-api`
3. **Space SDK**: select **Docker** (NOT Gradio / Streamlit)
4. **Hardware**: Free CPU basic (16 GB RAM, plenty for `bge-large`)
5. **Visibility**: Public
6. Click **Create Space**

### 2. Push the code

Every Space is a git repo. Add yours as a remote and push:

```bash
cd /path/to/shl-asss

git init
git add .
git commit -m "SHL recommender - initial commit"

# HF requires a Personal Access Token with WRITE scope.
# Create one at https://huggingface.co/settings/tokens
# Then use it as the password when prompted by git push.

git remote add space https://huggingface.co/spaces/<USERNAME>/shl-recommender-api
git branch -M main
git push -u space main
# Username: <USERNAME>
# Password: paste the hf_... token
```

The Space picks up:
- `Dockerfile` → builds the container
- `README.md` front-matter → configures the Space (title, port, etc.)

### 3. Set the environment

Open your Space → **Settings** → **Variables and secrets**.

| Type | Name | Value |
|---|---|---|
| Variable | `LLM_PROVIDER` | `openai` |
| Variable | `LLM_MODEL` | `gpt-5-mini` |
| Secret | `OPENAI_API_KEY` | your `sk-proj-...` |
| Secret (optional) | `LANGFUSE_PUBLIC_KEY` | `pk-lf-...` |
| Secret (optional) | `LANGFUSE_SECRET_KEY` | `sk-lf-...` |
| Secret (optional) | `LANGFUSE_BASE_URL` | `https://us.cloud.langfuse.com` |

Each variable change triggers a rebuild, so set them all at once before
the first push and batch any later changes.
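
Going by the troubleshooting table, a missing key only shows up as a `500` once a request arrives, so a quick check of the environment can catch it sooner. The following is a minimal sketch, not code from the repo: `REQUIRED_BY_PROVIDER` and `missing_settings` are hypothetical names, and the mapping of provider to key is assumed from the variable names this guide mentions.

```python
# Hypothetical pre-flight check; not part of the repo.
# Assumption: each provider needs exactly the one API key named in this guide.
REQUIRED_BY_PROVIDER = {
    "openai": ["OPENAI_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
}


def missing_settings(env):
    """Return the names of settings that are required but absent from `env`."""
    # The guide states that the code falls back to Gemini when LLM_PROVIDER is unset.
    provider = env.get("LLM_PROVIDER", "gemini")
    required = REQUIRED_BY_PROVIDER.get(provider)
    if required is None:
        # Unknown provider: flag the provider variable itself.
        return ["LLM_PROVIDER"]
    return [name for name in required if not env.get(name)]
```

Calling `missing_settings(os.environ)` at startup (after `import os`) returns an empty list when the configuration is complete.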

### 4. Wait for the build

The first build:
- downloads ~600 MB of pip dependencies
- downloads ~1.3 GB of `bge-large-en-v1.5` weights
- embeds 377 documents into a fresh `data/chroma/` (the index is built
  during `RUN python -m scripts.index`, so no binary blobs live in git)

**Expect 5–8 minutes** for the first build. The Space dashboard streams
logs in real time. Re-runs hit pip's cache and finish in ~2–3 min.

### 5. Verify

Your Space exposes an HTTPS URL like
`https://<USERNAME>-shl-recommender-api.hf.space`.

```bash
curl https://<USERNAME>-shl-recommender-api.hf.space/health
# {"status":"healthy"}

curl -X POST https://<USERNAME>-shl-recommender-api.hf.space/recommend \
  -H "Content-Type: application/json" \
  -d '{"query":"hire java developers under 40 minutes"}'
```

Or open the auto-generated Swagger UI in a browser:

```
https://<USERNAME>-shl-recommender-api.hf.space/docs
```
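
The two `curl` checks above can also be scripted with the standard library. This is a sketch under assumptions: `build_recommend_request` and `smoke_test` are hypothetical helper names, and the response bodies are assumed to be JSON, as the `/health` example suggests.

```python
import json
import urllib.request


def build_recommend_request(base_url, query):
    """Build the same POST request that the curl example sends to /recommend."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/recommend",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url, timeout=30):
    """Call /health, then /recommend, and return both decoded JSON bodies."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        health = json.load(resp)
    request = build_recommend_request(
        base_url, "hire java developers under 40 minutes"
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        recommendation = json.load(resp)
    return health, recommendation
```

Pass your own Space URL, for example `smoke_test("https://<USERNAME>-shl-recommender-api.hf.space")`. `urlopen` raises `urllib.error.HTTPError` on a non-2xx response, so a failed check surfaces as an exception.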

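Free-tier Spaces go to sleep after an extended idle period (48 hours by
default, per HF's documentation), so the first request after a sleep pays a
cold start while the container restarts. On a warm Space, each `/recommend`
call takes ~2 s (LLM rerank dominates).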

---

## Configuration knobs

All are env vars; set them in the Space's Settings → Variables and secrets.

| Env var | Default | Notes |
|---|---|---|
| `EMBED_PROVIDER` | `local` | `local` (sentence-transformers) or `gemini` |
| `EMBED_MODEL` | `BAAI/bge-large-en-v1.5` | Pin smaller for tight RAM hosts |
| `LLM_PROVIDER` | `gemini` *(set to `openai` in Space)* | `openai` or `gemini` |
| `LLM_MODEL` | varies by provider | e.g. `gpt-5-mini`, `gpt-4o-mini`, `gemini-2.5-flash` |
| `OPENAI_BASE_URL` | unset | Set for Azure / OpenRouter / proxy |
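
For illustration, the defaults in the table could be resolved along these lines. This is a sketch only: the `Settings` class and its field names are hypothetical, and the repo may load its configuration differently. `LLM_MODEL` is left unset here because the table only says it varies by provider.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the knobs in the table above."""

    embed_provider: str = "local"
    embed_model: str = "BAAI/bge-large-en-v1.5"
    llm_provider: str = "gemini"
    llm_model: Optional[str] = None  # "varies by provider"; not guessed here
    openai_base_url: Optional[str] = None

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Read each knob from `env`, falling back to the documented default."""
        defaults = cls()
        return cls(
            embed_provider=env.get("EMBED_PROVIDER", defaults.embed_provider),
            embed_model=env.get("EMBED_MODEL", defaults.embed_model),
            llm_provider=env.get("LLM_PROVIDER", defaults.llm_provider),
            # Treat an empty string the same as an unset variable.
            llm_model=env.get("LLM_MODEL") or None,
            openai_base_url=env.get("OPENAI_BASE_URL") or None,
        )
```

Passing a plain dict instead of `os.environ` makes the resolution easy to exercise in tests, e.g. `Settings.from_env({"LLM_PROVIDER": "openai"})`.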

---

## Memory profile (free tier sanity check)

| Component | RAM at idle |
|---|---|
| Python interpreter + libraries | ~200 MB |
| `bge-large-en-v1.5` weights | ~1.3 GB |
| Chroma + BM25 index | ~30 MB |
| FastAPI / uvicorn | ~50 MB |
| **Total at runtime** | **~1.6 GB** |
| HF Spaces free tier | 16 GB ✓ |
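
The total can be checked by summing the rows. The figures below are the estimates from the table, with 1 GB treated as 1000 MB for simplicity; the variable names are illustrative.

```python
# Idle RAM estimates in MB, copied from the table above.
COMPONENTS_MB = {
    "python_and_libraries": 200,
    "bge_large_weights": 1300,
    "chroma_and_bm25": 30,
    "fastapi_uvicorn": 50,
}
FREE_TIER_MB = 16 * 1000  # 16 GB free tier, using 1 GB = 1000 MB

total_mb = sum(COMPONENTS_MB.values())
total_gb = round(total_mb / 1000, 1)
share_of_free_tier = total_mb / FREE_TIER_MB

print(f"total: {total_mb} MB (~{total_gb} GB), "
      f"{share_of_free_tier:.0%} of the free tier")
```

The sum comes to 1580 MB, which rounds to the ~1.6 GB shown in the table and is roughly a tenth of the free tier's RAM.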

---

## Updating the deployment

After any local change, just push to the connected branch:

```bash
git add ...
git commit -m "..."
git push space main
```

The Space auto-detects the push and redeploys.

If `data/documents.jsonl` changes (re-scrape or re-extract concepts), the
Chroma index is rebuilt automatically during the next image build; no manual
step is needed.

---

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `500 retrieval failed: GEMINI_API_KEY not set` | `LLM_PROVIDER` not set, code defaults to Gemini | Add `LLM_PROVIDER=openai` Variable |
| `500 OPENAI_API_KEY not set` | Forgot the secret | Add `OPENAI_API_KEY` Secret |
| Build hangs on `RUN python -m scripts.index` for >10 min | Embedding loop is genuinely slow on free CPU; tqdm doesn't flush | Wait it out. Look for `collection 'shl_baseline' has 377 items` to confirm completion. |
| Push rejected: `binary files` | Chroma binaries in git | They shouldn't be: `.gitignore` excludes `data/chroma/`. If anything else binary slipped in, remove with `git rm --cached <file>` |
| Push rejected: `valid Hugging Face secrets` | Token was committed somewhere | Search the repo: `grep -rn 'hf_' .` then strip and amend |
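
The `grep` in the last row also matches harmless strings such as `hf_space`. A narrower scan might look like the sketch below. Note the assumptions: the pattern (`hf_` followed by at least 20 alphanumeric characters) is a guess at what a token looks like rather than an official format, and `find_token_like` and `scan_tree` are hypothetical helper names.

```python
import re
from pathlib import Path

# Assumption: tokens look like "hf_" plus a long alphanumeric run.
# This is a heuristic, not the official token format.
TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]{20,}")


def find_token_like(text):
    """Return every token-like substring found in `text`."""
    return TOKEN_PATTERN.findall(text)


def scan_tree(root, skip_dirs=(".git",)):
    """Return (path, line number) pairs for lines containing a token-like string."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or any(part in skip_dirs for part in path.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file; skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if TOKEN_PATTERN.search(line):
                hits.append((str(path), lineno))
    return hits
```

`scan_tree(".")` only inspects the working tree. A token that was already committed still lives in git history, which is why the fix column says to strip and amend.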