# Deploying to Hugging Face Spaces
The repo ships everything HF Spaces needs: a `Dockerfile`, `requirements.txt`,
a `README.md` with the required Space front-matter, and the scripts that
build the Chroma vector index at image-build time.
## Prerequisites
- Free HF account at https://huggingface.co/join
- An OpenAI API key (for the LLM rerank step)
- *Optional*: Langfuse keys for tracing
---
## Step-by-step
### 1. Create the Space
1. Go to https://huggingface.co/new-space
2. **Name**: e.g. `shl-recommender-api`
3. **Space SDK**: select **Docker** (NOT Gradio / Streamlit)
4. **Hardware**: Free CPU basic (16 GB RAM, plenty for `bge-large`)
5. **Visibility**: Public
6. Click **Create Space**
### 2. Push the code
Every Space is a git repo. Add yours as a remote and push:
```bash
cd /path/to/shl-asss
git init
git add .
git commit -m "SHL recommender: initial commit"
# HF requires a Personal Access Token with WRITE scope.
# Create one at https://huggingface.co/settings/tokens
# Then use it as the password when prompted by git push.
git remote add space https://huggingface.co/spaces/<USERNAME>/shl-recommender-api
git branch -M main
git push -u space main
# Username: <USERNAME>
# Password: paste the hf_... token
```
The Space picks up:
- `Dockerfile` → builds the container
- `README.md` front-matter → configures the Space (title, port, etc.)
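Before pushing, it can help to confirm locally that the front-matter is in place. The following is a minimal sketch, not part of the repo: the helper names are hypothetical, the parser only handles flat `key: value` lines rather than full YAML, and it assumes the Space relies on the `sdk` and `app_port` front-matter keys.

```python
def parse_front_matter(readme_text: str) -> dict[str, str]:
    """Return top-level `key: value` pairs from a README's front-matter block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        # Skip nested values and comments; this is not a full YAML parser.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip("'\"")
    return {}  # no closing delimiter, so treat the block as absent


def front_matter_problems(readme_text: str) -> list[str]:
    """List the problems that would stop a Docker Space from being configured."""
    fields = parse_front_matter(readme_text)
    problems = []
    if fields.get("sdk") != "docker":
        problems.append("sdk is not 'docker'")
    if not fields.get("app_port", "").isdigit():
        problems.append("app_port is missing or not a number")
    return problems
```

Calling `front_matter_problems(open("README.md", encoding="utf-8").read())` returns an empty list when both keys look right.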
### 3. Set the environment
Open your Space → **Settings** → **Variables and secrets**.
| Type | Name | Value |
|---|---|---|
| Variable | `LLM_PROVIDER` | `openai` |
| Variable | `LLM_MODEL` | `gpt-5-mini` |
| Secret | `OPENAI_API_KEY` | your `sk-proj-...` |
| Secret (optional) | `LANGFUSE_PUBLIC_KEY` | `pk-lf-...` |
| Secret (optional) | `LANGFUSE_SECRET_KEY` | `sk-lf-...` |
| Secret (optional) | `LANGFUSE_BASE_URL` | `https://us.cloud.langfuse.com` |
Each variable change triggers a rebuild, so it's smart to set them all at
once before the first push, or batch later changes.
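Since a missing value only surfaces as a 500 at request time, a small check can catch it earlier. This is an illustrative sketch with hypothetical helper names; it assumes the three non-optional rows of the table are required for the Space, which is a reading of the table rather than something the app is known to enforce.

```python
import os
from collections.abc import Mapping

# Assumption: the non-optional rows of the table above are required in the Space.
REQUIRED = ("LLM_PROVIDER", "LLM_MODEL", "OPENAI_API_KEY")
LANGFUSE = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_BASE_URL")


def missing_settings(env: Mapping[str, str], names=REQUIRED) -> list[str]:
    """Return the names that are unset or blank in `env`."""
    return [name for name in names if not env.get(name, "").strip()]


def langfuse_partially_configured(env: Mapping[str, str]) -> bool:
    """True when some, but not all, of the optional Langfuse values are set."""
    present = len(LANGFUSE) - len(missing_settings(env, LANGFUSE))
    return 0 < present < len(LANGFUSE)
```

Running `missing_settings(os.environ)` inside the container lists whatever still needs to be added under Variables and secrets.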
### 4. Wait for the build
The first build:
- downloads ~600 MB of pip dependencies
- downloads ~1.3 GB of `bge-large-en-v1.5` weights
- embeds 377 documents into a fresh `data/chroma/` (the index is built
  during `RUN python -m scripts.index`, so no binary blobs live in git)
**Expect 5–8 minutes** for the first build. The Space dashboard streams
logs in real time. Re-runs hit pip's cache and finish in ~2–3 min.
### 5. Verify
Your Space exposes an HTTPS URL like
`https://<USERNAME>-shl-recommender-api.hf.space`.
```bash
curl https://<USERNAME>-shl-recommender-api.hf.space/health
# {"status":"healthy"}
curl -X POST https://<USERNAME>-shl-recommender-api.hf.space/recommend \
-H "Content-Type: application/json" \
-d '{"query":"hire java developers under 40 minutes"}'
```
Or open the auto-generated Swagger UI in a browser:
```
https://<USERNAME>-shl-recommender-api.hf.space/docs
```
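A running Space stays warm between requests, so cold starts are rare; on
the free tier, expect one after the Space has been idle long enough to be
put to sleep. Each `/recommend` call takes ~2 s (LLM rerank dominates).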
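The same two checks can be scripted, for example as a post-deploy smoke test. The sketch below uses only the standard library; the function names are hypothetical, and the only response shape it relies on is the `{"status":"healthy"}` payload shown above.

```python
import json
import urllib.request


def build_recommend_request(base_url: str, query: str) -> urllib.request.Request:
    """Build the POST /recommend request without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/recommend",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url: str, timeout: float = 30.0) -> dict:
    """Check /health, then return the parsed JSON body of one /recommend call."""
    health_url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(health_url, timeout=timeout) as response:
        health = json.load(response)
    if health.get("status") != "healthy":
        raise RuntimeError(f"unexpected /health payload: {health!r}")

    request = build_recommend_request(base_url, "hire java developers under 40 minutes")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Call it as `smoke_test("https://<USERNAME>-shl-recommender-api.hf.space")`; a failed request raises `urllib.error.HTTPError` or `URLError` instead of returning.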
---
## Configuration knobs
All env vars; set in the Space's Settings → Variables and secrets.
| Env var | Default | Notes |
|---|---|---|
| `EMBED_PROVIDER` | `local` | `local` (sentence-transformers) or `gemini` |
| `EMBED_MODEL` | `BAAI/bge-large-en-v1.5` | Pin a smaller model on RAM-constrained hosts |
| `LLM_PROVIDER` | `gemini` *(set to `openai` in Space)* | `openai` or `gemini` |
| `LLM_MODEL` | varies by provider | e.g. `gpt-5-mini`, `gpt-4o-mini`, `gemini-2.5-flash` |
| `OPENAI_BASE_URL` | unset | Set for Azure / OpenRouter / proxy |
---
## Memory profile (free tier sanity check)
| Component | RAM at idle |
|---|---|
| Python interpreter + libraries | ~200 MB |
| `bge-large-en-v1.5` weights | ~1.3 GB |
| Chroma + BM25 index | ~30 MB |
| FastAPI / uvicorn | ~50 MB |
| **Total at runtime** | **~1.6 GB** |
| HF Spaces free tier | 16 GB ✓ |
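The total is just the sum of the rows above. As a quick worked check (a sketch using the table's approximate figures and decimal units, with hypothetical names):

```python
# Approximate idle RAM per component, in MB, copied from the table above.
COMPONENTS_MB = {
    "python_and_libraries": 200,
    "bge_large_en_v1_5_weights": 1300,
    "chroma_and_bm25_index": 30,
    "fastapi_uvicorn": 50,
}
FREE_TIER_MB = 16_000  # 16 GB free tier, in decimal MB


def total_mb(components: dict[str, int]) -> int:
    """Sum the per-component estimates."""
    return sum(components.values())


def headroom_mb(components: dict[str, int], limit_mb: int) -> int:
    """RAM left over once every component is loaded."""
    return limit_mb - total_mb(components)


print(f"total:    ~{total_mb(COMPONENTS_MB) / 1000:.1f} GB")                   # total:    ~1.6 GB
print(f"headroom: ~{headroom_mb(COMPONENTS_MB, FREE_TIER_MB) / 1000:.1f} GB")  # headroom: ~14.4 GB
```

Swapping in a different `EMBED_MODEL` only changes the weights row, so the same arithmetic shows whether a smaller host would still fit.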
---
## Updating the deployment
After any local change, just push to the connected branch:
```bash
git add ...
git commit -m "..."
git push space main
```
The Space auto-detects the push and redeploys.
If `data/documents.jsonl` changes (re-scrape or re-extract concepts), the
Chroma index gets rebuilt automatically during the next image build, with
no manual step.
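After changing `data/documents.jsonl`, counting its records locally gives a number to compare against the item count the build log reports. This is a minimal sketch with a hypothetical helper; it assumes one JSON document per line and that each document becomes one index item, neither of which is confirmed by the indexing script.

```python
import json
from pathlib import Path


def count_documents(path) -> int:
    """Count the non-blank lines of a JSONL file, validating each as JSON."""
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON ({exc.msg})") from exc
            count += 1
    return count
```

For example, `count_documents("data/documents.jsonl")` run from the repo root before pushing.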
---
## Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| `500 retrieval failed: GEMINI_API_KEY not set` | `LLM_PROVIDER` not set, code defaults to Gemini | Add `LLM_PROVIDER=openai` Variable |
| `500 OPENAI_API_KEY not set` | Forgot the secret | Add `OPENAI_API_KEY` Secret |
| Build hangs on `RUN python -m scripts.index` for >10 min | Embedding loop is genuinely slow on free CPU; tqdm doesn't flush | Wait it out. Look for `collection 'shl_baseline' has 377 items` to confirm completion. |
| Push rejected: `binary files` | Chroma binaries in git | They shouldn't be: `.gitignore` excludes `data/chroma/`. If anything else binary slipped in, remove with `git rm --cached <file>` |
| Push rejected: `valid Hugging Face secrets` | Token was committed somewhere | Search the repo: `grep -rn 'hf_' .` then strip and amend |
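
For the committed-token case, a narrower scan than a plain `grep` for `hf_` cuts down on false positives. The sketch below is illustrative only: the helper names are hypothetical, and the pattern (`hf_` followed by at least 20 alphanumeric characters) is an assumption about what a token looks like, not an official format.

```python
import re
from pathlib import Path

# Assumption: a token looks like "hf_" plus a long alphanumeric run.
TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]{20,}")


def find_token_like_strings(text: str) -> list[tuple[int, str]]:
    """Return (line_number, match) pairs for every token-like string in `text`."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_PATTERN.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


def scan_tree(root: str = ".") -> dict[str, list[tuple[int, str]]]:
    """Scan every file under `root`, skipping the .git directory."""
    results = {}
    for path in Path(root).rglob("*"):
        if ".git" in path.parts or not path.is_file():
            continue
        hits = find_token_like_strings(path.read_text(encoding="utf-8", errors="ignore"))
        if hits:
            results[str(path)] = hits
    return results
```

`scan_tree(".")` maps each offending file to the lines that need cleaning before the commit is amended.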