# Deploying to Hugging Face Spaces

The repo ships everything HF Spaces needs: a `Dockerfile`, `requirements.txt`, a `README.md` with the required Space front-matter, and the scripts that build the Chroma vector index at image-build time.

## Prerequisites

- Free HF account at https://huggingface.co/join
- An OpenAI API key (for the LLM rerank step)
- *Optional*: Langfuse keys for tracing

---

## Step-by-step

### 1. Create the Space

1. Go to https://huggingface.co/new-space
2. **Name**: e.g. `shl-recommender-api`
3. **Space SDK**: select **Docker** (NOT Gradio / Streamlit)
4. **Hardware**: Free CPU basic (16 GB RAM, plenty for `bge-large`)
5. **Visibility**: Public
6. Click **Create Space**

### 2. Push the code

Spaces are git repos. Add the Space as a remote and push:

```bash
cd /path/to/shl-asss
git init
git add .
git commit -m "SHL recommender: initial commit"

# HF requires a Personal Access Token with WRITE scope.
# Create one at https://huggingface.co/settings/tokens
# Then use it as the password when prompted by git push.
git remote add space https://huggingface.co/spaces/<your-username>/shl-recommender-api
git branch -M main
git push -u space main
# Username: <your-username>
# Password: paste the hf_... token
```

The Space picks up:

- `Dockerfile` → builds the container
- `README.md` front-matter → configures the Space (title, port, etc.)

### 3. Set the environment

Open your Space → **Settings** → **Variables and secrets**.

| Type | Name | Value |
|---|---|---|
| Variable | `LLM_PROVIDER` | `openai` |
| Variable | `LLM_MODEL` | `gpt-5-mini` |
| Secret | `OPENAI_API_KEY` | your `sk-proj-...` |
| Secret (optional) | `LANGFUSE_PUBLIC_KEY` | `pk-lf-...` |
| Secret (optional) | `LANGFUSE_SECRET_KEY` | `sk-lf-...` |
| Secret (optional) | `LANGFUSE_BASE_URL` | `https://us.cloud.langfuse.com` |

Each variable change triggers a rebuild, so set them all at once before the first push, or batch later changes.

### 4. Wait for the build

The first build:

- Downloads ~600 MB of pip dependencies
- Downloads ~1.3 GB of `bge-large-en-v1.5` weights
- Embeds 377 documents into a fresh `data/chroma/` (the index is built during `RUN python -m scripts.index`, so there are no binary blobs in git)

**Expect 5–8 minutes** for the first build. The Space dashboard streams logs in real time. Re-runs hit pip's cache and finish in ~2–3 min.

### 5. Verify

Your Space exposes an HTTPS URL like `https://<your-username>-shl-recommender-api.hf.space`.

```bash
curl https://<your-username>-shl-recommender-api.hf.space/health
# {"status":"healthy"}

curl -X POST https://<your-username>-shl-recommender-api.hf.space/recommend \
  -H "Content-Type: application/json" \
  -d '{"query":"hire java developers under 40 minutes"}'
```

Or open the auto-generated Swagger UI in a browser:

```
https://<your-username>-shl-recommender-api.hf.space/docs
```

Spaces stay warm while they receive traffic, so cold starts are rare; free-tier Spaces do go to sleep after a long idle period. Each `/recommend` call takes ~2 s (the LLM rerank dominates).

---

## Configuration knobs

All of these are env vars; set them in the Space's Settings → Variables and secrets.

| Env var | Default | Notes |
|---|---|---|
| `EMBED_PROVIDER` | `local` | `local` (sentence-transformers) or `gemini` |
| `EMBED_MODEL` | `BAAI/bge-large-en-v1.5` | Use a smaller model on RAM-constrained hosts |
| `LLM_PROVIDER` | `gemini` *(set to `openai` in Space)* | `openai` or `gemini` |
| `LLM_MODEL` | varies by provider | e.g. `gpt-5-mini`, `gpt-4o-mini`, `gemini-2.5-flash` |
| `OPENAI_BASE_URL` | unset | Set for Azure / OpenRouter / proxy |

---

## Memory profile (free tier sanity check)

| Component | RAM at idle |
|---|---|
| Python interpreter + libraries | ~200 MB |
| `bge-large-en-v1.5` weights | ~1.3 GB |
| Chroma + BM25 index | ~30 MB |
| FastAPI / uvicorn | ~50 MB |
| **Total at runtime** | **~1.6 GB** |
| HF Spaces free tier | 16 GB ✓ |

---

## Updating the deployment

After any local change, push to the connected branch:

```bash
git add ...
git commit -m "..."
git push space main
```

The Space auto-detects the push and redeploys.
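To confirm that a redeploy has finished, one option is to poll the `/health` endpoint from step 5 until it reports healthy. The sketch below is an illustration and not part of the repo: `fetch_json` and `wait_until_healthy`, along with their parameters, are hypothetical names, and the expected body `{"status": "healthy"}` is assumed from the curl example in step 5. The base URL is passed in by the caller.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body (standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(base_url, attempts=30, delay=10.0,
                       fetch=fetch_json, sleep=time.sleep):
    """Poll <base_url>/health until it returns {"status": "healthy"}.

    Returns True once the Space answers, False if every attempt fails.
    `fetch` and `sleep` are injectable so the loop can be exercised offline.
    """
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        try:
            body = fetch(url)
            if isinstance(body, dict) and body.get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # URLError/HTTPError are OSError subclasses; bad JSON is a ValueError.
            # Either way the Space is probably still building or restarting.
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Called as `wait_until_healthy("https://<your-username>-shl-recommender-api.hf.space")`, the defaults give the Space roughly five minutes to come back before the function returns `False`.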
If `data/documents.jsonl` changes (re-scrape or re-extract concepts), the Chroma index is rebuilt automatically during the next image build; no manual step is needed.

---

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `500 retrieval failed: GEMINI_API_KEY not set` | `LLM_PROVIDER` not set, so the code defaults to Gemini | Add the `LLM_PROVIDER=openai` Variable |
| `500 OPENAI_API_KEY not set` | The secret is missing | Add the `OPENAI_API_KEY` Secret |
| Build hangs on `RUN python -m scripts.index` for >10 min | The embedding loop is genuinely slow on free CPU, and tqdm doesn't flush its progress output | Wait it out. Look for `collection 'shl_baseline' has 377 items` to confirm completion. |
| Push rejected: `binary files` | Chroma binaries in git | There shouldn't be any: `.gitignore` excludes `data/chroma/`. If another binary file slipped in, remove it with `git rm --cached <file>` |
| Push rejected: `valid Hugging Face secrets` | A token was committed somewhere | Search the repo with `grep -rn 'hf_' .`, remove the token, and amend the commit (then revoke the exposed token) |
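
The first two rows of the table come down to a mismatch between `LLM_PROVIDER` and the API key that is actually set. A small pre-flight check can surface that before the first request does. This is a minimal sketch under stated assumptions: `check_llm_env` and `REQUIRED_KEY` are hypothetical names that do not exist in the repo, and the fallback to `gemini` mirrors the default listed under Configuration knobs.

```python
import os

# Which secret each provider needs, per the tables in this guide.
REQUIRED_KEY = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}


def check_llm_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    # Assumption from this guide: the code falls back to Gemini when
    # LLM_PROVIDER is unset.
    provider = env.get("LLM_PROVIDER", "gemini").strip().lower()
    if provider not in REQUIRED_KEY:
        return [f"LLM_PROVIDER={provider!r} is not one of {sorted(REQUIRED_KEY)}"]
    key_name = REQUIRED_KEY[provider]
    if not env.get(key_name):
        return [f"{key_name} is not set (needed for LLM_PROVIDER={provider})"]
    return []
```

Printing the result of `check_llm_env()` at container start-up would put the same diagnosis in the Space's logs instead of behind a `500` response.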