# Deploying to Hugging Face Spaces

The repo ships everything HF Spaces needs: a `Dockerfile`, `requirements.txt`,
a `README.md` with the required Space front-matter, and the scripts that
build the Chroma vector index at image-build time.

## Prerequisites

- A free HF account (sign up at https://huggingface.co/join)
- An OpenAI API key (for the LLM rerank step)
- *Optional*: Langfuse keys for tracing

---

## Step-by-step

### 1. Create the Space

1. Go to https://huggingface.co/new-space
2. **Name**: e.g. `shl-recommender-api`
3. **Space SDK**: select **Docker** (NOT Gradio / Streamlit)
4. **Hardware**: Free CPU basic (16 GB RAM, plenty for `bge-large`)
5. **Visibility**: Public
6. Click **Create Space**

### 2. Push the code

Spaces are git repos. Add the Space as a remote and push:

```bash
cd /path/to/shl-asss
git init
git add .
git commit -m "SHL recommender - initial commit"

# HF requires a User Access Token with write permission.
# Create one at https://huggingface.co/settings/tokens
# and use it as the password when git push prompts for one.
git remote add space https://huggingface.co/spaces/<USERNAME>/shl-recommender-api
git branch -M main
git push -u space main
# Username: <USERNAME>
# Password: paste the hf_... token
```

The Space picks up:

- `Dockerfile`: builds the container
- `README.md` front-matter: configures the Space (title, port, etc.)
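
A quick local check of the front-matter before pushing can save a failed build. The following is a minimal sketch, assuming the README opens with a `---`-delimited block of simple `key: value` pairs and that the Space is configured through the `sdk` and `app_port` keys. The function name `read_front_matter` and the sample title and port are illustrative, not taken from this repo.

```python
def read_front_matter(readme_text: str) -> dict[str, str]:
    """Parse simple `key: value` pairs from a leading '---' delimited block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as missing front-matter


# Hypothetical README content; the real file in the repo may differ.
SAMPLE_README = """---
title: SHL Recommender API
sdk: docker
app_port: 7860
---
# SHL recommender
"""

meta = read_front_matter(SAMPLE_README)
problems = []
if meta.get("sdk") != "docker":
    problems.append("sdk must be 'docker'")
if not meta.get("app_port", "").isdigit():
    problems.append("app_port should be a number")
print(problems or "front-matter looks OK")
```

This deliberately avoids a YAML dependency, so it only handles flat keys; nested or multi-line values would need a real YAML parser.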

### 3. Set the environment

Open your Space → **Settings** → **Variables and secrets**.

| Type | Name | Value |
|---|---|---|
| Variable | `LLM_PROVIDER` | `openai` |
| Variable | `LLM_MODEL` | `gpt-5-mini` |
| Secret | `OPENAI_API_KEY` | your `sk-proj-...` |
| Secret (optional) | `LANGFUSE_PUBLIC_KEY` | `pk-lf-...` |
| Secret (optional) | `LANGFUSE_SECRET_KEY` | `sk-lf-...` |
| Secret (optional) | `LANGFUSE_BASE_URL` | `https://us.cloud.langfuse.com` |

Each variable change triggers a rebuild, so set them all at once before the
first push, or batch later changes.
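
To catch a missing variable before it surfaces as a 500 at request time, a startup check along these lines may help. It is a sketch under stated assumptions: the mapping from provider to secret name is inferred from the table above and from the `GEMINI_API_KEY` error listed under Troubleshooting, and `check_environment` is a hypothetical helper, not part of the repo.

```python
import os
from typing import Mapping

# Assumed mapping: which secret each LLM provider needs.
PROVIDER_SECRET = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}
# Assumption: the two Langfuse keys are only useful as a pair.
LANGFUSE_PAIR = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def check_environment(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the env looks fine."""
    problems = []
    # The code defaults to Gemini when LLM_PROVIDER is unset.
    provider = env.get("LLM_PROVIDER", "gemini")
    secret = PROVIDER_SECRET.get(provider)
    if secret is None:
        problems.append(f"unknown LLM_PROVIDER: {provider!r}")
    elif not env.get(secret):
        problems.append(f"{secret} is not set")
    set_keys = [key for key in LANGFUSE_PAIR if env.get(key)]
    if len(set_keys) == 1:
        problems.append(f"{set_keys[0]} is set without its counterpart")
    return problems


for problem in check_environment(os.environ):
    print(f"config problem: {problem}")
```

Passing the environment in as a mapping keeps the check easy to exercise with plain dictionaries.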

### 4. Wait for the build

The first build:

- downloads ~600 MB of pip dependencies
- downloads ~1.3 GB of `bge-large-en-v1.5` weights
- embeds 377 documents into a fresh `data/chroma/` (the index is built
  during `RUN python -m scripts.index`, so no binary blobs live in git)

**Expect 5 to 8 minutes** for the first build. The Space dashboard streams
logs in real time. Re-runs hit pip's cache and finish in about 2 to 3 minutes.

### 5. Verify

Your Space exposes an HTTPS URL like
`https://<USERNAME>-shl-recommender-api.hf.space`.

```bash
curl https://<USERNAME>-shl-recommender-api.hf.space/health
# {"status":"healthy"}

curl -X POST https://<USERNAME>-shl-recommender-api.hf.space/recommend \
  -H "Content-Type: application/json" \
  -d '{"query":"hire java developers under 40 minutes"}'
```

Or open the auto-generated Swagger UI in a browser:

```
https://<USERNAME>-shl-recommender-api.hf.space/docs
```

A Space stays warm while it receives traffic. Free-tier Spaces are put to
sleep after an extended idle period and wake on the next request, so expect an
occasional cold start. Once the Space is running, each `/recommend` call takes
~2 s (LLM rerank dominates).
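
The same checks can be scripted with the Python standard library. This is a sketch, not part of the repo: `space_url`, `build_recommend_request`, and `smoke_test` are hypothetical helpers, and the URL pattern assumes the username and Space name contain only hostname-safe characters. The shape of the `/recommend` response is not documented here, so the sketch returns the parsed JSON untouched.

```python
import json
import urllib.request


def space_url(username: str, space_name: str) -> str:
    """Build the public URL, following the pattern shown above."""
    return f"https://{username}-{space_name}.hf.space"


def build_recommend_request(base_url: str, query: str) -> urllib.request.Request:
    """Prepare the POST /recommend request without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/recommend",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url: str, timeout: float = 60.0):
    """Call /health, then /recommend; raises on any HTTP or network error."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        health = json.loads(resp.read().decode("utf-8"))
    request = build_recommend_request(
        base_url, "hire java developers under 40 minutes"
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        recommendations = json.loads(resp.read().decode("utf-8"))
    return health, recommendations
```

Calling `smoke_test(space_url("<USERNAME>", "shl-recommender-api"))` with your own username runs both requests. The generous default timeout leaves room for a Space that is still waking up.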

---

## Configuration knobs

All of these are environment variables, set under the Space's **Settings** →
**Variables and secrets**.

| Env var | Default | Notes |
|---|---|---|
| `EMBED_PROVIDER` | `local` | `local` (sentence-transformers) or `gemini` |
| `EMBED_MODEL` | `BAAI/bge-large-en-v1.5` | Pin a smaller model on hosts with tight RAM |
| `LLM_PROVIDER` | `gemini` *(set to `openai` in Space)* | `openai` or `gemini` |
| `LLM_MODEL` | varies by provider | e.g. `gpt-5-mini`, `gpt-4o-mini`, `gemini-2.5-flash` |
| `OPENAI_BASE_URL` | unset | Set for Azure / OpenRouter / proxy |
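
The defaults in the table can be read as the following resolution logic. This is an illustrative sketch only: the `Settings` dataclass and `load_settings` are hypothetical names, and the repo's actual config loader may be organised differently. Because the `LLM_MODEL` default varies by provider, it is left as `None` here instead of being guessed.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    embed_provider: str
    embed_model: str
    llm_provider: str
    llm_model: Optional[str]
    openai_base_url: Optional[str]


def load_settings(env: Mapping[str, str]) -> Settings:
    """Resolve each knob from the environment, falling back to the table's default."""
    return Settings(
        embed_provider=env.get("EMBED_PROVIDER", "local"),
        embed_model=env.get("EMBED_MODEL", "BAAI/bge-large-en-v1.5"),
        llm_provider=env.get("LLM_PROVIDER", "gemini"),
        # Default depends on the provider, so an unset value stays None.
        llm_model=env.get("LLM_MODEL") or None,
        # Unset (or empty) means "use the provider's standard endpoint".
        openai_base_url=env.get("OPENAI_BASE_URL") or None,
    )


settings = load_settings(os.environ)
print(settings)
```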

---

## Memory profile (free tier sanity check)

| Component | RAM at idle |
|---|---|
| Python interpreter + libraries | ~200 MB |
| `bge-large-en-v1.5` weights | ~1.3 GB |
| Chroma + BM25 index | ~30 MB |
| FastAPI / uvicorn | ~50 MB |
| **Total at runtime** | **~1.6 GB** |
| HF Spaces free tier | 16 GB (sufficient) |
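
The total is plain addition over the rows above. A short worked version, using the table's approximate figures and treating 1 GB as 1000 MB for a rough estimate:

```python
# Idle RAM per component in MB, copied from the table above.
COMPONENTS_MB = {
    "python interpreter + libraries": 200,
    "bge-large-en-v1.5 weights": 1300,
    "chroma + bm25 index": 30,
    "fastapi / uvicorn": 50,
}
FREE_TIER_MB = 16 * 1000  # 16 GB, with 1 GB = 1000 MB for a rough estimate

total_mb = sum(COMPONENTS_MB.values())
headroom_mb = FREE_TIER_MB - total_mb

print(f"total:    {total_mb} MB (~{total_mb / 1000:.1f} GB)")
print(f"headroom: {headroom_mb} MB ({total_mb / FREE_TIER_MB:.0%} of the tier used)")
```

Swapping in a different embedding model only changes the weights entry, which dominates the total.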

---

## Updating the deployment

After any local change, just push to the connected branch:

```bash
git add ...
git commit -m "..."
git push space main
```

The Space auto-detects the push and redeploys.

If `data/documents.jsonl` changes (re-scrape or re-extract concepts), the
Chroma index is rebuilt automatically during the next image build; no manual
step is needed.

---

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `500 retrieval failed: GEMINI_API_KEY not set` | `LLM_PROVIDER` not set, code defaults to Gemini | Add `LLM_PROVIDER=openai` Variable |
| `500 OPENAI_API_KEY not set` | Forgot the secret | Add `OPENAI_API_KEY` Secret |
| Build hangs on `RUN python -m scripts.index` for >10 min | Embedding loop is genuinely slow on free CPU; tqdm doesn't flush | Wait it out. Look for `collection 'shl_baseline' has 377 items` to confirm completion. |
| Push rejected: `binary files` | Chroma binaries in git | They shouldn't be: `.gitignore` excludes `data/chroma/`. If anything else binary slipped in, remove it with `git rm --cached <file>` |
| Push rejected: `valid Hugging Face secrets` | Token was committed somewhere | Search the repo: `grep -rn 'hf_' .` then strip and amend |
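
For the last row, a plain `grep` for `hf_` can be noisy in a repo that mentions the prefix in docs. A narrower scan might look like the sketch below. The token pattern (`hf_` followed by at least 20 alphanumeric characters) is an assumption about what a token looks like, not an official format, and `find_token_like_strings` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed token shape: "hf_" plus a long alphanumeric run. Adjust if needed.
TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]{20,}")


def find_token_like_strings(root: str) -> list[tuple[str, int]]:
    """Return (path, line number) for every line containing a token-like string."""
    root_path = Path(root)
    hits = []
    for path in sorted(root_path.rglob("*")):
        # Skip directories and anything under .git.
        if not path.is_file() or ".git" in path.relative_to(root_path).parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for lineno, line in enumerate(text.splitlines(), start=1):
            if TOKEN_PATTERN.search(line):
                hits.append((str(path), lineno))
    return hits
```

Running `find_token_like_strings(".")` from the repo root lists the files to clean. This only covers the working tree, so a token that was committed earlier still has to be removed from history, and revoking it on the tokens page is the safer move either way.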