	# Deploying to Hugging Face Spaces

	The repo ships everything HF Spaces needs: a `Dockerfile`, `requirements.txt`,
	a `README.md` with the required Space front-matter, and the scripts that
	build the Chroma vector index at image-build time.

	## Prerequisites

	- Free HF account at https://huggingface.co/join
	- An OpenAI API key (for the LLM rerank step)
	- Optional: Langfuse keys for tracing

	---

	## Step-by-step

	### 1. Create the Space

	1. Go to https://huggingface.co/new-space
	2. Name: e.g. `shl-recommender-api`
	3. Space SDK: select Docker (NOT Gradio / Streamlit)
	4. Hardware: Free CPU basic (16 GB RAM, plenty for `bge-large`)
	5. Visibility: Public
	6. Click Create Space

	### 2. Push the code

	Spaces are git repos. Add the Space as a remote and push:

	```bash
	cd /path/to/shl-asss

	git init
	git add .
	git commit -m "SHL recommender — initial commit"

	# HF requires a User Access Token with write scope.
	# Create one at https://huggingface.co/settings/tokens
	# Then use it as the password when prompted by git push.

	git remote add space https://huggingface.co/spaces/<USERNAME>/shl-recommender-api
	git branch -M main
	git push -u space main
	# Username: <USERNAME>
	# Password: paste the hf_... token
	```

	The Space picks up:
	- `Dockerfile` → builds the container
	- `README.md` front-matter → configures the Space (title, port, etc.)
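The front-matter is the YAML block between `---` lines at the top of `README.md`. A minimal sketch of a pre-push check follows, assuming the Space config keys `sdk` and `app_port`; the helper names, the sample title, and the port value are illustrative and not taken from this repo. The parser only handles flat `key: value` pairs, which avoids a YAML dependency.

```python
def read_front_matter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs from a leading `---` ... `---` block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the block
            return fields
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip().strip("'\"")
    return {}  # no closing delimiter: treat as missing front-matter


def check_space_readme(text: str) -> list[str]:
    """Return a list of problems; an empty list means the basics look right."""
    fields = read_front_matter(text)
    if not fields:
        return ["no front-matter block found"]
    problems = []
    if fields.get("sdk") != "docker":
        problems.append("sdk should be 'docker'")
    port = fields.get("app_port")
    if port is not None and not port.isdigit():  # app_port is optional
        problems.append("app_port should be a port number")
    return problems


# Illustrative README content; the title and port are placeholder values.
SAMPLE = """---
title: SHL Recommender API
sdk: docker
app_port: 7860
---
# SHL recommender
"""

if __name__ == "__main__":
    print(check_space_readme(SAMPLE))
```

Running it against the real file would look like `check_space_readme(open("README.md").read())`.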

	### 3. Set the environment

	Open your Space → Settings → Variables and secrets.

	| Type | Name | Value |
	|---|---|---|
	| Variable | `LLM_PROVIDER` | `openai` |
	| Variable | `LLM_MODEL` | `gpt-5-mini` |
	| Secret | `OPENAI_API_KEY` | your `sk-proj-...` |
	| Secret (optional) | `LANGFUSE_PUBLIC_KEY` | `pk-lf-...` |
	| Secret (optional) | `LANGFUSE_SECRET_KEY` | `sk-lf-...` |
	| Secret (optional) | `LANGFUSE_BASE_URL` | `https://us.cloud.langfuse.com` |

	Each variable change triggers a rebuild, so set them all at once before the
	first push, or batch later changes.

	### 4. Wait for the build

	The first build:
	- downloads ~600 MB of pip dependencies
	- downloads ~1.3 GB of `bge-large-en-v1.5` weights
	- embeds 377 documents into a fresh `data/chroma/` (the index is built
	during `RUN python -m scripts.index`, so no binary blobs live in git)

	Expect 5–8 minutes for the first build. The Space dashboard streams
	logs in real time. Re-runs hit pip's cache and finish in ~2–3 min.

	### 5. Verify

	Your Space exposes an HTTPS URL like
	`https://<USERNAME>-shl-recommender-api.hf.space`.

	```bash
	curl https://<USERNAME>-shl-recommender-api.hf.space/health
	# {"status":"healthy"}

	curl -X POST https://<USERNAME>-shl-recommender-api.hf.space/recommend \
	  -H "Content-Type: application/json" \
	  -d '{"query":"hire java developers under 40 minutes"}'
	```

	Or open the auto-generated Swagger UI in a browser:

	```
	https://<USERNAME>-shl-recommender-api.hf.space/docs
	```

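	Spaces on free hardware go to sleep after prolonged inactivity (48 hours by
	default), so the first request after a long idle period has to wait for the
	container to restart. While the Space is awake, each `/recommend` call
	takes ~2 s (LLM rerank dominates).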
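The two curl checks can also be scripted with the Python standard library. This is a sketch only: `build_recommend_request` and `smoke_test` are hypothetical helper names, the 60-second timeout is an arbitrary generous default, and the response body is printed raw because its exact shape is not documented here.

```python
import json
import urllib.request


def build_recommend_request(base_url: str, query: str) -> urllib.request.Request:
    """Build the same POST /recommend request as the curl example."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/recommend",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url: str, timeout: float = 60.0) -> None:
    """Hit /health, then /recommend, and print what comes back."""
    health_url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(health_url, timeout=timeout) as resp:
        print("health:", resp.status, resp.read().decode("utf-8"))
    request = build_recommend_request(base_url, "hire java developers under 40 minutes")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        print("recommend:", resp.status, resp.read().decode("utf-8")[:200])


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        smoke_test(sys.argv[1])
    else:
        print("usage: python smoke_test.py https://<USERNAME>-shl-recommender-api.hf.space")
```

Any non-2xx response makes `urlopen` raise `urllib.error.HTTPError`, so a failed check stops the script with a traceback rather than passing silently.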

	---

	## Configuration knobs

	All knobs are environment variables; set them under the Space's Settings → Variables and secrets.

	| Env var | Default | Notes |
	|---|---|---|
	| `EMBED_PROVIDER` | `local` | `local` (sentence-transformers) or `gemini` |
	| `EMBED_MODEL` | `BAAI/bge-large-en-v1.5` | Pin a smaller model on RAM-constrained hosts |
	| `LLM_PROVIDER` | `gemini` (set to `openai` in Space) | `openai` or `gemini` |
	| `LLM_MODEL` | varies by provider | e.g. `gpt-5-mini`, `gpt-4o-mini`, `gemini-2.5-flash` |
	| `OPENAI_BASE_URL` | unset | Set for Azure / OpenRouter / proxy |
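To make the defaults concrete, here is one way an application could resolve these knobs at startup. It is a sketch, not the repo's actual configuration code: the `Settings` class and `load_settings` function are hypothetical, and only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    embed_provider: str
    embed_model: str
    llm_provider: str
    llm_model: Optional[str]        # None means "use the provider's default"
    openai_base_url: Optional[str]  # None means "use the official endpoint"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve the knobs from the table, falling back to the listed defaults."""
    return Settings(
        embed_provider=env.get("EMBED_PROVIDER", "local"),
        embed_model=env.get("EMBED_MODEL", "BAAI/bge-large-en-v1.5"),
        llm_provider=env.get("LLM_PROVIDER", "gemini"),
        llm_model=env.get("LLM_MODEL") or None,
        openai_base_url=env.get("OPENAI_BASE_URL") or None,
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dict instead of `os.environ`, for example `load_settings({"LLM_PROVIDER": "openai", "LLM_MODEL": "gpt-5-mini"})`, shows what the Space configuration from step 3 resolves to.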

	---

	## Memory profile (free tier sanity check)

	| Component | RAM at idle |
	|---|---|
	| Python interpreter + libraries | ~200 MB |
	| `bge-large-en-v1.5` weights | ~1.3 GB |
	| Chroma + BM25 index | ~30 MB |
	| FastAPI / uvicorn | ~50 MB |
	| Total at runtime | ~1.6 GB |
	| HF Spaces free tier | 16 GB ✓ |
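The total is the sum of the four components: 200 + 1300 + 30 + 50 = 1580 MB, or roughly 1.6 GB, which is about 10% of the free tier. The short script below redoes that arithmetic so the check can be repeated if a different embedding model is pinned; the figures are the estimates from the table, and 1 GB is treated as 1000 MB for simplicity.

```python
# Idle RAM estimates from the table above, in MB.
COMPONENTS_MB = {
    "Python interpreter + libraries": 200,
    "bge-large-en-v1.5 weights": 1300,
    "Chroma + BM25 index": 30,
    "FastAPI / uvicorn": 50,
}
FREE_TIER_MB = 16_000  # 16 GB, treating 1 GB as 1000 MB for a rough check


def memory_headroom(components: dict[str, int], limit_mb: int) -> tuple[int, float]:
    """Return (total MB, fraction of the limit used)."""
    total = sum(components.values())
    return total, total / limit_mb


if __name__ == "__main__":
    total, used = memory_headroom(COMPONENTS_MB, FREE_TIER_MB)
    print(f"total: {total} MB ({used:.1%} of the free tier)")
```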

	---

	## Updating the deployment

	After any local change, just push to the connected branch:

	```bash
	git add ...
	git commit -m "..."
	git push space main
	```

	The Space auto-detects the push and redeploys.

	If `data/documents.jsonl` changes (re-scrape or re-extract concepts), the
	Chroma index gets rebuilt during the next image build automatically — no
	manual step.

	---

	## Troubleshooting

	| Symptom | Likely cause | Fix |
	|---|---|---|
	| `500 retrieval failed: GEMINI_API_KEY not set` | `LLM_PROVIDER` not set, code defaults to Gemini | Add `LLM_PROVIDER=openai` Variable |
	| `500 OPENAI_API_KEY not set` | Forgot the secret | Add `OPENAI_API_KEY` Secret |
	| Build hangs on `RUN python -m scripts.index` for >10 min | Embedding loop is genuinely slow on free CPU; tqdm doesn't flush | Wait it out. Look for `collection 'shl_baseline' has 377 items` to confirm completion. |
	| Push rejected: `binary files` | Chroma binaries in git | They shouldn't be: `.gitignore` excludes `data/chroma/`. If anything else binary slipped in, remove with `git rm --cached <file>` |
	| Push rejected: `valid Hugging Face secrets` | Token was committed somewhere | Search the repo: `grep -rn 'hf_' .` then strip and amend |
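The first two symptoms come from a provider/key mismatch, which can be caught before a request ever reaches the LLM. The following is a hypothetical preflight check, not code from this repo: `missing_settings` is an illustrative name, and the provider-to-key mapping is inferred from the error messages in the table.

```python
import os
from typing import Mapping

# Secret each LLM provider needs, inferred from the error messages in the table.
REQUIRED_KEY = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """List problems that would surface as the first two symptoms above."""
    provider = env.get("LLM_PROVIDER", "gemini")  # documented default
    key = REQUIRED_KEY.get(provider)
    if key is None:
        return [f"unknown LLM_PROVIDER: {provider!r}"]
    if not env.get(key):
        return [f"{key} not set (LLM_PROVIDER={provider})"]
    return []


if __name__ == "__main__":
    for problem in missing_settings():
        print("config problem:", problem)
```

Called at startup, a check like this turns a runtime `500` into a message in the Space's logs that names the missing variable.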