---
title: Text Embding Model
emoji: 🏢
colorFrom: pink
colorTo: red
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: 'Embedding model for the demo application'
---

eduai-embedder (text-embding-model Space)

Tiny FastAPI service that wraps `sentence-transformers/all-MiniLM-L6-v2` (384-dimensional, free, runs on CPU) behind four HTTP endpoints. It is deployed on this Hugging Face Docker Space so the eduai_platform team doesn't have to install torch locally.

Why this exists

Installing torch and sentence-transformers reliably on Windows with Conda is a daily blocker. Moving embeddings into a single shared service means:

- New contributors clone the platform repo with no ML dependencies.
- The model is loaded once, in one place, by one container.
- We can switch to a stronger model (or a hosted provider) without touching any client code.

API

| Method | Path | Auth | Body | Response |
|---|---|---|---|---|
| `GET` | `/` | open | none | `{status, model, dim}` |
| `GET` | `/health` | open | none | `{status, model, dim}` |
| `POST` | `/embed` | `X-API-Key` | `{texts: [str]}` | `{embeddings: [[float]], model, dim}` |
| `POST` | `/embed_one` | `X-API-Key` | `{text: str}` | `{embedding: [float], model, dim}` |

Returned vectors are L2-normalized (unit length), so cosine similarity reduces to a plain dot product.
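A minimal pure-Python sketch of that property, using made-up 3-dimensional vectors in place of real 384-dimensional output (the helper names are illustrative, not part of the service): once both vectors have unit length, the plain dot product gives the same number as the full cosine formula, so clients can skip computing norms.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale v to unit length (the service already returns vectors in this form)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Full cosine similarity: dot product divided by both norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors for illustration only, not real model output.
a = l2_normalize([3.0, 4.0, 0.0])  # -> [0.6, 0.8, 0.0]
b = l2_normalize([4.0, 3.0, 0.0])  # -> [0.8, 0.6, 0.0]

# For unit vectors both measures agree (up to floating-point rounding).
print(round(dot(a, b), 6), round(cosine(a, b), 6))  # prints: 0.96 0.96
```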

Example

Once the Space is live at https://ibrahimdaud-text-embding-model.hf.space:

```
curl https://ibrahimdaud-text-embding-model.hf.space/health
# {"status":"ok","model":"all-MiniLM-L6-v2","dim":384}

curl -X POST https://ibrahimdaud-text-embding-model.hf.space/embed \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EMBEDDER_API_KEY" \
  -d '{"texts": ["What is a quadratic?", "Define discriminant."]}' | jq .model
# "all-MiniLM-L6-v2"
```
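The same `/embed` call from Python, as a minimal sketch using only the standard library. The helper names `build_embed_request` and `embed` and the 60 s timeout are assumptions for illustration (not eduai_platform code); the URL, the `X-API-Key` header, and the JSON shapes come from the API table.

```python
import json
import urllib.request

BASE_URL = "https://ibrahimdaud-text-embding-model.hf.space"


def build_embed_request(
    texts: list[str], api_key: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST /embed request carrying the X-API-Key header."""
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def embed(
    texts: list[str], api_key: str, base_url: str = BASE_URL, timeout: float = 60.0
) -> list[list[float]]:
    """Send the request and return the "embeddings" field of the response.

    The generous timeout (an assumption) is meant to ride out a ~30 s cold start.
    Non-2xx responses (400 for limit violations, 401 for a bad key) raise
    urllib.error.HTTPError.
    """
    req = build_embed_request(texts, api_key, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["embeddings"]
```

Calling `embed(["What is a quadratic?"], os.environ["EMBEDDER_API_KEY"])` should return one list of 384 floats with the current model. Building the request in a separate function keeps the header and payload logic testable without network access.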

Local development

```
python -m venv .venv
source .venv/bin/activate         # Linux / macOS
# .venv\Scripts\activate          # Windows
pip install -r requirements.txt
cp .env.example .env              # then set EMBEDDER_API_KEY

uvicorn app:app --reload --port 7860
# http://127.0.0.1:7860/health
# http://127.0.0.1:7860/docs       (Swagger UI)
```

Docker (mirrors what HF Spaces does)

```
docker build -t eduai-embedder .
# Keep the key in your shell so the curl examples above can reuse it.
export EMBEDDER_API_KEY="$(python -c 'import secrets; print(secrets.token_urlsafe(32))')"
docker run --rm -p 7860:7860 -e EMBEDDER_API_KEY eduai-embedder
```

Configuring the Space

1. Add the secret. In the Space, go to Settings → Variables and secrets → New secret, name it `EMBEDDER_API_KEY`, and set the value to a URL-safe token. This command generates one from 32 random bytes (43 characters):

   ```
   python -c "import secrets; print(secrets.token_urlsafe(32))"
   ```

   Save the same value in every team member's `eduai_platform/.env` as `EMBEDDING_API_KEY`.

2. Push from this folder:

   ```
   git add .
   git commit -m "deploy embedding service"
   git push origin main
   ```

   The first push takes about 5 minutes (Docker build plus model download). Later pushes are quicker: the heavy rebuild only happens when `requirements.txt` or the `Dockerfile` changes.

3. Watch the build in the Space dashboard's Logs tab. You should see:

   ```
   eduai-embedder INFO Loading sentence-transformers model: all-MiniLM-L6-v2 ...
   eduai-embedder INFO Model loaded (dim=384, ...)
   INFO     Application startup complete.
   ```

4. Wire it into eduai_platform by adding these lines to `eduai_platform/.env`:

   ```
   EMBEDDING_PROVIDER=remote
   EMBEDDING_API_URL=https://ibrahimdaud-text-embding-model.hf.space
   EMBEDDING_API_KEY=<same value as the Space secret>
   ```
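On the eduai_platform side, these three variables might be read along the lines of the sketch below. The `EmbeddingConfig` class and `load_embedding_config` function are hypothetical illustrations, not the platform's actual code; only the variable names come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EmbeddingConfig:
    provider: str
    api_url: Optional[str]
    api_key: Optional[str]


def load_embedding_config(env: Mapping[str, str] = os.environ) -> EmbeddingConfig:
    """Read the EMBEDDING_* settings (hypothetical helper, not eduai_platform code)."""
    provider = env.get("EMBEDDING_PROVIDER", "")
    api_url = env.get("EMBEDDING_API_URL")
    api_key = env.get("EMBEDDING_API_KEY")
    if provider == "remote":
        # The remote provider cannot work without both the URL and the shared key.
        missing = [
            name
            for name, value in (("EMBEDDING_API_URL", api_url), ("EMBEDDING_API_KEY", api_key))
            if not value
        ]
        if missing:
            raise ValueError(f"EMBEDDING_PROVIDER=remote but {', '.join(missing)} not set")
        # Drop a trailing slash so f"{api_url}/embed" never becomes ".../" + "/embed".
        api_url = api_url.rstrip("/")
    return EmbeddingConfig(provider, api_url, api_key)
```

Passing a plain dict instead of `os.environ` makes the function easy to test, and failing fast on a missing key gives a clearer error than a 401 from the Space later on.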

Operations

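- Cold starts. Hugging Face Spaces puts free CPU instances to sleep after a period of inactivity, and the first request after sleep takes about 30 s. The chat UI's loading indicator covers this delay. We may add a scheduled GitHub Actions job that pings `/health` often enough to stay within the inactivity window and keep the Space warm.
- Rotating the API key. Change the secret in the Space settings, then update `EMBEDDING_API_KEY` in every team member's `.env`. No code change is needed. The old key stops working as soon as the Space restarts with the new secret.
- Switching the model. Set `EMBEDDER_MODEL_NAME` (as a Space variable or secret, or via the Dockerfile `ARG MODEL`) and redeploy. Important: embeddings from different models are not comparable, so every existing embedding in the vector store must be regenerated after any model switch. If `dim` also changes (e.g. moving to a 768-dim model), old and new vectors cannot even be stored or compared side by side.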
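A minimal sketch of a keep-warm or pre-flight check, assuming the `/health` response shape from the Example section. The function names, the 120 s default wait, and the idea of recording the expected model name and `dim` alongside the vector store are assumptions, not existing eduai_platform code. It compares the model name as well as `dim`, since a same-size model swap would otherwise go unnoticed.

```python
import json
import time
import urllib.request


def wait_for_health(base_url: str, timeout_s: float = 120.0, poll_s: float = 5.0) -> dict:
    """Poll GET /health until it answers, giving a sleeping Space time to wake up."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=30) as resp:
                return json.load(resp)
        except OSError as exc:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{base_url}/health did not answer within {timeout_s} s") from exc
            time.sleep(poll_s)


def check_embedder(health: dict, expected_model: str, expected_dim: int) -> None:
    """Raise if the live service no longer matches what the vector store was built with."""
    if health.get("status") != "ok":
        raise RuntimeError(f"embedder is not healthy: {health!r}")
    if health.get("model") != expected_model or health.get("dim") != expected_dim:
        raise RuntimeError(
            f"embedder serves {health.get('model')} (dim={health.get('dim')}), but the store "
            f"holds {expected_model} (dim={expected_dim}); regenerate embeddings first"
        )
```

A scheduled job could just call `wait_for_health(...)`; an ingestion script could follow it with `check_embedder(health, "all-MiniLM-L6-v2", 384)` before writing any vectors.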

Limits

The service rejects:

- batches with more than `EMBEDDER_MAX_BATCH` (default 128) texts → 400
- any text longer than `EMBEDDER_MAX_TEXT_LEN` (default 8000) characters → 400
- requests without a valid `X-API-Key` header when a key is configured → 401
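A framework-agnostic sketch of these three rules. `check_request` is a hypothetical name, not the service's actual code; it assumes auth is checked before the size limits (the order is not specified above) and uses `hmac.compare_digest` for a constant-time key comparison.

```python
import hmac
from typing import Optional

MAX_BATCH = 128      # EMBEDDER_MAX_BATCH default
MAX_TEXT_LEN = 8000  # EMBEDDER_MAX_TEXT_LEN default


def check_request(
    texts: list[str],
    provided_key: Optional[str],
    configured_key: Optional[str],
    max_batch: int = MAX_BATCH,
    max_text_len: int = MAX_TEXT_LEN,
) -> int:
    """Return the HTTP status these rules imply (200 means the request is accepted)."""
    # Auth is only enforced when a key is configured on the server.
    if configured_key:
        if provided_key is None or not hmac.compare_digest(
            provided_key.encode("utf-8"), configured_key.encode("utf-8")
        ):
            return 401
    if len(texts) > max_batch:
        return 400
    # len() counts characters (code points), matching "chars" in the limit.
    if any(len(t) > max_text_len for t in texts):
        return 400
    return 200
```

Comparing encoded bytes rather than `str` keeps `compare_digest` working even if a key ever contains non-ASCII characters.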

License

Apache 2.0 (matches the Space metadata above and the model license).