metadata
title: Text Embding Model
emoji: 🏒
colorFrom: pink
colorTo: red
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: 'This is the embedding model for the demo application'

eduai-embedder (text-embding-model Space)

Tiny FastAPI service that wraps sentence-transformers/all-MiniLM-L6-v2 (384-dim, free, CPU) behind four HTTP routes: /, /health, /embed, and /embed_one. Deployed on this Hugging Face Docker Space so the eduai_platform team doesn't have to install torch locally.

Why this exists

Installing torch + sentence-transformers reliably on Windows + Conda is a daily blocker. Moving embeddings into a single shared service means:

  • New contributors clone the platform repo with no ML deps.
  • The model is loaded once, in one place, by one container.
  • We can swap to a stronger model (or hosted provider) without touching any client code.

API

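| Method | Path       | Auth      | Body           | Response                            |
|--------|------------|-----------|----------------|-------------------------------------|
| GET    | /          | open      | -              | {status, model, dim}                |
| GET    | /health    | open      | -              | {status, model, dim}                |
| POST   | /embed     | X-API-Key | {texts: [str]} | {embeddings: [[float]], model, dim} |
| POST   | /embed_one | X-API-Key | {text: str}    | {embedding: [float], model, dim}    |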

Vectors are L2-normalized so cosine similarity is just a dot product.
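
To illustrate why normalization makes the dot product sufficient, here is a small sketch with made-up 3-dimensional vectors (real responses have 384 dimensions). The helper names are illustrative and not part of the service.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length, as the service does before responding."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Full cosine similarity, valid for vectors of any length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors standing in for embeddings (hypothetical values).
u = l2_normalize([1.0, 2.0, 2.0])
v = l2_normalize([2.0, 0.0, 1.0])

# For unit vectors both norms in the denominator are 1,
# so the plain dot product already equals the cosine similarity.
assert math.isclose(dot(u, v), cosine(u, v))
```

Clients can therefore rank results with a single dot product per pair and skip the norm computation.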

Example

Once the Space is live at https://ibrahimdaud-text-embding-model.hf.space:

curl https://ibrahimdaud-text-embding-model.hf.space/health
# {"status":"ok","model":"all-MiniLM-L6-v2","dim":384}

curl -X POST https://ibrahimdaud-text-embding-model.hf.space/embed \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EMBEDDER_API_KEY" \
  -d '{"texts": ["What is a quadratic?", "Define discriminant."]}' | jq .model
# "all-MiniLM-L6-v2"
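
The same call can be made from Python. This is a minimal standard-library sketch, not the client that ships with eduai_platform; build_embed_request and embed are hypothetical names, and the key is assumed to be available in the EMBEDDER_API_KEY environment variable, as in the curl example.

```python
import json
import os
import urllib.request

BASE_URL = "https://ibrahimdaud-text-embding-model.hf.space"

def build_embed_request(texts, api_key, base_url=BASE_URL):
    """Build (but do not send) a POST /embed request."""
    body = json.dumps({"texts": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )

def embed(texts, api_key=None, timeout=60):
    """Send the request and return the list of embedding vectors."""
    api_key = api_key or os.environ["EMBEDDER_API_KEY"]
    request = build_embed_request(texts, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["embeddings"]
```

Calling embed(["What is a quadratic?", "Define discriminant."]) should return one vector per input text, in the same order as the request.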

Local development

python -m venv .venv
source .venv/bin/activate         # Linux / macOS
# .venv\Scripts\activate          # Windows
pip install -r requirements.txt
cp .env.example .env              # then set EMBEDDER_API_KEY

uvicorn app:app --reload --port 7860
# http://127.0.0.1:7860/health
# http://127.0.0.1:7860/docs       (Swagger UI)

Docker (mirrors what HF Spaces does)

docker build -t eduai-embedder .
docker run --rm -p 7860:7860 \
  -e EMBEDDER_API_KEY="$(python -c 'import secrets; print(secrets.token_urlsafe(32))')" \
  eduai-embedder

Configuring the Space

  1. Add the secret. Space → Settings → Variables and secrets → New secret → name EMBEDDER_API_KEY, value = a URL-safe token generated from 32 random bytes (about 43 characters):

    python -c "import secrets; print(secrets.token_urlsafe(32))"
    

    Save the same value into every team member's eduai_platform/.env as EMBEDDING_API_KEY.

  2. Push from this folder:

    git add .
    git commit -m "deploy embedding service"
    git push origin main
    

    First push: ~5 min (Docker build + model download). Subsequent pushes only rebuild if requirements.txt or Dockerfile change.

  3. Watch the build. Space dashboard → Logs tab. You should see:

    eduai-embedder INFO Loading sentence-transformers model: all-MiniLM-L6-v2 ...
    eduai-embedder INFO Model loaded (dim=384, ...)
    INFO     Application startup complete.
    
  4. Wire it into eduai_platform. Add to eduai_platform/.env:

    EMBEDDING_PROVIDER=remote
    EMBEDDING_API_URL=https://ibrahimdaud-text-embding-model.hf.space
    EMBEDDING_API_KEY=<same value as Space secret>
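
How eduai_platform consumes these variables is not shown here. As a rough sketch, client code could read them like this; EmbeddingSettings and load_embedding_settings are hypothetical names, and the sketch assumes the .env file has already been loaded into the process environment.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbeddingSettings:
    provider: str
    api_url: str
    api_key: str

def load_embedding_settings(env=os.environ):
    """Read the three EMBEDDING_* variables and fail early if the remote config is incomplete."""
    settings = EmbeddingSettings(
        provider=env.get("EMBEDDING_PROVIDER", ""),
        # Drop a trailing slash so paths like /embed can be appended safely.
        api_url=env.get("EMBEDDING_API_URL", "").rstrip("/"),
        api_key=env.get("EMBEDDING_API_KEY", ""),
    )
    if settings.provider == "remote" and not (settings.api_url and settings.api_key):
        raise RuntimeError(
            "EMBEDDING_API_URL and EMBEDDING_API_KEY must be set for the remote provider"
        )
    return settings
```

Failing at startup rather than on the first request makes a missing key easier to spot.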
    

Operations

  • Cold starts. Hugging Face Spaces puts free CPU instances to sleep after a period of inactivity. The first request after sleep takes ~30 s. The chat UI's loading indicator covers this; we may add a scheduled GitHub Actions cron that pings /health often enough to keep it warm.
  • Rotating the API key. Change the secret in Space settings, then update every team member's .env. No code change is needed. The old key is invalidated immediately.
  • Switching the model. Set EMBEDDER_MODEL_NAME (Space secret or Dockerfile ARG MODEL) and redeploy. Important: if dim changes (e.g. when switching to a 768-dim model), every existing embedding in the vector store must be regenerated, because vectors of different dimensions cannot be compared.
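
On the client side, a cold start can be absorbed by polling /health before sending real work. The following is a sketch only, not something the service or platform ships; wait_until_awake and its parameters are hypothetical, and the opener and sleep arguments exist so the function can be exercised without network access.

```python
import time
import urllib.request

def wait_until_awake(health_url, attempts=5, first_delay=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll the health endpoint with exponential backoff; return True once it answers 200."""
    delay = first_delay
    for attempt in range(attempts):
        try:
            with opener(health_url, timeout=10) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers urllib.error.URLError, HTTPError, and socket timeouts:
            # the Space may still be waking up.
            pass
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2
    return False
```

With the defaults, the delays between attempts are 2, 4, 8, and 16 s, which adds up to 30 s of waiting, roughly the cold-start time quoted above.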

Limits

The service rejects:

  • batches with more than EMBEDDER_MAX_BATCH (default 128) texts → 400
  • any text longer than EMBEDDER_MAX_TEXT_LEN (default 8000) chars → 400
  • requests without a valid X-API-Key when one is configured → 401
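
A client that embeds many texts can stay under these limits by splitting its input before calling /embed. The sketch below assumes the default limits are in effect and that the server counts characters the same way Python's len() does; chunk_texts is a hypothetical helper, not part of the service.

```python
MAX_BATCH = 128      # mirrors the EMBEDDER_MAX_BATCH default
MAX_TEXT_LEN = 8000  # mirrors the EMBEDDER_MAX_TEXT_LEN default

def chunk_texts(texts, max_batch=MAX_BATCH, max_text_len=MAX_TEXT_LEN):
    """Yield lists of at most max_batch texts, rejecting over-long texts before any batch is produced."""
    texts = list(texts)
    for i, text in enumerate(texts):
        if len(text) > max_text_len:
            raise ValueError(f"text {i} has {len(text)} chars (limit {max_text_len})")
    for start in range(0, len(texts), max_batch):
        yield texts[start:start + max_batch]
```

Each yielded batch can then be sent as one /embed request. Because this is a generator, the length check runs when iteration starts, not when chunk_texts is called.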

License

Apache 2.0 (matches the Space metadata above and the model license).