---
title: Text Embding Model
emoji: 🏒
colorFrom: pink
colorTo: red
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: 'This is the embedding model for the demo application'
---
# eduai-embedder (text-embding-model Space)
Tiny FastAPI service that wraps `sentence-transformers/all-MiniLM-L6-v2`
(384-dim, free, CPU) behind four HTTP endpoints. Deployed on this
Hugging Face Docker Space so the [eduai_platform](https://github.com/)
team doesn't have to install `torch` locally.
## Why this exists
Installing `torch` + `sentence-transformers` reliably on Windows + Conda
is a daily blocker. Moving embeddings into a single shared service means:
- New contributors clone the platform repo with **no ML deps**.
- The model is loaded **once**, in one place, by one container.
- We can swap to a stronger model (or hosted provider) without touching
any client code.
## API
| Method | Path | Auth | Body | Response |
|---|---|---|---|---|
| `GET` | `/` | open | none | `{status, model, dim}` |
| `GET` | `/health` | open | none | `{status, model, dim}` |
| `POST` | `/embed` | `X-API-Key` | `{texts: [str]}` | `{embeddings: [[float]], model, dim}` |
| `POST` | `/embed_one` | `X-API-Key` | `{text: str}` | `{embedding: [float], model, dim}` |
Vectors are L2-normalized so cosine similarity is just a dot product.
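To see why normalization makes the dot product sufficient, here is a minimal sketch using toy 3-dim vectors in place of the 384-dim embeddings. The helper names (`l2_normalize`, `dot`, `cosine_similarity`) are illustrative and are not part of the service.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Full cosine formula, kept only for comparison with dot()."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for embeddings returned by the service.
u = l2_normalize([1.0, 2.0, 2.0])
v = l2_normalize([2.0, 0.0, 1.0])

# For unit-length vectors both denominators are 1, so the two values agree.
print(dot(u, v), cosine_similarity(u, v))
```

Client code that ranks results can therefore skip the norm computation and compare raw dot products.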
### Example
Once the Space is live at `https://ibrahimdaud-text-embding-model.hf.space`:
```bash
curl https://ibrahimdaud-text-embding-model.hf.space/health
# {"status":"ok","model":"all-MiniLM-L6-v2","dim":384}
curl -X POST https://ibrahimdaud-text-embding-model.hf.space/embed \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EMBEDDER_API_KEY" \
  -d '{"texts": ["What is a quadratic?", "Define discriminant."]}' | jq .model
# "all-MiniLM-L6-v2"
```
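The same `/embed` call can be made from Python with only the standard library. This is a rough sketch, not the platform's actual client: the function names `build_embed_request` and `embed` and the 60-second timeout are assumptions chosen for illustration, while the URL, header, and JSON fields come from the table above.

```python
import json
import urllib.request

BASE_URL = "https://ibrahimdaud-text-embding-model.hf.space"


def build_embed_request(texts, api_key, base_url=BASE_URL):
    """Build (but do not send) the POST /embed request."""
    body = json.dumps({"texts": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embed",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


def embed(texts, api_key, base_url=BASE_URL, timeout=60):
    """Send the request and return the list of embedding vectors.

    The generous timeout is an assumption meant to tolerate a cold start.
    """
    req = build_embed_request(texts, api_key, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["embeddings"]
```

A caller would typically read the key from the environment, for example `embed(["What is a quadratic?"], os.environ["EMBEDDER_API_KEY"])`.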
## Local development
```bash
python -m venv .venv
source .venv/bin/activate # Linux / macOS
# .venv\Scripts\activate # Windows
pip install -r requirements.txt
cp .env.example .env # then set EMBEDDER_API_KEY
uvicorn app:app --reload --port 7860
# http://127.0.0.1:7860/health
# http://127.0.0.1:7860/docs (Swagger UI)
```
## Docker (mirrors what HF Spaces does)
```bash
docker build -t eduai-embedder .
docker run --rm -p 7860:7860 \
  -e EMBEDDER_API_KEY="$(python -c 'import secrets; print(secrets.token_urlsafe(32))')" \
  eduai-embedder
```
## Configuring the Space
1. **Add the secret.** Space → Settings → Variables and secrets →
*New secret* → name `EMBEDDER_API_KEY`, value = a URL-safe token generated from 32 random bytes (about 43 characters):
```bash
python -c "import secrets; print(secrets.token_urlsafe(32))"
```
Save the same value into every team member's `eduai_platform/.env` as
`EMBEDDING_API_KEY`.
2. **Push from this folder:**
```bash
git add .
git commit -m "deploy embedding service"
git push origin main
```
First push: ~5 min (Docker build + model download). Subsequent pushes
only rebuild if `requirements.txt` or `Dockerfile` change.
3. **Watch the build.** Space dashboard → Logs tab. You should see:
```
eduai-embedder INFO Loading sentence-transformers model: all-MiniLM-L6-v2 ...
eduai-embedder INFO Model loaded (dim=384, ...)
INFO Application startup complete.
```
4. **Wire it into eduai_platform.** Add to `eduai_platform/.env`:
```
EMBEDDING_PROVIDER=remote
EMBEDDING_API_URL=https://ibrahimdaud-text-embding-model.hf.space
EMBEDDING_API_KEY=<same value as Space secret>
```
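On the client side, the three settings above might be read as follows. This is a minimal sketch under assumptions: `EmbeddingConfig` and `load_embedding_config` are hypothetical names, not code from eduai_platform, and the values are assumed to already be in the process environment (loading the `.env` file itself is out of scope here).

```python
import os
from dataclasses import dataclass

REQUIRED = ("EMBEDDING_PROVIDER", "EMBEDDING_API_URL", "EMBEDDING_API_KEY")


@dataclass(frozen=True)
class EmbeddingConfig:
    provider: str
    api_url: str
    api_key: str


def load_embedding_config(env=None):
    """Read the three EMBEDDING_* settings, failing loudly if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return EmbeddingConfig(
        provider=env["EMBEDDING_PROVIDER"],
        # Strip a trailing slash so paths like "/embed" can be appended safely.
        api_url=env["EMBEDDING_API_URL"].rstrip("/"),
        api_key=env["EMBEDDING_API_KEY"],
    )
```

Failing at startup on a missing key tends to be easier to debug than a 401 on the first embedding request.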
## Operations
- **Cold starts.** Hugging Face Spaces puts free CPU instances to sleep
after inactivity. The first request after sleep takes ~30 s. The chat UI's
loading indicator covers this; we may add a scheduled GitHub Actions
cron that pings `/health` to keep it warm, provided it runs more often
than the sleep timeout.
- **Rotating the API key.** Change the secret in the Space settings, then update
every team member's `.env`. No code change is needed. The old key is invalidated immediately.
- **Switching the model.** Set `EMBEDDER_MODEL_NAME` (Space secret or
Dockerfile `ARG MODEL`) and redeploy. **Important:** if `dim` changes
(e.g. switching to a 768-dim model), every existing embedding in the
vector store must be regenerated.
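Two of the points above, cold starts and dimension changes, can be guarded against on the client with a small readiness check. The sketch below is illustrative only: `wait_until_ready`, its retry counts, and the injected `fetch_health` callable are assumptions, not part of this service or of eduai_platform.

```python
import time


def wait_until_ready(fetch_health, expected_dim, attempts=6, delay_s=10.0,
                     sleep=time.sleep):
    """Poll /health until it answers, then confirm the embedding dimension.

    fetch_health: callable returning the parsed /health JSON and raising
    OSError on network failure (urllib's URLError is an OSError subclass).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            health = fetch_health()
        except OSError as exc:
            # Likely a sleeping Space; wait and retry unless out of attempts.
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay_s)
            continue
        if health["dim"] != expected_dim:
            # Stored vectors would be incompatible with the new model.
            raise ValueError(
                f"service reports dim={health['dim']}, "
                f"vector store expects {expected_dim}"
            )
        return health
    raise TimeoutError("embedding service did not become ready") from last_error
```

Passing the fetch function in keeps the retry logic testable without network access; a real caller would wrap a `GET /health` request in it.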
## Limits
The service rejects:
- batches with more than `EMBEDDER_MAX_BATCH` (default 128) texts → 400
- any text longer than `EMBEDDER_MAX_TEXT_LEN` (default 8000) chars → 400
- requests without a valid `X-API-Key` when one is configured → 401
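A client can avoid the two 400 cases by checking text length and splitting large inputs before calling `/embed`. The following is a sketch assuming the default limits listed above; `chunk_texts` is a hypothetical helper, and it assumes the service counts characters the same way Python's `len` does.

```python
MAX_BATCH = 128      # default EMBEDDER_MAX_BATCH from the list above
MAX_TEXT_LEN = 8000  # default EMBEDDER_MAX_TEXT_LEN from the list above


def chunk_texts(texts, max_batch=MAX_BATCH, max_text_len=MAX_TEXT_LEN):
    """Validate text lengths and split texts into batches within the limits."""
    texts = list(texts)
    for i, text in enumerate(texts):
        if len(text) > max_text_len:
            raise ValueError(
                f"text {i} has {len(text)} chars, limit is {max_text_len}"
            )
    # Consecutive slices of at most max_batch items, order preserved.
    return [texts[i:i + max_batch] for i in range(0, len(texts), max_batch)]
```

Each returned batch can then be sent as one `{texts: [...]}` body. If the Space overrides the defaults, the two constants need to match its settings.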
## License
Apache 2.0 (matches the Space metadata above and the model license).