---
title: Text Embedding Model
emoji: 🏢
colorFrom: pink
colorTo: red
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: 'Embedding model for the demo application'
---

# eduai-embedder (text-embding-model Space)

Tiny FastAPI service that wraps `sentence-transformers/all-MiniLM-L6-v2`
(384-dim, free, CPU) behind four HTTP endpoints. Deployed on this
HuggingFace Docker Space so the [eduai_platform](https://github.com/)
team doesn't have to install `torch` locally.

## Why this exists

Installing `torch` + `sentence-transformers` reliably on Windows + Conda
is a daily blocker. Moving embeddings into a single shared service means:

- New contributors clone the platform repo with **no ML deps**.
- The model is loaded **once**, in one place, by one container.
- We can swap to a stronger model (or hosted provider) without touching
  any client code.

## API

| Method | Path | Auth | Body | Response |
|---|---|---|---|---|
| `GET` | `/` | open | — | `{status, model, dim}` |
| `GET` | `/health` | open | — | `{status, model, dim}` |
| `POST` | `/embed` | `X-API-Key` | `{texts: [str]}` | `{embeddings: [[float]], model, dim}` |
| `POST` | `/embed_one` | `X-API-Key` | `{text: str}` | `{embedding: [float], model, dim}` |

Vectors are L2-normalized so cosine similarity is just a dot product.
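
To illustrate why normalization matters, here is a minimal sketch using toy
3-dim vectors in place of real 384-dim embeddings. The helper names
(`l2_normalize`, `dot`, `cosine`) are made up for this example and are not
part of the service.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Full cosine similarity, valid for vectors of any length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors standing in for embeddings returned by the service.
a = l2_normalize([1.0, 2.0, 2.0])
b = l2_normalize([2.0, 0.0, 1.0])

# For unit-length vectors both norms are 1, so the two values agree.
assert math.isclose(dot(a, b), cosine(a, b))
```

In practice this means a client can rank results with a dot product alone and
skip the two norm computations.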

### Example

Once the Space is live at `https://ibrahimdaud-text-embding-model.hf.space`:

```bash
curl https://ibrahimdaud-text-embding-model.hf.space/health
# {"status":"ok","model":"all-MiniLM-L6-v2","dim":384}

curl -X POST https://ibrahimdaud-text-embding-model.hf.space/embed \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $EMBEDDER_API_KEY" \
  -d '{"texts": ["What is a quadratic?", "Define discriminant."]}' | jq .model
# "all-MiniLM-L6-v2"
```
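
The same call can be made from Python. The sketch below uses only the standard
library; `build_embed_request` and `embed` are hypothetical helper names, and
the request and response shapes are taken from the API table above rather than
from the service source.

```python
import json
import urllib.request

BASE_URL = "https://ibrahimdaud-text-embding-model.hf.space"

def build_embed_request(texts, api_key, base_url=BASE_URL):
    """Build (but do not send) a POST /embed request."""
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )

def embed(texts, api_key, base_url=BASE_URL, timeout=60):
    """Send the request and return the list of embedding vectors."""
    req = build_embed_request(texts, api_key, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["embeddings"]

# Usage (needs network access and a valid key):
#   import os
#   vectors = embed(["What is a quadratic?"], os.environ["EMBEDDER_API_KEY"])
```

The generous default timeout is a guess meant to leave room for a cold start;
tune it to whatever the calling code can tolerate.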

## Local development

```bash
python -m venv .venv
source .venv/bin/activate         # Linux / macOS
# .venv\Scripts\activate          # Windows
pip install -r requirements.txt
cp .env.example .env              # then set EMBEDDER_API_KEY

uvicorn app:app --reload --port 7860
# http://127.0.0.1:7860/health
# http://127.0.0.1:7860/docs       (Swagger UI)
```

## Docker (mirrors what HF Spaces does)

```bash
docker build -t eduai-embedder .
docker run --rm -p 7860:7860 \
  -e EMBEDDER_API_KEY="$(python -c 'import secrets; print(secrets.token_urlsafe(32))')" \
  eduai-embedder
```

## Configuring the Space

1. **Add the secret.** Space → Settings → Variables and secrets →
   *New secret* → name `EMBEDDER_API_KEY`, value = a URL-safe token built
   from 32 random bytes (about 43 characters):
   ```bash
   python -c "import secrets; print(secrets.token_urlsafe(32))"
   ```
   Save the same value into every team member's `eduai_platform/.env` as
   `EMBEDDING_API_KEY`.

2. **Push from this folder:**
   ```bash
   git add .
   git commit -m "deploy embedding service"
   git push origin main
   ```
   First push: ~5 min (Docker build + model download). Subsequent pushes
   only rebuild if `requirements.txt` or `Dockerfile` change.

3. **Watch the build.** Space dashboard → Logs tab. You should see:
   ```
   eduai-embedder INFO Loading sentence-transformers model: all-MiniLM-L6-v2 ...
   eduai-embedder INFO Model loaded (dim=384, ...)
   INFO     Application startup complete.
   ```

4. **Wire it into eduai_platform.** Add to `eduai_platform/.env`:
   ```
   EMBEDDING_PROVIDER=remote
   EMBEDDING_API_URL=https://ibrahimdaud-text-embding-model.hf.space
   EMBEDDING_API_KEY=<same value as Space secret>
   ```
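
As a rough illustration of how client code might read these three variables,
here is a sketch. The `EmbeddingSettings` class and `load_settings` function
are hypothetical; how eduai_platform actually loads its `.env` is not shown in
this README.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbeddingSettings:
    provider: str
    api_url: str
    api_key: str

def load_settings(env=os.environ):
    """Read the EMBEDDING_* variables and fail early if remote config is incomplete."""
    provider = env.get("EMBEDDING_PROVIDER", "")
    # Strip a trailing slash so f"{api_url}/embed" never produces "//embed".
    api_url = env.get("EMBEDDING_API_URL", "").rstrip("/")
    api_key = env.get("EMBEDDING_API_KEY", "")
    if provider == "remote" and not (api_url and api_key):
        raise ValueError(
            "EMBEDDING_API_URL and EMBEDDING_API_KEY must be set "
            "when EMBEDDING_PROVIDER=remote"
        )
    return EmbeddingSettings(provider, api_url, api_key)
```

Failing at startup is a design choice: a missing key then surfaces as a clear
configuration error instead of a 401 on the first embedding request.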

## Operations

- **Cold starts.** HuggingFace Spaces puts free CPU instances to sleep
  after inactivity. First request after sleep takes ~30 s. The chat UI's
  loading indicator covers this; we may add a scheduled GitHub Actions
  cron pinging `/health` to keep it warm (the schedule must be shorter
  than the Space's inactivity timeout).
- **Rotating the API key.** Bump the secret in Space settings, then update
  every team `.env`. No code change. Old key is invalidated immediately.
- **Switching the model.** Set `EMBEDDER_MODEL_NAME` (Space secret or
  Dockerfile `ARG MODEL`) and redeploy. **Important:** if `dim` changes
  (e.g. switching to a 768-dim model), every existing embedding in the
  vector store must be regenerated.
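
One way a client could ride out a cold start is a small retry wrapper, sketched
below. `call_with_retry` is a hypothetical helper, and the sketch assumes a
sleeping Space shows up to the client as a timeout or connection error; if it
answers with an HTTP status instead, the retry condition needs adjusting.

```python
import time
import urllib.error

def call_with_retry(fn, attempts=4, delay=10.0, sleep=time.sleep):
    """Call fn(); on a connection error or timeout, wait and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except urllib.error.HTTPError:
            # The server answered (e.g. 400 or 401); retrying will not help.
            raise
        except OSError:
            # URLError and TimeoutError are both OSError subclasses.
            if attempt == attempts:
                raise
            sleep(delay)

# Usage, wrapping any zero-argument callable that performs the request:
#   vectors = call_with_retry(lambda: some_embed_call(texts))
```

With the defaults the wrapper waits up to 30 s in total between attempts,
which lines up with the ~30 s wake-up time mentioned above.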

## Limits

The service rejects:
- batches with more than `EMBEDDER_MAX_BATCH` (default 128) texts → 400
- any text longer than `EMBEDDER_MAX_TEXT_LEN` (default 8000) chars → 400
- requests without a valid `X-API-Key` when one is configured → 401
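
Clients can avoid the two 400 cases by checking and splitting input before
sending. The sketch below hard-codes the documented defaults; `make_batches` is
a hypothetical helper, and it assumes the server counts characters the same way
Python's `len()` does.

```python
MAX_BATCH = 128      # mirrors the EMBEDDER_MAX_BATCH default
MAX_TEXT_LEN = 8000  # mirrors the EMBEDDER_MAX_TEXT_LEN default

def make_batches(texts, max_batch=MAX_BATCH, max_text_len=MAX_TEXT_LEN):
    """Split texts into lists of at most max_batch items, rejecting overlong texts."""
    too_long = [i for i, t in enumerate(texts) if len(t) > max_text_len]
    if too_long:
        raise ValueError(f"texts at indexes {too_long} exceed {max_text_len} chars")
    return [texts[i:i + max_batch] for i in range(0, len(texts), max_batch)]
```

Each returned batch can then be sent as its own `/embed` request. If the Space
overrides either limit, pass the matching values explicitly.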

## License

Apache 2.0 (matches the Space metadata above and the model license).