---
title: Embeddings sidecar
emoji: π
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
short_description: FastEmbed API for all-MiniLM-L6-v2 and ColBERTv2.0
models:
- sentence-transformers/all-MiniLM-L6-v2
- colbert-ir/colbertv2.0
tags:
- embeddings
- fastapi
- sentence-transformers
- colbert
- fastembed
suggested_hardware: cpu-upgrade
pinned: false
---

# Embeddings sidecar

Tiny FastAPI service that wraps `fastembed` and exposes:

- `POST /embed/dense`: dense vectors via `sentence-transformers/all-MiniLM-L6-v2` (384-dim)
- `POST /embed/colbert`: document-side late-interaction multi-vectors via `colbert-ir/colbertv2.0` (one 128-dim vector per token)
- `POST /embed/colbert/query`: query-side ColBERT embeddings
- `GET /health`: health check

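The ColBERT endpoints return one vector per token instead of one vector per text, so ranking documents against a query needs late-interaction scoring on the caller's side. This README doesn't say how the Next.js app scores results. The sketch below is a minimal pure-Python illustration of ColBERT's MaxSim operator (for each query token, take the best dot product against any document token, then sum those maxima), plus plain cosine similarity for the single vectors from `/embed/dense`. It assumes each text's output can be treated as a list of float vectors. `dot`, `cosine` and `maxsim` are hypothetical helper names, not part of the service.

```python
from math import sqrt
from typing import Sequence

# Hypothetical alias: one embedding vector as a sequence of floats.
Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two single dense vectors (e.g. from /embed/dense)."""
    norm = sqrt(dot(a, a)) * sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """ColBERT-style late-interaction score.

    For each query token vector, find the highest dot product against any
    document token vector, then sum those maxima over all query tokens.
    """
    if not doc_tokens:
        return 0.0
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


if __name__ == "__main__":
    # Toy 2-dim vectors stand in for the real 128-dim ColBERT token vectors.
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc_a = [[1.0, 0.0], [0.0, 1.0]]  # matches both query tokens
    doc_b = [[1.0, 0.0], [1.0, 0.0]]  # matches only the first query token
    print(maxsim(query, doc_a))  # 2.0
    print(maxsim(query, doc_b))  # 1.0
```

With real data, `query_tokens` would come from `/embed/colbert/query` and `doc_tokens` from `/embed/colbert`. Because the score sums over query tokens, a document that has a strong match for every query term beats one that matches only some of them.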
The Next.js app calls this service over HTTP. The service exists because the Node
package `fastembed-js` has spotty coverage of ColBERT-style late-interaction models,
while the Python `fastembed` package handles both models cleanly.

## Run

```bash
cd embeddings
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --port 7860
```

The first request downloads the model weights (cached under `~/.cache/fastembed`).

## Smoke test

```bash
curl -X POST localhost:7860/embed/dense \
  -H 'content-type: application/json' \
  -d '{"texts":["hello world"]}' | jq '.vectors[0] | length'  # -> 384
```
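For callers other than the Next.js app (scripts, notebooks, tests), the request from the smoke test can also be sent from Python using only the standard library. This is a minimal sketch that assumes the service is listening on `http://localhost:7860` and that the response has a top-level `vectors` list, as the `jq` filter in the curl command implies. `build_dense_request`, `embed_dense` and `BASE_URL` are hypothetical names.

```python
import json
import urllib.request
from typing import List

# Assumption: the service runs locally on the port used in the Run section.
BASE_URL = "http://localhost:7860"


def build_dense_request(texts: List[str], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same JSON POST that the curl smoke test sends to /embed/dense."""
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed/dense",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def embed_dense(texts: List[str], base_url: str = BASE_URL, timeout: float = 60.0) -> List[List[float]]:
    """Send the request and return the "vectors" list from the JSON reply.

    Assumes a top-level "vectors" key in the response. The generous default
    timeout leaves room for the model download on the first request.
    """
    with urllib.request.urlopen(build_dense_request(texts, base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["vectors"]
```

When the service is running, `len(embed_dense(["hello world"])[0])` should return the same `384` as the curl check. Request construction is split out from the network call so that it can be inspected or tested without a live server. The ColBERT endpoints may accept the same body shape, but this README only shows the dense request, so check `main.py` before reusing the helper for them.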