---
title: Embeddings sidecar
emoji: π
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
short_description: FastEmbed API for all-MiniLM-L6-v2 and ColBERTv2.0
models:
- sentence-transformers/all-MiniLM-L6-v2
- colbert-ir/colbertv2.0
tags:
- embeddings
- fastapi
- sentence-transformers
- colbert
- fastembed
suggested_hardware: cpu-upgrade
pinned: false
---
# Embeddings sidecar
Tiny FastAPI service that wraps `fastembed` and exposes:
- `POST /embed/dense`: dense vectors via `sentence-transformers/all-MiniLM-L6-v2` (384-dim)
- `POST /embed/colbert`: late-interaction multi-vectors via `colbert-ir/colbertv2.0` (per-token, 128-dim)
- `POST /embed/colbert/query` β query-side ColBERT embeddings
- `GET /health`
The Next.js app calls this service over HTTP. The service exists because Node's
`fastembed-js` has incomplete support for ColBERT/late-interaction models, while Python
`fastembed` handles both models cleanly.
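The dense and ColBERT outputs are consumed differently. A dense embedding is one vector per text, so two texts are compared with a single cosine similarity. A ColBERT embedding is a list of per-token vectors, and the usual late-interaction score (MaxSim) takes, for each query token, its best match among the document's tokens and sums those maxima. Below is a minimal pure-Python sketch of both, assuming the vectors are already plain lists of floats (for example, parsed from this service's JSON responses) and, for ColBERT, that query vectors come from `/embed/colbert/query` and document vectors from `/embed/colbert`. `cosine` and `maxsim` are illustrative helpers, not endpoints of this service. `maxsim` uses raw dot products, which equal cosine similarity only if the token vectors are L2-normalized.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (e.g. 384-dim MiniLM outputs)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Guard against zero vectors instead of dividing by zero.
    return dot / norm if norm else 0.0


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """ColBERT-style late-interaction score (illustrative helper).

    For each query token vector, take its best dot product against any
    document token vector, then sum those maxima over all query tokens.
    doc_tokens must be non-empty, otherwise max() raises ValueError.
    """
    return sum(
        max(sum(q * d for q, d in zip(q_vec, d_vec)) for d_vec in doc_tokens)
        for q_vec in query_tokens
    )


# Toy 2-dim "token vectors" standing in for real 128-dim ColBERT outputs.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.6, 0.8]]
doc_b = [[0.0, 1.0], [0.6, 0.8]]

ranked = sorted(
    {"a": doc_a, "b": doc_b}.items(),
    key=lambda kv: maxsim(query, kv[1]),
    reverse=True,
)
print([name for name, _ in ranked])  # ['a', 'b']
```

For real workloads, NumPy matrix products are the natural replacement for these nested loops; the loops here only make the scoring rule explicit.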
## Run
```bash
cd embeddings
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --port 7860
```
The first request downloads the model weights (cached under `~/.cache/fastembed`).
## Smoke test
```bash
curl -s -X POST localhost:7860/embed/dense \
  -H 'content-type: application/json' \
  -d '{"texts":["hello world"]}' | jq '.vectors[0] | length'  # -> 384
```
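The curl check above only exercises the dense endpoint. A broader smoke test using only the Python standard library might look like the sketch below. It assumes the service is listening on `http://localhost:7860`, that every embed endpoint accepts the same `{"texts": [...]}` body, and that responses carry a `vectors` key as the `jq` filter implies. The ColBERT response shape (one list of 128-dim token vectors per input text) is an assumption, not something this README documents. `post_json` and `smoke_test` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: the service started in "## Run" is listening here.
BASE_URL = "http://localhost:7860"


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
    # Generous timeout: the first request may be downloading model weights.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def smoke_test(base_url: str = BASE_URL) -> dict:
    """Hit each endpoint once and check the documented vector sizes."""
    # urlopen raises HTTPError on 4xx/5xx, so this only checks for a 2xx.
    with urllib.request.urlopen(f"{base_url}/health", timeout=10) as resp:
        assert 200 <= resp.status < 300, resp.status

    # Dense: one 384-dim vector per input text (same check as the curl/jq line).
    dense = post_json(f"{base_url}/embed/dense", {"texts": ["hello world"]})
    dims = {"dense_dim": len(dense["vectors"][0])}
    assert dims["dense_dim"] == 384, dims

    # ASSUMPTION: the ColBERT endpoints take the same {"texts": [...]} body and
    # return {"vectors": [per-text list of per-token 128-dim vectors]}.
    for path in ("/embed/colbert", "/embed/colbert/query"):
        out = post_json(f"{base_url}{path}", {"texts": ["hello world"]})
        token_vectors = out["vectors"][0]
        assert token_vectors and all(len(v) == 128 for v in token_vectors), path
        dims[path] = len(token_vectors[0])
    return dims
```

With `uvicorn` running, call `smoke_test()` (or `smoke_test("http://host:port")` for another address). It raises `AssertionError` or `urllib.error.URLError` on failure and otherwise returns the observed dimensions.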