---
title: Embeddings sidecar
emoji: π
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
short_description: "FastEmbed API: all-MiniLM-L6-v2 and ColBERTv2.0"
models:
  - sentence-transformers/all-MiniLM-L6-v2
  - colbert-ir/colbertv2.0
tags:
  - embeddings
  - fastapi
  - sentence-transformers
  - colbert
  - fastembed
suggested_hardware: cpu-upgrade
pinned: false
---
# Embeddings sidecar
Tiny FastAPI service that wraps `fastembed` and exposes:

- `POST /embed/dense`: dense vectors via `sentence-transformers/all-MiniLM-L6-v2` (384-dim)
- `POST /embed/colbert`: late-interaction multi-vectors via `colbert-ir/colbertv2.0` (one 128-dim vector per token)
- `POST /embed/colbert/query`: query-side ColBERT embeddings
- `GET /health`: health check
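Because the ColBERT endpoints return one vector per token rather than one vector per text, the caller has to score query/document pairs itself. The standard ColBERT scoring rule is MaxSim: for each query token vector, take its highest dot product against any document token vector, then sum those maxima. A minimal pure-Python sketch, assuming each embedding arrives as a list of per-token float lists (the exact response shape of `/embed/colbert` is an assumption, not something this README specifies):

```python
from typing import Sequence

Vector = Sequence[float]
MultiVector = Sequence[Vector]  # one vector per token


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))


def maxsim(query: MultiVector, doc: MultiVector) -> float:
    """Late-interaction score: for every query token vector, keep its
    best match among the document token vectors, then sum the maxima."""
    if not query or not doc:
        return 0.0
    return sum(max(dot(q, d) for d in doc) for q in query)


def rank(query: MultiVector, docs: Sequence[MultiVector]) -> list[int]:
    """Return document indices ordered from best to worst MaxSim score."""
    scores = [maxsim(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this sketch the query side would come from `/embed/colbert/query` and the documents from `/embed/colbert`; keeping the two separate matters because ColBERT encodes queries and documents differently.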
The Next.js app calls this service over HTTP. It exists because Node's
`fastembed-js` has spotty coverage for ColBERT late-interaction models, while the
Python `fastembed` handles both models cleanly.
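The Next.js client code is not part of this README. As an illustration of the HTTP contract only, a minimal standard-library Python client might look like the following. It assumes every POST endpoint accepts `{"texts": [...]}` (the body used in the smoke test below) and returns JSON; it hands back the parsed JSON rather than guessing field names. `EmbeddingsClient` and its methods are hypothetical names.

```python
import json
import urllib.request
from typing import Any


class EmbeddingsClient:
    """Hypothetical thin client for the sidecar's HTTP API."""

    def __init__(self, base_url: str = "http://localhost:7860", timeout: float = 60.0):
        self.base_url = base_url.rstrip("/")
        # Generous default: the first embed call may be slow while weights download.
        self.timeout = timeout

    def _post(self, path: str, texts: list[str]) -> Any:
        # Assumed request body {"texts": [...]}, matching the curl smoke test.
        body = json.dumps({"texts": texts}).encode("utf-8")
        req = urllib.request.Request(
            self.base_url + path,
            data=body,
            headers={"content-type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def dense(self, texts: list[str]) -> Any:
        return self._post("/embed/dense", texts)

    def colbert(self, texts: list[str]) -> Any:
        return self._post("/embed/colbert", texts)

    def colbert_query(self, texts: list[str]) -> Any:
        return self._post("/embed/colbert/query", texts)

    def health(self) -> int:
        """Return the HTTP status of GET /health (non-2xx raises HTTPError)."""
        with urllib.request.urlopen(self.base_url + "/health", timeout=self.timeout) as resp:
            return resp.status
```

Keeping the client a dumb JSON pass-through means it does not break if the response field names differ from what a caller expects; validation can live one layer up.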
## Run
```bash
cd embeddings
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --port 7860
```
The first request downloads the model weights (cached under `~/.cache/fastembed`).
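When another process (a dev script or a test runner, for example) starts right after `uvicorn`, it can help to wait until the service answers before sending work. A minimal sketch that polls `GET /health` with the standard library; `wait_for_health` and its parameters are hypothetical. Whether `/health` reports ready before the weights are downloaded is not stated here, so the first embed call may still be slow and deserves a generous timeout.

```python
import time
import urllib.request


def wait_for_health(base_url: str = "http://localhost:7860",
                    deadline_s: float = 30.0,
                    interval_s: float = 0.5) -> bool:
    """Poll GET /health until it returns 200 or the deadline passes.

    Returns True once the service answers 200, False on timeout.
    """
    url = base_url.rstrip("/") + "/health"
    stop = time.monotonic() + deadline_s
    while time.monotonic() < stop:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, connection refused, and socket timeouts are all OSError
            # subclasses: the server is not up yet, so retry after a pause.
            pass
        time.sleep(interval_s)
    return False
```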
## Smoke test
```bash
curl -X POST localhost:7860/embed/dense \
  -H 'content-type: application/json' \
  -d '{"texts":["hello world"]}' | jq '.vectors[0] | length'  # -> 384
```
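The same check can be scripted, for example in CI. A minimal sketch that validates a parsed `/embed/dense` response and compares two dense vectors by cosine similarity. The `vectors` field name comes from the `jq` filter above; `check_dense`, `cosine`, and `DENSE_DIM` are hypothetical helpers, and the cosine function does not assume the vectors are pre-normalized.

```python
import math
from typing import Any

DENSE_DIM = 384  # all-MiniLM-L6-v2 output size


def check_dense(payload: Any, n_texts: int, dim: int = DENSE_DIM) -> list[list[float]]:
    """Validate a parsed /embed/dense body and return its vectors.

    Assumes the body looks like {"vectors": [[float, ...], ...]},
    as implied by the jq filter in the smoke test.
    """
    vectors = payload["vectors"]
    if len(vectors) != n_texts:
        raise ValueError(f"expected {n_texts} vectors, got {len(vectors)}")
    for i, v in enumerate(vectors):
        if len(v) != dim:
            raise ValueError(f"vector {i} has {len(v)} dims, expected {dim}")
    return vectors


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Checking the dimension explicitly catches a model swap or a misrouted endpoint (ColBERT's 128-dim token vectors, say) earlier than a downstream similarity bug would.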