Ryan Ballantyne
HF Spaces YAML
cd13631
---
title: Embeddings sidecar
emoji: 🔍
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
short_description: FastEmbed API for all-MiniLM-L6-v2 and ColBERTv2.0
models:
  - sentence-transformers/all-MiniLM-L6-v2
  - colbert-ir/colbertv2.0
tags:
  - embeddings
  - fastapi
  - sentence-transformers
  - colbert
  - fastembed
suggested_hardware: cpu-upgrade
pinned: false
---

Embeddings sidecar

Tiny FastAPI service that wraps fastembed and exposes:

  • POST /embed/dense: dense vectors via sentence-transformers/all-MiniLM-L6-v2 (384-dim)
  • POST /embed/colbert: late-interaction multi-vectors via colbert-ir/colbertv2.0 (one 128-dim vector per token)
  • POST /embed/colbert/query: query-side ColBERT embeddings
  • GET /health
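The two output shapes are compared differently downstream:

- **Dense vectors.** Each text gets one 384-dim vector, usually compared with cosine similarity.
- **ColBERT embeddings.** Each text gets a list of 128-dim token vectors. These are usually compared with ColBERT's MaxSim late-interaction score: for each query token, take the best dot product against any document token, then sum over query tokens.

The sketch below is plain Python and is illustrative only. This service just returns vectors and does not score them. The sketch assumes the vectors were already fetched from the endpoints above. `cosine` and `maxsim` are hypothetical helper names, not part of this repo.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (e.g. from /embed/dense)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """ColBERT-style late-interaction (MaxSim) score.

    For every query token vector, find the highest dot product against any
    document token vector, then sum those maxima over all query tokens.
    """
    if not doc_tokens:
        return 0.0
    total = 0.0
    for q in query_tokens:
        total += max(sum(x * y for x, y in zip(q, d)) for d in doc_tokens)
    return total


if __name__ == "__main__":
    # Toy 2-dim vectors stand in for the real 384-dim / 128-dim outputs.
    print(cosine([1.0, 0.0], [1.0, 1.0]))
    query = [[1.0, 0.0], [0.0, 1.0]]  # e.g. from /embed/colbert/query
    doc = [[0.9, 0.1], [0.2, 0.8], [0.0, 0.0]]  # e.g. from /embed/colbert
    print(maxsim(query, doc))
```

MaxSim keeps one similarity per query token rather than pooling each side into a single vector first. That per-token matching is the point of returning multi-vectors instead of one dense vector.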

The Next.js app calls this service over HTTP. The service exists because fastembed-js, the Node port of fastembed, has spotty coverage for ColBERT/late-interaction models, while Python fastembed handles both models cleanly.

Run

```bash
cd embeddings
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --port 7860
```

The first request downloads the model weights (cached under ~/.cache/fastembed).
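When uvicorn runs in the background, for example in a script or CI job, callers may want to wait until the service is listening before sending requests. Below is a minimal standard-library sketch.

- It assumes only that GET /health answers HTTP 200 once the app is up. The response body isn't documented here.
- `wait_for_health` is a hypothetical helper name.
- A 200 from /health does not necessarily mean the model weights are already downloaded, so the first embed request may still be slow.

```python
import time
import urllib.request


def wait_for_health(base_url: str = "http://localhost:7860",
                    timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll GET /health until it returns 200 or timeout_s elapses.

    Assumption: a 200 from /health means the app is accepting requests.
    Returns True on success, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval_s + 1) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and timeouts are all
            # OSError subclasses: the server is not ready yet, so retry.
            pass
        time.sleep(interval_s)
    return False
```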

Smoke test

```bash
curl -s -X POST localhost:7860/embed/dense \
  -H 'content-type: application/json' \
  -d '{"texts":["hello world"]}' | jq '.vectors[0] | length'   # -> 384
```
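For environments without curl or jq, the same check can be written with Python's standard library. This is a sketch. It assumes the request body `{"texts": [...]}` and the `vectors` response field shown in the curl command above. `embed_dense` and `smoke_test` are hypothetical helper names, not part of this repo.

```python
import json
import urllib.request
from typing import List


def embed_dense(texts: List[str], base_url: str = "http://localhost:7860",
                timeout: float = 120.0) -> List[List[float]]:
    """POST texts to /embed/dense and return the "vectors" field.

    Request/response shapes mirror the curl smoke test. The long default
    timeout leaves room for the first request, which downloads model weights.
    """
    body = json.dumps({"texts": texts}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/embed/dense",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["vectors"]


def smoke_test(base_url: str = "http://localhost:7860") -> None:
    """Raise RuntimeError unless "hello world" comes back as one 384-dim vector."""
    vectors = embed_dense(["hello world"], base_url)
    if len(vectors) != 1:
        raise RuntimeError(f"expected 1 vector, got {len(vectors)}")
    if len(vectors[0]) != 384:
        raise RuntimeError(f"expected 384 dims, got {len(vectors[0])}")
```

Call `smoke_test()` once the server from the Run section is up. It returns silently on success.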