---
title: Inference Server
emoji: ๐
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: inference server for an embedding model
---
# inference_server
Embedding inference server built with FastAPI and PyTorch that serves `all-MiniLM-L6-v2` over HTTP. Concurrent requests are grouped with dynamic batching.
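As a rough illustration of what dynamic batching means here, the sketch below queues incoming texts and flushes them to the model in one call once a batch fills up or a short wait expires. It is not this repository's implementation: `DynamicBatcher`, `fake_embed`, `max_batch_size`, and `max_wait_s` are hypothetical names, and `fake_embed` stands in for the real model call.

```python
import asyncio
from typing import Callable, List


class DynamicBatcher:
    """Illustrative batcher; names and defaults are assumptions, not the repo's code."""

    def __init__(self, batch_fn: Callable[[List[str]], list],
                 max_batch_size: int = 8, max_wait_s: float = 0.01):
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded only for the demo

    async def submit(self, text: str):
        # Each caller enqueues its text and waits on its own future.
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((text, future))
        return await future

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives, then open a wait window.
            items = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            while len(items) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    items.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(items))
            try:
                # One model call for the whole batch.
                results = self.batch_fn([text for text, _ in items])
                for (_, future), result in zip(items, results):
                    if not future.done():
                        future.set_result(result)
            except Exception as exc:
                # Propagate a batch failure to every waiting caller.
                for _, future in items:
                    if not future.done():
                        future.set_exception(exc)


def fake_embed(texts: List[str]) -> list:
    # Stand-in for the real model: one tiny "vector" per text.
    return [[float(len(t))] for t in texts]


async def demo():
    batcher = DynamicBatcher(fake_embed, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(t) for t in ["a", "bb", "ccc"]))
    worker.cancel()
    return results, batcher.batch_sizes
```

Running `asyncio.run(demo())` submits three texts concurrently and returns their results in submission order. The wait window trades a little latency for fewer, larger model calls.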
## Run locally
```bash
uv sync
uvicorn app.main:app --host 0.0.0.0 --port 8000
```
## Docker
```bash
docker build -t inference-server .
docker run -p 7860:7860 inference-server
```
Image uses CPU-only PyTorch with the model baked in (no runtime download).
## Endpoints
- `GET /health`: health check
- `POST /predict`: generate embeddings
```bash
curl -X POST http://localhost:7860/predict \
-H "Content-Type: application/json" \
-d '{"texts": ["hello world"]}'
```
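The same request can be sent from Python. The sketch below uses only the standard library; `build_predict_request` and `predict` are hypothetical helper names. The response schema is not documented here, so the parsed JSON is returned as-is. The default base URL matches the Docker port; when running locally with the `uvicorn` command above, pass `http://localhost:8000` instead.

```python
import json
import urllib.request
from typing import Iterable


def build_predict_request(texts: Iterable[str],
                          base_url: str = "http://localhost:7860") -> urllib.request.Request:
    # Same JSON body as the curl example: {"texts": [...]}
    body = json.dumps({"texts": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(texts: Iterable[str],
            base_url: str = "http://localhost:7860",
            timeout: float = 10.0):
    # Sends the request and returns the decoded JSON response unchanged,
    # since its exact shape is an unknown here.
    request = build_predict_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict(["hello world"])` performs the call.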