---
title: Inference Server
emoji: 👀
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: inference server for an embedding model
---

# inference_server

An embedding inference server built with FastAPI and PyTorch that serves `all-MiniLM-L6-v2` over HTTP. It supports dynamic batching of concurrent requests.
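
As a rough illustration of what dynamic batching means here, the sketch below groups concurrent requests into one batch before calling the model. It is a minimal sketch, not this server's actual code: `DynamicBatcher`, `fake_embed`, `max_batch_size`, and `max_wait_s` are hypothetical names, and the "model" is a stand-in function so the example runs without PyTorch.

```python
import asyncio


class DynamicBatcher:
    """Hypothetical helper: groups concurrent requests into batches."""

    def __init__(self, batch_fn, max_batch_size=8, max_wait_s=0.01):
        self.batch_fn = batch_fn              # callable: list[str] -> list of results
        self.max_batch_size = max_batch_size  # assumed knob: upper bound on batch size
        self.max_wait_s = max_wait_s          # assumed knob: how long to wait for more items
        self.queue = asyncio.Queue()

    async def submit(self, text):
        """Enqueue one text and wait for its result."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((text, fut))
        return await fut

    async def run(self):
        """Worker loop: collect items until the batch is full or the wait expires."""
        loop = asyncio.get_running_loop()
        while True:
            items = [await self.queue.get()]  # block until at least one request arrives
            deadline = loop.time() + self.max_wait_s
            while len(items) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    items.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            texts = [text for text, _ in items]
            try:
                results = self.batch_fn(texts)  # one call for the whole batch
            except Exception as exc:
                for _, fut in items:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for (_, fut), result in zip(items, results):
                if not fut.done():
                    fut.set_result(result)


async def demo():
    batch_sizes = []

    def fake_embed(texts):
        # Stand-in for the embedding model: one 1-dimensional "vector" per text.
        batch_sizes.append(len(texts))
        return [[float(len(t))] for t in texts]

    batcher = DynamicBatcher(fake_embed, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(t) for t in ["a", "bb", "ccc"]))
    worker.cancel()
    return results, batch_sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The trade-off is latency versus throughput: a longer wait lets more requests share one forward pass, while a shorter wait returns single requests sooner.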

## Run locally

```bash
uv sync
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

## Docker

```bash
docker build -t inference-server .
docker run -p 7860:7860 inference-server
```

The image uses CPU-only PyTorch and has the model baked in, so nothing is downloaded at runtime.

## Endpoints

- `GET /health`: health check
- `POST /predict`: generate embeddings (the example targets the Docker port 7860; use port 8000 for the local run)
  ```bash
  curl -X POST http://localhost:7860/predict \
    -H "Content-Type: application/json" \
    -d '{"texts": ["hello world"]}'
  ```
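
The same request can be sent from Python using only the standard library. This is a minimal sketch: `build_predict_request` and `predict` are hypothetical helper names, the base URL assumes the Docker port mapping above, and the response is returned as parsed JSON without assuming anything about its fields.

```python
import json
import urllib.request


def build_predict_request(texts, base_url="http://localhost:7860"):
    """Build a POST /predict request with the same JSON body as the curl example."""
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(texts, base_url="http://localhost:7860"):
    """Send the request and return the decoded JSON response as-is."""
    req = build_predict_request(texts, base_url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the server to be running; pass base_url="http://localhost:8000"
    # when it was started with the local uvicorn command.
    print(predict(["hello world"]))
```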