Hugging Face Space BeardedAmbivert/inference-server, README.md (last commit fde0b58 by Aditya Kulkarni, "add HF Spaces metadata", 3 months ago)


Hugging Face Spaces metadata (YAML front matter of README.md):

```yaml
title: Inference Server
emoji: 👀
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: inference server for an embedding model
```

inference_server

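An embedding inference server built with FastAPI and PyTorch that serves the all-MiniLM-L6-v2 sentence-embedding model over HTTP, producing 384-dimensional vectors. Concurrent requests are grouped into batches dynamically, so several clients can share a single forward pass through the model.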
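The README does not show how the batching is implemented. As a rough illustration only, dynamic batching can be sketched with an asyncio queue: each request waits on a future, and a single background worker flushes a batch when it is full or when a short wait window expires. `DynamicBatcher`, `max_batch_size`, `max_wait_s`, and the `embed_batch` callable are hypothetical names, not this project's code; `embed_batch` stands in for the real PyTorch model call.

```python
import asyncio
from typing import Callable, List, Optional

# Hypothetical model hook: takes a batch of texts, returns one vector per text.
EmbedBatchFn = Callable[[List[str]], List[List[float]]]


class DynamicBatcher:
    """Groups concurrent submit() calls into batches for one model call.

    A batch is flushed when it holds max_batch_size items or when max_wait_s
    has passed since its first item arrived, whichever happens first.
    """

    def __init__(self, embed_batch: EmbedBatchFn, max_batch_size: int = 32,
                 max_wait_s: float = 0.005) -> None:
        self._embed_batch = embed_batch
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue: asyncio.Queue = asyncio.Queue()
        self._worker: Optional[asyncio.Task] = None

    def start(self) -> None:
        # Must be called from inside a running event loop.
        self._worker = asyncio.create_task(self._run())

    async def stop(self) -> None:
        if self._worker is not None:
            self._worker.cancel()
            try:
                await self._worker
            except asyncio.CancelledError:
                pass

    async def submit(self, text: str) -> List[float]:
        # Each caller gets a future that the worker resolves with its vector.
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((text, fut))
        return await fut

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request is waiting.
            batch = [await self._queue.get()]
            deadline = loop.time() + self._max_wait_s
            # Keep collecting until the batch is full or the window closes.
            while len(batch) < self._max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            texts = [text for text, _ in batch]
            try:
                # A real server would likely run blocking inference off the
                # event loop, e.g. with loop.run_in_executor.
                vectors = self._embed_batch(texts)
            except Exception as exc:
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for (_, fut), vec in zip(batch, vectors):
                if not fut.done():
                    fut.set_result(vec)
```

A larger `max_wait_s` lets more requests share a forward pass (better throughput) at the cost of added latency for the first request in each batch.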

Run locally

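```bash
# Create .venv and install the project's dependencies
uv sync
# Run the server inside that environment
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000
```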
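Loading the model can take a moment after the server process starts. A small readiness check can poll the health endpoint before sending work. This is a sketch that assumes GET /health answers with HTTP 200 once the server is ready; `wait_until_healthy` and its parameters are hypothetical, and the default URL matches the local uvicorn command (port 8000).

```python
import time
import urllib.request


def wait_until_healthy(base_url: str = "http://localhost:8000",
                       timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll GET /health until it returns HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers refused connections, timeouts, and HTTP errors
            # (urllib.error.URLError and HTTPError subclass OSError).
            pass
        time.sleep(interval_s)
    return False
```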

Docker

```bash
docker build -t inference-server .
docker run -p 7860:7860 inference-server
```

The image uses CPU-only PyTorch and has the model weights baked in at build time, so nothing is downloaded when the container starts.

Endpoints

GET /health: health check

POST /predict: generate embeddings for the strings in the request body's "texts" list

```bash
# Port 7860 matches the Docker container; use 8000 for the local uvicorn server.
curl -X POST http://localhost:7860/predict \
  -H "Content-Type: application/json" \
  -d '{"texts": ["hello world"]}'
```
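The same request can be sent from Python with only the standard library. This sketch mirrors the curl call above; `predict` is a hypothetical helper, and because the README does not document the response schema, it returns the decoded JSON as-is rather than assuming a particular field name.

```python
import json
import urllib.request
from typing import Any, Dict, List


def predict(texts: List[str], base_url: str = "http://localhost:7860",
            timeout_s: float = 30.0) -> Dict[str, Any]:
    """POST {"texts": [...]} to /predict and return the decoded JSON body."""
    body = json.dumps({"texts": texts}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `predict(["hello world"])` against the running container returns whatever JSON the server produces for that input.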