--- title: LLM Chat API emoji: 🤖 colorFrom: blue colorTo: green sdk: docker pinned: false --- # LLM Chat API OpenAI-compatible chat API running Qwen 2.5 3B on CPU with optimized llama-cpp build. ## SillyTavern Connection API Connections → Chat Completion → Custom (OpenAI-compatible): - **Server URL**: `https://YOUR-SPACE-NAME.hf.space` - **Model**: `qwen-3b` - **API Key**: anything (not validated) ## Endpoints | Method | Path | Description | |--------|------|-------------| | GET | `/` | Status page | | GET | `/health` | Health check | | GET | `/v1/models` | List models | | POST | `/v1/chat/completions` | Chat (streaming supported) | ## Notes - First boot downloads the model (~2.5GB) into persistent storage `/data/models/` - Subsequent boots load from cache instantly - Built with OpenBLAS + AVX2 for best CPU throughput