Spaces:
Paused
Paused
| title: LLM Chat API | |
| emoji: 🤖 | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: docker | |
| pinned: false | |
| # LLM Chat API | |
| OpenAI-compatible chat API running Qwen 2.5 3B on CPU with optimized llama-cpp build. | |
| ## SillyTavern Connection | |
| API Connections → Chat Completion → Custom (OpenAI-compatible): | |
| - **Server URL**: `https://YOUR-SPACE-NAME.hf.space` | |
| - **Model**: `qwen-3b` | |
| - **API Key**: anything (not validated) | |
| ## Endpoints | |
| | Method | Path | Description | | |
| |--------|------|-------------| | |
| | GET | `/` | Status page | | |
| | GET | `/health` | Health check | | |
| | GET | `/v1/models` | List models | | |
| | POST | `/v1/chat/completions` | Chat (streaming supported) | | |
| ## Notes | |
| - First boot downloads the model (~2.5GB) into persistent storage `/data/models/` | |
| - Subsequent boots load from cache instantly | |
| - Built with OpenBLAS + AVX2 for best CPU throughput |