Qwentestapi / README.md
CJHauser's picture
Update README.md
e416226 verified
|
Raw
History Blame Contribute Delete
844 Bytes
---
title: LLM Chat API
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---
# LLM Chat API
OpenAI-compatible chat API running Qwen 2.5 3B on CPU with optimized llama-cpp build.
## SillyTavern Connection
API Connections → Chat Completion → Custom (OpenAI-compatible):
- **Server URL**: `https://YOUR-SPACE-NAME.hf.space`
- **Model**: `qwen-3b`
- **API Key**: anything (not validated)
## Endpoints
| Method | Path | Description |
|--------|------|-------------|
| GET | `/` | Status page |
| GET | `/health` | Health check |
| GET | `/v1/models` | List models |
| POST | `/v1/chat/completions` | Chat (streaming supported) |
## Notes
- First boot downloads the model (~2.5GB) into persistent storage `/data/models/`
- Subsequent boots load from cache instantly
- Built with OpenBLAS + AVX2 for best CPU throughput