Spaces:

CJHauser
/

Qwentestapi

Paused

Qwentestapi / README.md

Update README.md

e416226 verified 7 days ago

844 Bytes

	---
	title: LLM Chat API
	emoji: 🤖
	colorFrom: blue
	colorTo: green
	sdk: docker
	pinned: false
	---

	# LLM Chat API

	OpenAI-compatible chat API running Qwen 2.5 3B on CPU with optimized llama-cpp build.

	## SillyTavern Connection

	API Connections → Chat Completion → Custom (OpenAI-compatible):
	- Server URL: `https://YOUR-SPACE-NAME.hf.space`
	- Model: `qwen-3b`
	- API Key: anything (not validated)

	## Endpoints

	\| Method \| Path \| Description \|
	\|--------\|------\|-------------\|
	\| GET \| `/` \| Status page \|
	\| GET \| `/health` \| Health check \|
	\| GET \| `/v1/models` \| List models \|
	\| POST \| `/v1/chat/completions` \| Chat (streaming supported) \|

	## Notes

	- First boot downloads the model (~2.5GB) into persistent storage `/data/models/`
	- Subsequent boots load from cache instantly
	- Built with OpenBLAS + AVX2 for best CPU throughput