Qwentestapi / README.md
CJHauser's picture
Update README.md
e416226 verified
|
Raw
History Blame Contribute Delete
844 Bytes
metadata
title: LLM Chat API
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
pinned: false

LLM Chat API

OpenAI-compatible chat API running Qwen 2.5 3B on CPU with optimized llama-cpp build.

SillyTavern Connection

API Connections → Chat Completion → Custom (OpenAI-compatible):

  • Server URL: https://YOUR-SPACE-NAME.hf.space
  • Model: qwen-3b
  • API Key: anything (not validated)

Endpoints

Method Path Description
GET / Status page
GET /health Health check
GET /v1/models List models
POST /v1/chat/completions Chat (streaming supported)

Notes

  • First boot downloads the model (~2.5GB) into persistent storage /data/models/
  • Subsequent boots load from cache instantly
  • Built with OpenBLAS + AVX2 for best CPU throughput