SPITITOUT Hugging Face Space

This version runs without Gemini or any external model API. The React frontend calls a FastAPI backend inside the same Hugging Face Space.

Recommended models

- Text on CPU: Qwen/Qwen3-1.7B-GGUF
  - Served through llama-cpp-python using the official Qwen3-1.7B-Q8_0.gguf quantized file.
- Text on GPU: Qwen/Qwen3-4B-Instruct-2507
  - Use LLM_BACKEND=transformers for simple GPU deployment, or add vLLM as a separate server for higher throughput.
- Speech to text: openai/whisper-tiny
  - Small and multilingual. Use openai/whisper-base if accuracy is more important than latency.
- Text to speech: hexgrad/Kokoro-82M via kokoro
  - 82M parameters, lightweight, Apache licensed, and supports Mandarin voices such as zf_xiaobei.
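
The two text backends listed above could be selected at startup roughly as follows. This is a minimal sketch, not the Space's actual code: the function name `load_text_model` and its parameters are hypothetical. `Llama.from_pretrained` comes from llama-cpp-python (it needs huggingface-hub for the download), and `pipeline` comes from transformers (`device_map="auto"` also needs accelerate).

```python
from typing import Any


def load_text_model(
    backend: str = "llamacpp",
    gguf_repo: str = "Qwen/Qwen3-1.7B-GGUF",
    gguf_file: str = "Qwen3-1.7B-Q8_0.gguf",
    hf_model: str = "Qwen/Qwen3-4B-Instruct-2507",
    n_ctx: int = 4096,
) -> Any:
    """Return a text-generation object for the requested backend (hypothetical helper)."""
    if backend == "llamacpp":
        # Imported lazily so a GPU-only image does not need llama-cpp-python.
        from llama_cpp import Llama

        # Downloads the GGUF file from the Hub on first use, then loads it on CPU.
        return Llama.from_pretrained(repo_id=gguf_repo, filename=gguf_file, n_ctx=n_ctx)

    if backend == "transformers":
        # Imported lazily so a CPU-only image does not need transformers and torch.
        from transformers import pipeline

        # device_map="auto" places the weights on the available GPU (requires accelerate).
        return pipeline("text-generation", model=hf_model, device_map="auto")

    raise ValueError(f"Unsupported LLM_BACKEND: {backend!r}")
```

Keeping the imports inside each branch means only the library for the chosen backend has to be installed in the image.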

Space settings

Create the Space as a Docker Space, then push this folder.

Suggested environment variables:

LLM_BACKEND=llamacpp
GGUF_MODEL_REPO=Qwen/Qwen3-1.7B-GGUF
GGUF_MODEL_FILE=Qwen3-1.7B-Q8_0.gguf
LLAMA_CPP_N_CTX=4096
ASR_MODEL=openai/whisper-tiny
KOKORO_LANG_CODE=z
KOKORO_VOICE=zf_xiaobei
MAX_NEW_TOKENS=220
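
One way a backend might read these variables is sketched below. The `Settings` class and `load_settings` function are hypothetical names chosen for illustration, and the defaults simply mirror the values listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    llm_backend: str
    gguf_model_repo: str
    gguf_model_file: str
    llama_cpp_n_ctx: int
    asr_model: str
    kokoro_lang_code: str
    kokoro_voice: str
    max_new_tokens: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to the README values."""
    return Settings(
        llm_backend=env.get("LLM_BACKEND", "llamacpp"),
        gguf_model_repo=env.get("GGUF_MODEL_REPO", "Qwen/Qwen3-1.7B-GGUF"),
        gguf_model_file=env.get("GGUF_MODEL_FILE", "Qwen3-1.7B-Q8_0.gguf"),
        # Numeric values arrive as strings, so convert them explicitly.
        llama_cpp_n_ctx=int(env.get("LLAMA_CPP_N_CTX", "4096")),
        asr_model=env.get("ASR_MODEL", "openai/whisper-tiny"),
        kokoro_lang_code=env.get("KOKORO_LANG_CODE", "z"),
        kokoro_voice=env.get("KOKORO_VOICE", "zf_xiaobei"),
        max_new_tokens=int(env.get("MAX_NEW_TOKENS", "220")),
    )
```

Passing a plain dict instead of `os.environ`, for example `load_settings({"MAX_NEW_TOKENS": "140"})`, makes it easy to try a different configuration without touching the Space settings.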

For CPU-only testing:

LLM_BACKEND=llamacpp
GGUF_MODEL_REPO=Qwen/Qwen3-1.7B-GGUF
GGUF_MODEL_FILE=Qwen3-1.7B-Q8_0.gguf
ASR_MODEL=openai/whisper-tiny
MAX_NEW_TOKENS=140

Local run

npm install
npm run build
pip install -r requirements.txt
python app.py

Then open http://localhost:7860.
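
Startup can take a while because the models have to load, so a small polling helper may be handy before opening the page. The sketch below uses only the standard library. The name `wait_until_ready` and its timeout values are illustrative assumptions, not part of this project.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(
    url: str = "http://localhost:7860",
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll url until the server answers without a 5xx error, or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status < 500:
                    return True
        except urllib.error.HTTPError as err:
            # A 4xx reply still proves the server is up and routing requests.
            if err.code < 500:
                return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the server is not listening yet.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_until_ready()` in a second terminal after `python app.py` returns `True` once the page is reachable, and `False` if nothing answers within the timeout.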