spititout / README_SPACE.md
MSF
with api option
eb426ec

SPITITOUT Hugging Face Space

This version runs without Gemini or any external model API. The React frontend calls a FastAPI backend inside the same Hugging Face Space.

Recommended models

  • Text on CPU: Qwen/Qwen3-1.7B-GGUF
    • Served through llama-cpp-python using the official Qwen3-1.7B-Q8_0.gguf quantized file.
  • Text on GPU: Qwen/Qwen3-4B-Instruct-2507
    • Use LLM_BACKEND=transformers for simple GPU deployment, or add vLLM as a separate server for higher throughput.
  • Speech to text: openai/whisper-tiny
    • Small and multilingual. Use openai/whisper-base if accuracy is more important than latency.
  • Text to speech: hexgrad/Kokoro-82M via kokoro
    • 82M parameters, lightweight, Apache licensed, and supports Mandarin voices such as zf_xiaobei.

Space settings

Create the Space as a Docker Space, then push this folder.

Suggested environment variables:

LLM_BACKEND=llamacpp
GGUF_MODEL_REPO=Qwen/Qwen3-1.7B-GGUF
GGUF_MODEL_FILE=Qwen3-1.7B-Q8_0.gguf
LLAMA_CPP_N_CTX=4096
ASR_MODEL=openai/whisper-tiny
KOKORO_LANG_CODE=z
KOKORO_VOICE=zf_xiaobei
MAX_NEW_TOKENS=220

For CPU-only testing:

LLM_BACKEND=llamacpp
GGUF_MODEL_REPO=Qwen/Qwen3-1.7B-GGUF
GGUF_MODEL_FILE=Qwen3-1.7B-Q8_0.gguf
ASR_MODEL=openai/whisper-tiny
MAX_NEW_TOKENS=140

Local run

npm install
npm run build
pip install -r requirements.txt
python app.py

Then open http://localhost:7860.