SPITITOUT Hugging Face Space
This version runs without Gemini or any external model API. The React frontend calls a FastAPI backend inside the same Hugging Face Space.
Recommended models
- Text on CPU:
Qwen/Qwen3-1.7B-GGUF- Served through
llama-cpp-pythonusing the officialQwen3-1.7B-Q8_0.ggufquantized file.
- Served through
- Text on GPU:
Qwen/Qwen3-4B-Instruct-2507- Use
LLM_BACKEND=transformersfor simple GPU deployment, or add vLLM as a separate server for higher throughput.
- Use
- Speech to text:
openai/whisper-tiny- Small and multilingual. Use
openai/whisper-baseif accuracy is more important than latency.
- Small and multilingual. Use
- Text to speech:
hexgrad/Kokoro-82Mviakokoro- 82M parameters, lightweight, Apache licensed, and supports Mandarin voices such as
zf_xiaobei.
- 82M parameters, lightweight, Apache licensed, and supports Mandarin voices such as
Space settings
Create the Space as a Docker Space, then push this folder.
Suggested environment variables:
LLM_BACKEND=llamacpp
GGUF_MODEL_REPO=Qwen/Qwen3-1.7B-GGUF
GGUF_MODEL_FILE=Qwen3-1.7B-Q8_0.gguf
LLAMA_CPP_N_CTX=4096
ASR_MODEL=openai/whisper-tiny
KOKORO_LANG_CODE=z
KOKORO_VOICE=zf_xiaobei
MAX_NEW_TOKENS=220
For CPU-only testing:
LLM_BACKEND=llamacpp
GGUF_MODEL_REPO=Qwen/Qwen3-1.7B-GGUF
GGUF_MODEL_FILE=Qwen3-1.7B-Q8_0.gguf
ASR_MODEL=openai/whisper-tiny
MAX_NEW_TOKENS=140
Local run
npm install
npm run build
pip install -r requirements.txt
python app.py
Then open http://localhost:7860.