# SPITITOUT Hugging Face Space

This version runs without Gemini or any external model API. The React frontend calls a FastAPI backend inside the same Hugging Face Space.

## Recommended models

- Text on CPU: `Qwen/Qwen3-1.7B-GGUF`
  - Served through `llama-cpp-python` using the official `Qwen3-1.7B-Q8_0.gguf` quantized file.
- Text on GPU: `Qwen/Qwen3-4B-Instruct-2507`
  - Use `LLM_BACKEND=transformers` for simple GPU deployment, or add vLLM as a separate server for higher throughput.
- Speech to text: `openai/whisper-tiny`
  - Small and multilingual. Use `openai/whisper-base` if accuracy is more important than latency.
- Text to speech: `hexgrad/Kokoro-82M` via `kokoro`
  - 82M parameters, lightweight, Apache licensed, and supports Mandarin voices such as `zf_xiaobei`.

## Space settings

Create the Space as a Docker Space, then push this folder.

Suggested environment variables:

```bash
LLM_BACKEND=llamacpp
GGUF_MODEL_REPO=Qwen/Qwen3-1.7B-GGUF
GGUF_MODEL_FILE=Qwen3-1.7B-Q8_0.gguf
LLAMA_CPP_N_CTX=4096
ASR_MODEL=openai/whisper-tiny
KOKORO_LANG_CODE=z
KOKORO_VOICE=zf_xiaobei
MAX_NEW_TOKENS=220
```

For CPU-only testing:

```bash
LLM_BACKEND=llamacpp
GGUF_MODEL_REPO=Qwen/Qwen3-1.7B-GGUF
GGUF_MODEL_FILE=Qwen3-1.7B-Q8_0.gguf
ASR_MODEL=openai/whisper-tiny
MAX_NEW_TOKENS=140
```

## Local run

```bash
npm install
npm run build
pip install -r requirements.txt
python app.py
```

Then open `http://localhost:7860`.
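To run locally with the same CPU-only settings the Space uses, the variables can be exported in your shell before starting the server. The sketch below assumes `app.py` reads its configuration from the process environment, as it would on the Space; the file name `local-env.sh` is hypothetical.

```bash
# local-env.sh (hypothetical name): apply the CPU-only settings locally.
# `: "${VAR:=default}"` assigns the default only when VAR is unset or empty,
# so any value you exported earlier is kept.
: "${LLM_BACKEND:=llamacpp}"
: "${GGUF_MODEL_REPO:=Qwen/Qwen3-1.7B-GGUF}"
: "${GGUF_MODEL_FILE:=Qwen3-1.7B-Q8_0.gguf}"
: "${ASR_MODEL:=openai/whisper-tiny}"
: "${MAX_NEW_TOKENS:=140}"

# Export the variables so child processes such as `python app.py` inherit them.
export LLM_BACKEND GGUF_MODEL_REPO GGUF_MODEL_FILE ASR_MODEL MAX_NEW_TOKENS
```

Load it with `. ./local-env.sh` (or `source local-env.sh` in bash) and then run `python app.py` from the same shell. Because the defaults only fill in unset variables, exporting something like `LLM_BACKEND=transformers` beforehand on a GPU machine takes precedence over the CPU value.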