| # SPITITOUT Hugging Face Space |
|
|
| This version runs without Gemini or any external model API. The React frontend calls a FastAPI backend inside the same Hugging Face Space. |
|
|
| ## Recommended models |
|
|
| - Text on CPU: `Qwen/Qwen3-1.7B-GGUF` |
| - Served through `llama-cpp-python` using the official `Qwen3-1.7B-Q8_0.gguf` quantized file. |
| - Text on GPU: `Qwen/Qwen3-4B-Instruct-2507` |
| - Use `LLM_BACKEND=transformers` for simple GPU deployment, or add vLLM as a separate server for higher throughput. |
| - Speech to text: `openai/whisper-tiny` |
| - Small and multilingual. Use `openai/whisper-base` if accuracy is more important than latency. |
| - Text to speech: `hexgrad/Kokoro-82M` via `kokoro` |
| - 82M parameters, lightweight, Apache licensed, and supports Mandarin voices such as `zf_xiaobei`. |
|
|
| ## Space settings |
|
|
| Create the Space as a Docker Space, then push this folder. |
|
|
| Suggested environment variables: |
|
|
| ```bash |
| LLM_BACKEND=llamacpp |
| GGUF_MODEL_REPO=Qwen/Qwen3-1.7B-GGUF |
| GGUF_MODEL_FILE=Qwen3-1.7B-Q8_0.gguf |
| LLAMA_CPP_N_CTX=4096 |
| ASR_MODEL=openai/whisper-tiny |
| KOKORO_LANG_CODE=z |
| KOKORO_VOICE=zf_xiaobei |
| MAX_NEW_TOKENS=220 |
| ``` |
|
|
| For CPU-only testing: |
|
|
| ```bash |
| LLM_BACKEND=llamacpp |
| GGUF_MODEL_REPO=Qwen/Qwen3-1.7B-GGUF |
| GGUF_MODEL_FILE=Qwen3-1.7B-Q8_0.gguf |
| ASR_MODEL=openai/whisper-tiny |
| MAX_NEW_TOKENS=140 |
| ``` |
|
|
| ## Local run |
|
|
| ```bash |
| npm install |
| npm run build |
| pip install -r requirements.txt |
| python app.py |
| ``` |
|
|
| Then open `http://localhost:7860`. |
|
|