# SPITITOUT Hugging Face Space

This version runs without Gemini or any other external model API: the React frontend calls a FastAPI backend that runs inside the same Hugging Face Space.

## Recommended models

- Text on CPU: `Qwen/Qwen3-1.7B-GGUF`
  - Served through `llama-cpp-python` using the official `Qwen3-1.7B-Q8_0.gguf` quantized file.
- Text on GPU: `Qwen/Qwen3-4B-Instruct-2507`
  - Use `LLM_BACKEND=transformers` for simple GPU deployment, or add vLLM as a separate server for higher throughput.
- Speech to text: `openai/whisper-tiny`
  - Small and multilingual. Use `openai/whisper-base` if accuracy is more important than latency.
- Text to speech: `hexgrad/Kokoro-82M` via `kokoro`
  - Lightweight (82M parameters), Apache licensed, with Mandarin voices such as `zf_xiaobei`.
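
As a rough illustration of the CPU text path, the sketch below shows one way the GGUF file could be loaded with `llama-cpp-python`. The helper names `gguf_load_kwargs` and `load_cpu_llm` are hypothetical and are not taken from this Space's `app.py`. `Llama.from_pretrained` downloads the file from the Hub and needs `huggingface-hub` installed. The default context size of 4096 mirrors the `LLAMA_CPP_N_CTX` value suggested in the Space settings.

```python
def gguf_load_kwargs(
    repo_id: str = "Qwen/Qwen3-1.7B-GGUF",
    filename: str = "Qwen3-1.7B-Q8_0.gguf",
    n_ctx: int = 4096,
) -> dict:
    """Build the keyword arguments for Llama.from_pretrained (hypothetical helper)."""
    return {"repo_id": repo_id, "filename": filename, "n_ctx": n_ctx}


def load_cpu_llm(**overrides):
    """Download the GGUF file if needed and load it on CPU (hypothetical helper)."""
    # Imported lazily so the module can be imported without llama-cpp-python.
    from llama_cpp import Llama

    return Llama.from_pretrained(**gguf_load_kwargs(**overrides))
```

Calling `load_cpu_llm()` returns a `Llama` object; `load_cpu_llm(n_ctx=2048)` would load the same file with a smaller context window.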

## Space settings

Create the Space as a Docker Space, then push this folder.

Suggested environment variables:

```bash
LLM_BACKEND=llamacpp
GGUF_MODEL_REPO=Qwen/Qwen3-1.7B-GGUF
GGUF_MODEL_FILE=Qwen3-1.7B-Q8_0.gguf
LLAMA_CPP_N_CTX=4096
ASR_MODEL=openai/whisper-tiny
KOKORO_LANG_CODE=z
KOKORO_VOICE=zf_xiaobei
MAX_NEW_TOKENS=220
```
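
For illustration, a backend could read these variables roughly as follows. This is a minimal sketch: the `Settings` class and `load_settings` function are hypothetical, and the fallback defaults simply reuse the suggested values above, which may not match what `app.py` actually does.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    llm_backend: str
    gguf_model_repo: str
    gguf_model_file: str
    llama_cpp_n_ctx: int
    asr_model: str
    kokoro_lang_code: str
    kokoro_voice: str
    max_new_tokens: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the Space's environment variables (defaults are assumptions)."""
    return Settings(
        llm_backend=env.get("LLM_BACKEND", "llamacpp"),
        gguf_model_repo=env.get("GGUF_MODEL_REPO", "Qwen/Qwen3-1.7B-GGUF"),
        gguf_model_file=env.get("GGUF_MODEL_FILE", "Qwen3-1.7B-Q8_0.gguf"),
        # Numeric values arrive as strings, so convert them explicitly.
        llama_cpp_n_ctx=int(env.get("LLAMA_CPP_N_CTX", "4096")),
        asr_model=env.get("ASR_MODEL", "openai/whisper-tiny"),
        kokoro_lang_code=env.get("KOKORO_LANG_CODE", "z"),
        kokoro_voice=env.get("KOKORO_VOICE", "zf_xiaobei"),
        max_new_tokens=int(env.get("MAX_NEW_TOKENS", "220")),
    )
```

Passing a plain dictionary instead of `os.environ`, for example `load_settings({"MAX_NEW_TOKENS": "140"})`, makes the parsing easy to check without touching the real environment.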

For CPU-only testing, a reduced set with a lower `MAX_NEW_TOKENS`:

```bash
LLM_BACKEND=llamacpp
GGUF_MODEL_REPO=Qwen/Qwen3-1.7B-GGUF
GGUF_MODEL_FILE=Qwen3-1.7B-Q8_0.gguf
ASR_MODEL=openai/whisper-tiny
MAX_NEW_TOKENS=140
```

## Local run

```bash
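npm install                       # install frontend dependencies
npm run build                     # build the React frontend
pip install -r requirements.txt   # install backend dependencies
python app.py                     # start the backend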
```

Then open `http://localhost:7860`.