
# SPITITOUT Hugging Face Space

This version runs without Gemini or any other external model API. The React frontend calls a FastAPI backend that runs inside the same Hugging Face Space.

## Recommended models

- Text on CPU: `Qwen/Qwen3-1.7B-GGUF`
  - Served through `llama-cpp-python` using the official 8-bit quantized file `Qwen3-1.7B-Q8_0.gguf`.
- Text on GPU: `Qwen/Qwen3-4B-Instruct-2507`
  - Use `LLM_BACKEND=transformers` for a simple GPU deployment, or run vLLM as a separate server for higher throughput.
- Speech to text: `openai/whisper-tiny`
  - Small and multilingual. Use `openai/whisper-base` if accuracy matters more than latency.
- Text to speech: `hexgrad/Kokoro-82M` via the `kokoro` package
  - 82M parameters, lightweight, Apache-2.0 licensed, and supports Mandarin voices such as `zf_xiaobei`.
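The `LLM_BACKEND` value determines which of the recommended text models is loaded. Below is a minimal sketch of that mapping. It assumes only the two backend values named above. The `TextModel` type and the `resolve_text_model` helper are hypothetical and are not taken from the Space's `app.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextModel:
    backend: str
    repo_id: str
    filename: Optional[str] = None  # only the GGUF (llama.cpp) path needs a single file name


def resolve_text_model(backend: str) -> TextModel:
    """Map an LLM_BACKEND value to the recommended text model (hypothetical helper)."""
    backend = backend.strip().lower()
    if backend == "llamacpp":
        # CPU path: quantized GGUF weights served by llama-cpp-python
        return TextModel("llamacpp", "Qwen/Qwen3-1.7B-GGUF", "Qwen3-1.7B-Q8_0.gguf")
    if backend == "transformers":
        # GPU path: instruct checkpoint loaded with transformers
        return TextModel("transformers", "Qwen/Qwen3-4B-Instruct-2507")
    raise ValueError(f"unsupported LLM_BACKEND: {backend!r}")
```

In the `llamacpp` case, `llama-cpp-python` offers `Llama.from_pretrained(repo_id=..., filename=...)`, which downloads the GGUF file from the Hub. That method requires `huggingface-hub` to be installed.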

## Space settings

Create the Space as a Docker Space, then push this folder to it.

Suggested environment variables for the default llama.cpp setup:

```bash
LLM_BACKEND=llamacpp
GGUF_MODEL_REPO=Qwen/Qwen3-1.7B-GGUF
GGUF_MODEL_FILE=Qwen3-1.7B-Q8_0.gguf
LLAMA_CPP_N_CTX=4096
ASR_MODEL=openai/whisper-tiny
KOKORO_LANG_CODE=z
KOKORO_VOICE=zf_xiaobei
MAX_NEW_TOKENS=220
```
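Here is a minimal sketch of how a Python backend could read these variables. It assumes the values in the block above act as defaults. `Settings` and `load_settings` are hypothetical names, and the Space's real `app.py` may parse the variables differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    llm_backend: str
    gguf_model_repo: str
    gguf_model_file: str
    llama_cpp_n_ctx: int
    asr_model: str
    kokoro_lang_code: str
    kokoro_voice: str
    max_new_tokens: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the Space variables, falling back to the suggested values (hypothetical defaults)."""
    env = os.environ if env is None else env
    return Settings(
        llm_backend=env.get("LLM_BACKEND", "llamacpp"),
        gguf_model_repo=env.get("GGUF_MODEL_REPO", "Qwen/Qwen3-1.7B-GGUF"),
        gguf_model_file=env.get("GGUF_MODEL_FILE", "Qwen3-1.7B-Q8_0.gguf"),
        llama_cpp_n_ctx=int(env.get("LLAMA_CPP_N_CTX", "4096")),
        asr_model=env.get("ASR_MODEL", "openai/whisper-tiny"),
        kokoro_lang_code=env.get("KOKORO_LANG_CODE", "z"),
        kokoro_voice=env.get("KOKORO_VOICE", "zf_xiaobei"),
        max_new_tokens=int(env.get("MAX_NEW_TOKENS", "220")),
    )
```

You can pass a plain dict instead of `os.environ` to check the defaults. For example, `load_settings({"MAX_NEW_TOKENS": "140"})` reproduces the CPU-only profile.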

For CPU-only testing, use a shorter generation budget:

```bash
LLM_BACKEND=llamacpp
GGUF_MODEL_REPO=Qwen/Qwen3-1.7B-GGUF
GGUF_MODEL_FILE=Qwen3-1.7B-Q8_0.gguf
ASR_MODEL=openai/whisper-tiny
MAX_NEW_TOKENS=140
```
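Each generated token costs roughly one forward pass, so on a CPU a smaller `MAX_NEW_TOKENS` caps reply latency. In llama.cpp the context window (`LLAMA_CPP_N_CTX`) must hold both the prompt and the generated tokens. The prompt therefore gets whatever remains after the generation budget is reserved. The sketch below shows that arithmetic. `prompt_token_budget` is a hypothetical helper. The CPU-only block above does not set `LLAMA_CPP_N_CTX`, so the sketch assumes the 4096 value from the first block.

```python
def prompt_token_budget(n_ctx: int, max_new_tokens: int) -> int:
    """Tokens left for the prompt (system text, history, user turn) after reserving the reply."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    if max_new_tokens >= n_ctx:
        raise ValueError("max_new_tokens must be smaller than the context window")
    return n_ctx - max_new_tokens


# Default profile vs. CPU-only profile, assuming LLAMA_CPP_N_CTX=4096 for both
print(prompt_token_budget(4096, 220))  # 3876
print(prompt_token_budget(4096, 140))  # 3956
```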

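## Local run

```bash
# Build the React frontend
npm install
npm run build
# Install the Python backend dependencies and start the FastAPI server
pip install -r requirements.txt
python app.py
```

Then open `http://localhost:7860` in a browser.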
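If `app.py` downloads or loads models when it starts, the page may not respond right away. Below is a minimal standard-library sketch for scripting that check. It assumes the root URL returns an HTTP 2xx status once the app is ready. `wait_for_server` is a hypothetical helper and is not part of this repository.

```python
import time
import urllib.request


def wait_for_server(url: str = "http://localhost:7860", timeout: float = 120.0,
                    interval: float = 1.0) -> bool:
    """Poll `url` until it answers with a 2xx status or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts all derive from OSError
            pass
        time.sleep(interval)
    return False
```

Run it from a second terminal after `python app.py`. If it returns `False` when the timeout expires, the server most likely failed to start, so check the app's console output.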