QORA-TTS 0.6B - Pure Rust Text-to-Speech
Pure Rust TTS engine with 9 built-in speakers. No Python, no CUDA, no safetensors needed. Single executable + Q4 binary = portable TTS.
Based on Qwen3-TTS-12Hz-0.6B-CustomVoice (Apache 2.0).
Quick Start
# Use built-in speaker
qora-tts.exe --model-path . --speaker ryan --language english --text "Hello, how are you?"
# Different speaker
qora-tts.exe --model-path . --speaker serena --language chinese --text "你好世界"
# Japanese speaker
qora-tts.exe --model-path . --speaker ono_anna --language japanese --text "こんにちは"
# Control length and output
qora-tts.exe --model-path . --speaker aiden --language english --text "Good morning!" --max-codes 200 --output greeting.wav
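To drive the executable from another Rust program, the command lines above can be assembled with `std::process::Command`. This is a minimal sketch: the helper name `build_tts_command` is hypothetical, and only the flags shown in this README are used.

```rust
use std::process::Command;

/// Build (but do not run) a qora-tts invocation using the flags from this README.
/// `exe` is the path to the executable, e.g. "qora-tts.exe".
fn build_tts_command(
    exe: &str,
    model_dir: &str,
    speaker: &str,
    language: &str,
    text: &str,
    output: &str,
) -> Command {
    let mut cmd = Command::new(exe);
    cmd.arg("--model-path").arg(model_dir)
        .arg("--speaker").arg(speaker)
        .arg("--language").arg(language)
        .arg("--text").arg(text)
        .arg("--output").arg(output);
    cmd
}
```

Calling `.status()` on the returned `Command` would run the synthesis and wait for it to finish. Passing the text as a separate argument avoids shell quoting issues with non-ASCII input.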
Files
| File | Size | Description |
|---|---|---|
| qora-tts.exe | 4.3 MB | Inference engine |
| model.qora-tts | 971 MB | Q4 weights (talker + predictor + decoder) |
| config.json | 4.8 KB | Model configuration |
| tokenizer.json | 11 MB | Tokenizer (151,936 vocab) |
| vocab.json | 2.7 MB | Vocabulary |
| merges.txt | 1.6 MB | BPE merges |
| tokenizer_config.json | 7.2 KB | Tokenizer config |
No safetensors needed. Everything loads from model.qora-tts.
Model Info
| Property | Value |
|---|---|
| Base Model | Qwen3-TTS-12Hz-0.6B-CustomVoice |
| Type | Built-in speakers (9 voices) |
| Quantization | Q4 (4-bit symmetric, group_size=32) |
| Binary Size | 971 MB |
| Sample Rate | 24 kHz mono WAV |
| Languages | English, Chinese, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian |
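The "4-bit symmetric, group_size=32" entry can be illustrated with a generic sketch of symmetric group quantization. This is not the actual `model.qora-tts` layout, which is not documented here. It assumes one `f32` scale per group of 32 weights and signed codes in [-7, 7], stored one per `i8` for readability where a real format would likely pack two codes per byte. All names are hypothetical.

```rust
/// Number of weights that share one scale (from the Model Info table).
const GROUP_SIZE: usize = 32;

/// One quantized group: a scale plus 32 signed 4-bit codes.
/// Assumption: codes are kept unpacked (one per i8) for clarity.
struct Q4Group {
    scale: f32,
    codes: [i8; GROUP_SIZE],
}

/// Symmetric quantization: the largest magnitude in the group maps to +/-7,
/// and zero always maps to code 0 (no zero-point offset).
fn quantize_group(weights: &[f32; GROUP_SIZE]) -> Q4Group {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero for an all-zero group.
    let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
    let mut codes = [0i8; GROUP_SIZE];
    for (c, w) in codes.iter_mut().zip(weights.iter()) {
        *c = (w / scale).round().clamp(-7.0, 7.0) as i8;
    }
    Q4Group { scale, codes }
}

/// Reconstruct approximate weights: w ~= code * scale.
fn dequantize_group(g: &Q4Group) -> [f32; GROUP_SIZE] {
    let mut out = [0.0f32; GROUP_SIZE];
    for (o, c) in out.iter_mut().zip(g.codes.iter()) {
        *o = *c as f32 * g.scale;
    }
    out
}
```

With this scheme the round-trip error of each weight is at most half the group's scale, which is why small groups (32 weights here) tend to preserve accuracy better than a single scale per tensor.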
Architecture
| Component | Details |
|---|---|
| Talker | 28 layers, hidden=1024, 16/8 GQA heads, SwiGLU 3072 |
| Code Predictor | 5 layers, hidden=1024, 16 code groups |
| Speech Decoder | 8-layer transformer + Vocos vocoder |
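Reading "16/8 GQA heads" as 16 query heads sharing 8 key/value heads (an assumption; the table does not spell this out), each key/value head serves two query heads. The sketch below shows the common grouped-query mapping in which consecutive query heads share a key/value head. The function name is hypothetical and the engine's internal layout may differ.

```rust
/// Query heads in the Talker (assumed reading of "16/8 GQA heads").
const NUM_Q_HEADS: usize = 16;
/// Key/value heads shared among the query heads.
const NUM_KV_HEADS: usize = 8;

/// Index of the key/value head that a given query head attends with.
/// Consecutive query heads form a group that shares one KV head.
fn kv_head_for(q_head: usize) -> usize {
    assert!(q_head < NUM_Q_HEADS, "query head index out of range");
    let group_size = NUM_Q_HEADS / NUM_KV_HEADS; // 2 query heads per KV head
    q_head / group_size
}
```

Under this reading the KV cache holds half as many heads as full multi-head attention would, which reduces memory use during code generation.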
CLI Arguments
| Flag | Default | Description |
|---|---|---|
| `--model-path <dir>` | `.` | Directory containing model.qora-tts + config |
| `--text <text>` | "Hello, how are you today?" | Text to synthesize |
| `--speaker <name>` | ryan | Built-in speaker name |
| `--language <name>` | english | Target language |
| `--output <path>` | output.wav | Output WAV path |
| `--max-codes <n>` | 500 | Max code timesteps (~n/12.5 seconds) |
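The `--max-codes` limit translates to audio length through the 12.5 codes-per-second rate given in its description. A small sketch of that arithmetic, with hypothetical helper names:

```rust
/// Code timesteps per second of audio, from the "~n/12.5 seconds" note.
const CODES_PER_SECOND: f64 = 12.5;

/// Upper bound on audio length (in seconds) for a given --max-codes value.
fn max_audio_seconds(max_codes: u32) -> f64 {
    max_codes as f64 / CODES_PER_SECOND
}

/// Smallest --max-codes value that allows at least `seconds` of audio.
fn codes_for_seconds(seconds: f64) -> u32 {
    (seconds * CODES_PER_SECOND).ceil() as u32
}
```

For example, the default of 500 caps output at about 40 seconds, and the `--max-codes 200` used in Quick Start caps it at about 16 seconds.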
Built-in Speakers
| Speaker | Language | Description |
|---|---|---|
| ryan | English | Dynamic male voice |
| aiden | English | Sunny American male |
| serena | Chinese | Warm, gentle female |
| vivian | Chinese | Bright young female |
| uncle_fu | Chinese | Seasoned male |
| dylan | Beijing dialect | Youthful male |
| eric | Sichuan dialect | Lively male |
| ono_anna | Japanese | Playful female |
| sohee | Korean | Warm female |
Performance (i5-11500, 16GB RAM, CPU-only)
| Phase / Metric | Value |
|---|---|
| Model Load | ~0.6s (from binary) |
| Prefill | ~2-5s |
| Code Generation | ~1.5s/code |
| Audio Decode | ~0.5s/frame |
| Memory | ~970 MB |
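Combining the code generation figure above with the 12.5 codes-per-second rate from the `--max-codes` description gives a rough idea of wall-clock cost on this hardware. The sketch below covers code generation only; prefill and audio decode time are excluded, and the helper names are hypothetical.

```rust
/// Approximate time to generate one code (from the "Code Generation" row).
const SECONDS_PER_CODE: f64 = 1.5;
/// Codes per second of audio (from the --max-codes description).
const CODE_RATE_HZ: f64 = 12.5;

/// Rough code-generation time in seconds for `codes` timesteps.
fn estimate_generation_seconds(codes: u32) -> f64 {
    codes as f64 * SECONDS_PER_CODE
}

/// Seconds of compute spent on code generation per second of audio.
fn generation_realtime_factor() -> f64 {
    SECONDS_PER_CODE * CODE_RATE_HZ
}
```

By these numbers, 200 codes (about 16 seconds of audio) take roughly 300 seconds to generate, or about 18.75 seconds of compute per second of speech on the listed CPU.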
Converting from Safetensors
If you have the original safetensors, convert to binary:
qora-tts.exe --model-path <safetensors_dir> --save model.qora-tts --text "x" --max-codes 1
After conversion, safetensors files are no longer needed.
Built with QORA - Pure Rust AI Inference