QORA-TTS 0.6B - Pure Rust Text-to-Speech

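A pure Rust TTS engine with 9 built-in speakers. It needs no Python, no CUDA, and no safetensors files: a single executable plus the Q4 weight binary, alongside the small config and tokenizer files listed under Files, makes a portable TTS setup.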

Based on Qwen3-TTS-12Hz-0.6B-CustomVoice (Apache 2.0).

Quick Start

# Use built-in speaker
qora-tts.exe --model-path . --speaker ryan --language english --text "Hello, how are you?"

# Different speaker
qora-tts.exe --model-path . --speaker serena --language chinese --text "你好世界"

# Japanese speaker
qora-tts.exe --model-path . --speaker ono_anna --language japanese --text "こんにちは"

# Control length and output
qora-tts.exe --model-path . --speaker aiden --language english --text "Good morning!" --max-codes 200 --output greeting.wav
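
To run several of the commands above from a Rust program, one option is to spawn the executable with `std::process::Command`. The sketch below is only an illustration: it uses just the flags shown in this README, while the helper name `build_tts_command`, the `./qora-tts.exe` path, and the per-speaker output file names are assumptions, so adjust them to your setup.

```rust
use std::process::Command;

/// Builds (but does not run) one qora-tts invocation using only flags
/// documented in this README. `exe` is the path to the executable.
/// Hypothetical helper, not part of QORA-TTS itself.
fn build_tts_command(exe: &str, speaker: &str, language: &str, text: &str, output: &str) -> Command {
    let mut cmd = Command::new(exe);
    cmd.args(["--model-path", "."])
        .args(["--speaker", speaker])
        .args(["--language", language])
        .args(["--text", text])
        .args(["--output", output]);
    cmd
}

fn main() {
    // Speaker / language / text triples copied from the Quick Start commands.
    let jobs = [
        ("ryan", "english", "Hello, how are you?"),
        ("serena", "chinese", "你好世界"),
        ("ono_anna", "japanese", "こんにちは"),
    ];
    for (speaker, language, text) in jobs {
        // Assumed naming scheme: one WAV file per speaker.
        let output = format!("{speaker}.wav");
        let mut cmd = build_tts_command("./qora-tts.exe", speaker, language, text, &output);
        match cmd.status() {
            Ok(status) => println!("{speaker}: exited with {status}"),
            Err(err) => eprintln!("{speaker}: could not start qora-tts: {err}"),
        }
    }
}
```

Passing each flag and value as a separate argument avoids shell quoting problems with non-ASCII text.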

Files

  qora-tts.exe          4.3 MB   Inference engine
  model.qora-tts        971 MB   Q4 weights (talker + predictor + decoder)
  config.json           4.8 KB   Model configuration
  tokenizer.json         11 MB   Tokenizer (151,936 vocab)
  vocab.json            2.7 MB   Vocabulary
  merges.txt            1.6 MB   BPE merges
  tokenizer_config.json 7.2 KB   Tokenizer config

No safetensors files are needed: all model weights load from model.qora-tts.

Model Info

Property	Value
Base Model	Qwen3-TTS-12Hz-0.6B-CustomVoice
Type	Built-in speakers (9 voices)
Quantization	Q4 (4-bit symmetric, group_size=32)
Binary Size	971 MB
Sample Rate	24 kHz mono WAV
Languages	English, Chinese, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian
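
For readers unfamiliar with the "4-bit symmetric, group_size=32" entry, the sketch below shows one common way such a scheme works: each group of 32 weights shares a single scale and every weight is rounded to a small signed integer. It is a generic illustration, not the actual model.qora-tts layout, which this README does not document. The `Q4Group` type, the [-7, 7] integer range, and the one-value-per-byte storage are all assumptions made for clarity.

```rust
/// Group size taken from the Model Info table.
const GROUP_SIZE: usize = 32;

/// One quantized group: a shared scale plus 32 signed 4-bit values.
/// Assumption: stored one per i8 for readability; a real format would
/// likely pack two values per byte.
struct Q4Group {
    scale: f32,
    values: [i8; GROUP_SIZE],
}

/// Symmetric quantization: no zero point, integers in [-7, 7] (assumed range).
fn quantize_group(weights: &[f32; GROUP_SIZE]) -> Q4Group {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero when the whole group is zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 };
    let mut values = [0i8; GROUP_SIZE];
    for (q, w) in values.iter_mut().zip(weights.iter()) {
        *q = (w / scale).round().clamp(-7.0, 7.0) as i8;
    }
    Q4Group { scale, values }
}

/// Reconstructs approximate weights: integer value times the group scale.
fn dequantize_group(group: &Q4Group) -> [f32; GROUP_SIZE] {
    let mut out = [0.0f32; GROUP_SIZE];
    for (o, q) in out.iter_mut().zip(group.values.iter()) {
        *o = f32::from(*q) * group.scale;
    }
    out
}

fn main() {
    // Sample weights spread over [-1.0, 0.9375].
    let mut weights = [0.0f32; GROUP_SIZE];
    for (i, w) in weights.iter_mut().enumerate() {
        *w = (i as f32 - 16.0) / 16.0;
    }
    let group = quantize_group(&weights);
    let restored = dequantize_group(&group);
    let max_err = weights
        .iter()
        .zip(restored.iter())
        .fold(0.0f32, |m, (a, b)| m.max((a - b).abs()));
    println!("scale = {}, max round-trip error = {}", group.scale, max_err);
}
```

With this scheme the round-trip error per weight is at most half the group scale, which is why smaller groups generally preserve more precision at the cost of storing more scales.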

Architecture

Component	Details
Talker	28 layers, hidden=1024, 16/8 GQA heads, SwiGLU 3072
Code Predictor	5 layers, hidden=1024, 16 code groups
Speech Decoder	8-layer transformer + Vocos vocoder
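
Reading "16/8 GQA heads" as 16 query heads sharing 8 key/value heads (an interpretation, since the table does not spell it out), each key/value head serves two query heads. The sketch below shows the usual contiguous grouping used in grouped-query attention. The function name is hypothetical and the engine's real head layout may differ.

```rust
// Assumed reading of "16/8 GQA heads": 16 query heads, 8 key/value heads.
const NUM_QUERY_HEADS: usize = 16;
const NUM_KV_HEADS: usize = 8;

/// Maps a query head to the key/value head it attends with,
/// using contiguous groups (query heads 0 and 1 share KV head 0, and so on).
fn kv_head_for_query(query_head: usize) -> usize {
    assert!(query_head < NUM_QUERY_HEADS, "query head out of range");
    let group_size = NUM_QUERY_HEADS / NUM_KV_HEADS;
    query_head / group_size
}

fn main() {
    for q in 0..NUM_QUERY_HEADS {
        println!("query head {:2} -> kv head {}", q, kv_head_for_query(q));
    }
}
```

Sharing key/value heads this way halves the size of the key/value cache compared with giving every query head its own pair.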

CLI Arguments

Flag	Default	Description
`--model-path <dir>`	`.`	Directory containing model.qora-tts + config
`--text <text>`	"Hello, how are you today?"	Text to synthesize
`--speaker <name>`	ryan	Built-in speaker name
`--language <name>`	english	Target language
`--output <path>`	output.wav	Output WAV path
`--max-codes <n>`	500	Maximum number of code timesteps (about n/12.5 seconds of audio)
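
Since `--max-codes` is expressed in code timesteps rather than seconds, a small conversion helps when picking a value. The sketch below uses the 12.5 codes-per-second figure from the table. The function names are illustrative, and the result is an upper bound on length rather than an exact duration.

```rust
/// Code timesteps per second of audio, from the `--max-codes` description.
const CODES_PER_SECOND: f64 = 12.5;

/// Smallest `--max-codes` value that allows at least `seconds` of audio.
fn max_codes_for_seconds(seconds: f64) -> u32 {
    (seconds * CODES_PER_SECOND).ceil() as u32
}

/// Approximate audio length allowed by a given `--max-codes` value.
fn seconds_for_max_codes(max_codes: u32) -> f64 {
    f64::from(max_codes) / CODES_PER_SECOND
}

fn main() {
    println!("default of 500 codes allows about {} s", seconds_for_max_codes(500));
    println!("a 10 s cap needs --max-codes {}", max_codes_for_seconds(10.0));
}
```

By this arithmetic the default of 500 allows roughly 40 seconds of audio, and the `--max-codes 200` used in Quick Start allows roughly 16 seconds.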

Built-in Speakers

Speaker	Language	Description
ryan	English	Dynamic male voice
aiden	English	Sunny American male
serena	Chinese	Warm, gentle female
vivian	Chinese	Bright young female
uncle_fu	Chinese	Seasoned male
dylan	Beijing dialect	Youthful male
eric	Sichuan dialect	Lively male
ono_anna	Japanese	Playful female
sohee	Korean	Warm female

Performance (i5-11500, 16GB RAM, CPU-only)

Metric	Value
Model Load	~0.6s (from binary)
Prefill	~2-5s
Code Generation	~1.5s/code
Audio Decode	~0.5s/frame
Memory	~970 MB
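
These figures can be combined into a rough wall-clock estimate for a clip of a given length. The sketch below is back-of-the-envelope arithmetic on the table values for this one machine, assuming 12.5 codes per second of audio (from the `--max-codes` description). It leaves out audio decode time because the table does not say how many frames a second of audio contains.

```rust
// Figures from the Performance table (i5-11500, CPU-only); treat as rough.
const CODES_PER_SECOND: f64 = 12.5; // from the --max-codes description
const GEN_SECONDS_PER_CODE: f64 = 1.5; // "Code Generation" row
const LOAD_SECONDS: f64 = 0.6; // "Model Load" row
const PREFILL_SECONDS: (f64, f64) = (2.0, 5.0); // "Prefill" row, low and high

/// Returns a (low, high) estimate of seconds spent on load, prefill and
/// code generation for `audio_seconds` of speech. Audio decode is excluded.
fn estimate_seconds(audio_seconds: f64) -> (f64, f64) {
    let generation = audio_seconds * CODES_PER_SECOND * GEN_SECONDS_PER_CODE;
    (
        LOAD_SECONDS + PREFILL_SECONDS.0 + generation,
        LOAD_SECONDS + PREFILL_SECONDS.1 + generation,
    )
}

fn main() {
    let (low, high) = estimate_seconds(4.0);
    println!("4 s of audio: roughly {low:.1} to {high:.1} s before decoding");
}
```

Code generation dominates: at 1.5 s per code and 12.5 codes per second of audio, it costs about 18.75 seconds of compute per second of speech on this hardware.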

Converting from Safetensors

If you have the original safetensors files, convert them to the binary format:

qora-tts.exe --model-path <safetensors_dir> --save model.qora-tts --text "x" --max-codes 1

After conversion, the safetensors files are no longer needed.

Built with QORA - Pure Rust AI Inference