QORA-TTS 0.6B - Pure Rust Text-to-Speech

Pure Rust TTS engine with 9 built-in speakers. No Python, no CUDA, no safetensors needed. Single executable + Q4 binary = portable TTS.

Based on Qwen3-TTS-12Hz-0.6B-CustomVoice (Apache 2.0).

Quick Start

# Use built-in speaker
qora-tts.exe --model-path . --speaker ryan --language english --text "Hello, how are you?"

# Different speaker
qora-tts.exe --model-path . --speaker serena --language chinese --text "你好世界"

# Japanese speaker
qora-tts.exe --model-path . --speaker ono_anna --language japanese --text "こんにちは"

# Control length and output
qora-tts.exe --model-path . --speaker aiden --language english --text "Good morning!" --max-codes 200 --output greeting.wav
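
For batch synthesis, the commands above can be driven from a small Rust program. The following is a minimal sketch, not part of QORA-TTS itself: the `Job` struct and the `build_args` and `synthesize` helpers are hypothetical names, and only the flags shown in the Quick Start commands are used.

```rust
use std::process::Command;

/// One synthesis job; the fields mirror the CLI flags shown above.
/// (Hypothetical helper type, not part of QORA-TTS.)
struct Job<'a> {
    speaker: &'a str,
    language: &'a str,
    text: &'a str,
    output: &'a str,
}

/// Build the argument list for a single qora-tts invocation.
fn build_args(model_dir: &str, job: &Job<'_>) -> Vec<String> {
    vec![
        "--model-path".to_string(),
        model_dir.to_string(),
        "--speaker".to_string(),
        job.speaker.to_string(),
        "--language".to_string(),
        job.language.to_string(),
        "--text".to_string(),
        job.text.to_string(),
        "--output".to_string(),
        job.output.to_string(),
    ]
}

/// Run the engine once and report whether it exited successfully.
/// `exe` is the path to the executable, e.g. "qora-tts.exe".
fn synthesize(exe: &str, model_dir: &str, job: &Job<'_>) -> std::io::Result<bool> {
    let status = Command::new(exe).args(build_args(model_dir, job)).status()?;
    Ok(status.success())
}
```

Passing each value as its own argument avoids shell quoting problems with text that contains spaces or non-ASCII characters.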

Files

  qora-tts.exe           4.3 MB   Inference engine
  model.qora-tts         971 MB   Q4 weights (talker + predictor + decoder)
  config.json            4.8 KB   Model configuration
  tokenizer.json          11 MB   Tokenizer (151,936 vocab)
  vocab.json             2.7 MB   Vocabulary
  merges.txt             1.6 MB   BPE merges
  tokenizer_config.json  7.2 KB   Tokenizer config

No safetensors needed. All model weights load from model.qora-tts.

Model Info

  Property      Value
  Base Model    Qwen3-TTS-12Hz-0.6B-CustomVoice
  Type          Built-in speakers (9 voices)
  Quantization  Q4 (4-bit symmetric, group_size=32)
  Binary Size   971 MB
  Sample Rate   24 kHz mono WAV
  Languages     English, Chinese, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian
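
To illustrate what "4-bit symmetric, group_size=32" generally means, here is a minimal sketch of symmetric group quantization. It is not the actual model.qora-tts layout, which is not documented here. The `Q4Group` type, the [-7, 7] integer range, the f32 scale, and the unpacked i8 storage are all assumptions made for clarity.

```rust
/// Group size taken from the Model Info table.
const GROUP_SIZE: usize = 32;

/// One quantized group: a shared scale plus 32 signed 4-bit values.
/// Values are kept as i8 here for readability; a real format would
/// likely pack two 4-bit values per byte (assumption).
struct Q4Group {
    scale: f32,
    values: [i8; GROUP_SIZE],
}

/// Symmetric quantization: map [-max_abs, max_abs] onto the integers [-7, 7].
fn quantize_group(weights: &[f32; GROUP_SIZE]) -> Q4Group {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // An all-zero group gets scale 1.0 so the division below stays well-defined.
    let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
    let mut values = [0i8; GROUP_SIZE];
    for (q, w) in values.iter_mut().zip(weights.iter()) {
        *q = (w / scale).round().clamp(-7.0, 7.0) as i8;
    }
    Q4Group { scale, values }
}

/// Reconstruct approximate weights: each value is simply q * scale.
fn dequantize_group(group: &Q4Group) -> [f32; GROUP_SIZE] {
    let mut out = [0.0f32; GROUP_SIZE];
    for (o, q) in out.iter_mut().zip(group.values.iter()) {
        *o = f32::from(*q) * group.scale;
    }
    out
}
```

Because the scheme is symmetric, no zero-point is stored, and the reconstruction error per weight is at most half the group's scale.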

Architecture

  Component       Details
  Talker          28 layers, hidden=1024, 16/8 GQA heads, SwiGLU 3072
  Code Predictor  5 layers, hidden=1024, 16 code groups
  Speech Decoder  8-layer transformer + Vocos vocoder
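
As a reading aid for "16/8 GQA heads", the sketch below shows how grouped-query attention typically maps query heads onto shared key/value heads. It assumes the notation means 16 query heads and 8 KV heads, and that heads are grouped contiguously, which is the common convention; the engine's actual indexing is not documented here.

```rust
/// Assumed reading of "16/8 GQA heads": 16 query heads, 8 key/value heads.
const NUM_QUERY_HEADS: usize = 16;
const NUM_KV_HEADS: usize = 8;

/// Index of the key/value head shared by the given query head,
/// assuming contiguous grouping (query heads 0 and 1 use KV head 0, etc.).
fn kv_head_for(query_head: usize) -> usize {
    assert!(query_head < NUM_QUERY_HEADS, "query head out of range");
    query_head / (NUM_QUERY_HEADS / NUM_KV_HEADS)
}
```

With these numbers, each KV head serves two query heads, which halves the size of the key/value cache relative to full multi-head attention.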

CLI Arguments

  Flag                Default                      Description
  --model-path <dir>  .                            Directory containing model.qora-tts + config
  --text <text>       "Hello, how are you today?"  Text to synthesize
  --speaker <name>    ryan                         Built-in speaker name
  --language <name>   english                      Target language
  --output <path>     output.wav                   Output WAV path
  --max-codes <n>     500                          Max code timesteps (~n/12.5 seconds)
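
The "~n/12.5 seconds" rule for --max-codes can be turned into a small helper for choosing a value. This is a sketch with hypothetical function names; it assumes every code timestep decodes to the same amount of audio and uses the 24 kHz sample rate from the Model Info section.

```rust
/// Code timesteps per second of audio, from the "~n/12.5 seconds" note.
const CODES_PER_SECOND: f64 = 12.5;
/// Output sample rate from the Model Info section.
const SAMPLE_RATE_HZ: f64 = 24_000.0;

/// Upper bound on audio length (seconds) for a given --max-codes value.
fn max_seconds(max_codes: u32) -> f64 {
    f64::from(max_codes) / CODES_PER_SECOND
}

/// Smallest --max-codes value that allows at least `seconds` of audio.
fn codes_for_seconds(seconds: f64) -> u32 {
    (seconds * CODES_PER_SECOND).ceil() as u32
}

/// Audio samples produced per code timestep under the stated assumption.
fn samples_per_code() -> f64 {
    SAMPLE_RATE_HZ / CODES_PER_SECOND
}
```

Under these assumptions the default of 500 codes caps output at about 40 seconds, and `--max-codes 200` caps it at about 16 seconds.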

Built-in Speakers

  Speaker   Language         Description
  ryan      English          Dynamic male voice
  aiden     English          Sunny American male
  serena    Chinese          Warm, gentle female
  vivian    Chinese          Bright young female
  uncle_fu  Chinese          Seasoned male
  dylan     Beijing dialect  Youthful male
  eric      Sichuan dialect  Lively male
  ono_anna  Japanese         Playful female
  sohee     Korean           Warm female
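
A wrapper script may want to validate a speaker name before launching the engine. The sketch below encodes the table above as a static lookup; the `Speaker` struct and `find_speaker` function are hypothetical, and the `language` field is only the table's label, not necessarily a valid value for --language.

```rust
/// One row of the Built-in Speakers table (hypothetical helper type).
struct Speaker {
    name: &'static str,
    language: &'static str,
    description: &'static str,
}

/// The nine built-in speakers, copied from the table above.
static SPEAKERS: [Speaker; 9] = [
    Speaker { name: "ryan", language: "English", description: "Dynamic male voice" },
    Speaker { name: "aiden", language: "English", description: "Sunny American male" },
    Speaker { name: "serena", language: "Chinese", description: "Warm, gentle female" },
    Speaker { name: "vivian", language: "Chinese", description: "Bright young female" },
    Speaker { name: "uncle_fu", language: "Chinese", description: "Seasoned male" },
    Speaker { name: "dylan", language: "Beijing dialect", description: "Youthful male" },
    Speaker { name: "eric", language: "Sichuan dialect", description: "Lively male" },
    Speaker { name: "ono_anna", language: "Japanese", description: "Playful female" },
    Speaker { name: "sohee", language: "Korean", description: "Warm female" },
];

/// Look up a speaker by the exact name passed to --speaker.
fn find_speaker(name: &str) -> Option<&'static Speaker> {
    SPEAKERS.iter().find(|s| s.name == name)
}
```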

Performance (i5-11500, 16GB RAM, CPU-only)

  Phase            Time
  Model Load       ~0.6 s (from binary)
  Prefill          ~2-5 s
  Code Generation  ~1.5 s/code
  Audio Decode     ~0.5 s/frame

Memory use: ~970 MB
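
The code generation figure gives a rough idea of wall-clock cost on this hardware. The sketch below covers code generation only. It leaves out prefill and audio decode, and it assumes the 12.5 codes-per-second rate implied by the --max-codes description; the function names are hypothetical.

```rust
/// Approximate code generation cost from the table above (i5-11500, CPU-only).
const GEN_SECONDS_PER_CODE: f64 = 1.5;
/// Code timesteps per second of audio, as implied by "~n/12.5 seconds".
const CODE_RATE_HZ: f64 = 12.5;

/// Estimated time spent generating `codes` code timesteps.
fn code_generation_seconds(codes: u32) -> f64 {
    f64::from(codes) * GEN_SECONDS_PER_CODE
}

/// Seconds of code generation needed per second of output audio.
fn realtime_factor() -> f64 {
    GEN_SECONDS_PER_CODE * CODE_RATE_HZ
}
```

On these numbers, 200 codes (about 16 seconds of audio) take roughly 300 seconds to generate, so code generation runs at about 18.75 times slower than real time.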

Converting from Safetensors

If you have the original safetensors files, convert them to the binary format:

qora-tts.exe --model-path <safetensors_dir> --save model.qora-tts --text "x" --max-codes 1

After conversion, safetensors files are no longer needed.


Built with QORA - Pure Rust AI Inference
