kyr0/qwen3-TTS-12Hz-0.6B-CustomVoice-4bit-partial-quantization

This model was converted to MLX format from Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice using mlx-audio, with the version pinned to my fork and PR, which add support for partial quantization!

See PR 398 for details.

Refer to the original model card for more details on the model.

Please star this repo if you find it useful - it will soon be updated with fast inference / streaming server code!

https://github.com/kyr0/qwen3-tts-mlx

Requirements

Create a requirements.txt file with the following content:

git+https://github.com/kyr0/mlx-audio.git@fix_qwen3_tts_quantization
transformers>=4.45.0
click
numpy
soundfile
huggingface_hub
tqdm
requests

Create a virtual environment and install dependencies:

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

CLI Example

python -m mlx_audio.tts.generate \
  --model kyr0/qwen3-TTS-12Hz-0.6B-CustomVoice-4bit-partial-quantization \
  --text "Das ist ein Referenztext." \
  --lang_code german \
  --speed 4 \
  --pitch 0.1 \
  --instruct "Excited, friendly radio moderator." \
  --gender "female" \
  --voice "aiden"

Performance

For the CLI example on a MacBook Air (M4, base model, 24 GB unified memory):

wall clock (time): 3.794s
3.35s user 1.00s system 114% cpu 3.794 total

NOTE: This includes loading the model, preprocessing, and generating the audio. The actual inference time is much shorter once the model is loaded into memory and warmed up (i.e., when the MPS compute shader graphs are cached).
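To see the cold-start vs. warmed-up difference yourself, a generic timing helper can wrap the `load_model` and `generate_audio` calls from the Python example below. The helper itself is plain Python (a sketch; `timed` is not part of mlx-audio):

```python
import time
from typing import Any, Callable


def timed(label: str, fn: Callable[..., Any], *args, **kwargs) -> Any:
    # Measure wall-clock time of a single call, e.g. load_model(...) or
    # generate_audio(...). Calling generate twice shows the warmed-up number.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result
```

For example, `timed("load", load_model, repo_id)` followed by two `timed("generate", generate_audio, ...)` calls makes the shader-graph caching effect visible: the second generate call should be noticeably faster than the first.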

IMPORTANT: "Time-to-first-sample" latency can be further reduced by activating streaming mode. This model supports streaming out of the box -- I'm on it :-)

Python Example

from mlx_audio.tts.utils import load_model
from mlx_audio.tts.generate import generate_audio

model = load_model("kyr0/qwen3-TTS-12Hz-0.6B-CustomVoice-4bit-partial-quantization")
generate_audio(
    model=model,
    text="Hello, this is a test.",
    voice="serena",
    instruct="Happy and excited.",
    file_prefix="test_audio",
)

Available Speakers

serena, vivian, uncle_fu, ryan, aiden, ono_anna, sohee, eric, dylan
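A misspelled speaker name is an easy mistake, so a small guard can validate the voice argument against the list above before calling generate_audio. This is a plain-Python sketch (the helper is hypothetical, not part of mlx-audio; the speaker names are the ones listed in this card):

```python
# Speakers listed in this model card.
AVAILABLE_SPEAKERS = {
    "serena", "vivian", "uncle_fu", "ryan", "aiden",
    "ono_anna", "sohee", "eric", "dylan",
}


def check_voice(voice: str) -> str:
    # Normalize a requested speaker name and fail early if it is unknown.
    name = voice.strip().lower()
    if name not in AVAILABLE_SPEAKERS:
        raise ValueError(
            f"Unknown voice {voice!r}; choose one of {sorted(AVAILABLE_SPEAKERS)}"
        )
    return name
```

Used as `voice=check_voice("Aiden")`, this raises a clear error immediately instead of failing deep inside generation.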

Model details

Model size: 0.4B params
Tensor types: BF16, U32
Format: MLX, 4-bit (partial quantization)