kyr0/qwen3-TTS-12Hz-0.6B-Base-4bit-partial-quantization

This model was converted to MLX format from Qwen/Qwen3-TTS-12Hz-0.6B-Base using mlx-audio, pinned to my fork and its accompanying PR 398, which add support for partial quantization.

Refer to the original model card for more details on the model.

Please star this repo if you find this useful - it will soon be updated with fast inference / streaming server code!

https://github.com/kyr0/qwen3-tts-mlx

Requirements

Create a requirements.txt file with the following content:

git+https://github.com/kyr0/mlx-audio.git@fix_qwen3_tts_quantization
transformers>=4.45.0
click
numpy
soundfile
huggingface_hub
tqdm
requests

Create a virtual environment and install dependencies:

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
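To sanity-check the install afterwards, a quick standard-library-only check is the sketch below. It assumes each pip package imports under the same name, except mlx-audio, which imports as mlx_audio:

```python
import importlib.util

def missing_packages(modules):
    """Return the subset of module names that cannot be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

# Import names for the dependencies above (mlx-audio imports as mlx_audio).
required = ["mlx_audio", "transformers", "click", "numpy",
            "soundfile", "huggingface_hub", "tqdm", "requests"]

missing = missing_packages(required)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All dependencies importable.")
```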

CLI Example

python -m mlx_audio.tts.generate \
  --model kyr0/qwen3-TTS-12Hz-0.6B-Base-4bit-partial-quantization \
  --text "Hallo. Das ist ein Test." \
  --ref_audio reference.wav \
  --ref_text "Das ist ein Referenztext." \
  --lang_code german \
  --speed 3 \
  --gender "male"

NOTE: reference.wav is included in the same directory as this README; it is a reference audio file that was synthetically generated with Qwen3-TTS.
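If you want to inspect the reference audio (or your own replacement) before use, the standard-library wave module is enough for PCM WAV files. This only reports the file's format; the sample rate the model expects is not documented on this card:

```python
import os
import wave

def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        duration = w.getnframes() / rate
    return rate, channels, duration

if os.path.exists("reference.wav"):
    rate, channels, duration = wav_info("reference.wav")
    print(f"{rate} Hz, {channels} ch, {duration:.2f} s")
```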

Performance

For the CLI example on a MacBook Air M4 base model with 24GB unified memory:

wall clock (time): 3.794s
3.35s user 1.00s system 114% cpu 3.794 total

NOTE: This includes loading the model, preprocessing, and generating the audio. The actual inference time is much shorter if the model is loaded into memory and warmed up (when MPS compute shader graphs are cached).
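You can measure the cold/warm difference yourself with a pattern like the one below. Here generate_once is a hypothetical stand-in for the real model call; in practice you would time generate_audio(...) instead:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical stand-in for a real call such as generate_audio(...).
def generate_once(text):
    return f"audio for: {text}"

_, cold = timed(generate_once, "Hallo. Das ist ein Test.")  # first call: load + warm-up in real use
_, warm = timed(generate_once, "Hallo. Das ist ein Test.")  # later calls: compute graphs cached
print(f"cold: {cold:.3f}s, warm: {warm:.3f}s")
```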

IMPORTANT: "Time-to-first-sample" latency can be further reduced by activating streaming mode. This model supports streaming out of the box -- I'm on it :-)

Python Example

from mlx_audio.tts.utils import load_model
from mlx_audio.tts.generate import generate_audio

model = load_model("kyr0/qwen3-TTS-12Hz-0.6B-Base-4bit-partial-quantization")
generate_audio(
    model=model,
    text="Hello, this is a test.",
    voice="serena",
    instruct="Happy and excited.",
    file_prefix="test_audio",
)
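For voice cloning from Python, the CLI flags above presumably map onto keyword arguments of the same names. This is an assumption inferred from the CLI, not a documented API, so treat the parameter names as hypothetical:

```python
# Hypothetical mapping of the CLI flags to generate_audio keyword arguments.
clone_kwargs = {
    "text": "Hallo. Das ist ein Test.",
    "ref_audio": "reference.wav",
    "ref_text": "Das ist ein Referenztext.",
    "lang_code": "german",
}

# Assumed call, mirroring the CLI example above:
# generate_audio(model=model, file_prefix="cloned_audio", **clone_kwargs)
```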

Available Speakers

serena, vivian, uncle_fu, ryan, aiden, ono_anna, sohee, eric, dylan
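Since speaker names are passed as plain strings, a small guard against typos can help. The set below is copied from this card and may change in future model revisions:

```python
# Speaker names as listed on this model card.
AVAILABLE_SPEAKERS = {
    "serena", "vivian", "uncle_fu", "ryan", "aiden",
    "ono_anna", "sohee", "eric", "dylan",
}

def check_voice(name):
    """Return the normalized speaker name, or raise if it is unknown."""
    normalized = name.strip().lower()
    if normalized not in AVAILABLE_SPEAKERS:
        raise ValueError(
            f"Unknown speaker {name!r}; choose one of {sorted(AVAILABLE_SPEAKERS)}"
        )
    return normalized

print(check_voice("Serena"))  # -> serena
```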
