---
license: apache-2.0
library_name: mlx
pipeline_tag: text-to-speech
tags:
- mlx
- tts
- text-to-speech
- omnivoice
- quantized
base_model: k2-fsa/OmniVoice
---

# OmniVoice — int4 g=64 (MLX)

4-bit, group-size-64 affine quantization of [k2-fsa/OmniVoice](https://huggingface.co/k2-fsa/OmniVoice), produced with `mlx-audio` for Apple Silicon.

## Sizes

| Variant | Backbone | Total |
|---|---|---|
| Original (bf16) | ~1.2 GB | ~1.6 GB |
| **This repo (int4 g=64 backbone, bf16 tokenizer)** | **329 MB** | **724 MB** |

The `audio_tokenizer/` directory in this repo is unchanged from the original.

Quantization applies only to the Qwen3 backbone Linear layers (and the tied audio embedding/head matmuls). The Higgs Audio V2 acoustic tokenizer (decoder, RVQ, semantic) is left at bfloat16 to preserve audio fidelity.

## Performance (M-series, mlx-audio 0.x)

RTF is reported here as a multiple of real time, so higher is faster.

| Prompt | RTF (bf16) | RTF (this) |
|---|---|---|
| "Voice synthesis on Apple Silicon has come a long way. We can now generate full sentences in real time." | 3.68× | **4.59×** (+25%) |

Whisper-small round-trip: the transcript is identical to the bf16 model's on the long prompt.

## Usage (mlx-audio Python)

```python
import json

import mlx.core as mx
import mlx.nn as nn
from huggingface_hub import snapshot_download
from mlx_audio.tts.models.omnivoice.config import OmniVoiceConfig
from mlx_audio.tts.models.omnivoice.omnivoice import Model

path = snapshot_download("lightsofapollo/omnivoice-mlx-q4-g64")

with open(f"{path}/config.json") as f:
    cfg_dict = json.load(f)

# Build the model from only the keys OmniVoiceConfig knows about.
model = Model(OmniVoiceConfig(**{
    k: v for k, v in cfg_dict.items()
    if k in OmniVoiceConfig.__dataclass_fields__
}))

# IMPORTANT: quantize the model *before* loading weights, so that the
# parameter names and shapes match the quantized checkpoint.
q = cfg_dict["quantization"]
nn.quantize(
    model,
    group_size=q["group_size"],
    bits=q["bits"],
    mode=q.get("mode", "affine"),
    class_predicate=lambda _p, m: hasattr(m, "to_quantized"),
)

raw = dict(mx.load(f"{path}/model.safetensors"))
model.load_weights(list(model.sanitize(raw).items()))
mx.eval(model.parameters())
```

## How this was made

```bash
python -m mlx_audio.tts.models.omnivoice.convert \
  --model k2-fsa/OmniVoice --output omnivoice-bf16 --dtype bfloat16

python -m mlx_audio.convert \
  --hf-path omnivoice-bf16 --mlx-path omnivoice-q4-g64 \
  --quantize --q-bits 4 --q-group-size 64
```

## License

Apache-2.0 (inherited from `k2-fsa/OmniVoice`).
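
## Background: group-wise affine quantization

For readers wondering what "int4 g=64" means in practice, the following is a minimal pure-Python sketch of group-wise affine quantization: each group of 64 weights is mapped to 4-bit codes plus one scale and one bias for the group. It is an illustration only, not MLX's kernels or bit-packing. The function names (`quantize_group`, `dequantize_group`, `effective_bits_per_weight`) are hypothetical, and the 16-bit scale and bias per group is an assumption rather than something read from this repo.

```python
import math


def quantize_group(values, bits=4):
    """Affine-quantize one group of floats to unsigned `bits`-bit codes.

    Returns (codes, scale, bias) such that value ~= code * scale + bias.
    """
    max_code = (1 << bits) - 1             # largest code: 15 for 4 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / max_code or 1.0    # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from codes and the group's scale/bias."""
    return [c * scale + bias for c in codes]


def effective_bits_per_weight(bits=4, group_size=64, meta_bits=16):
    """Bits stored per weight, counting one scale and one bias per group.

    meta_bits=16 is an assumption (16-bit scale and bias), not read from the repo.
    """
    return bits + 2 * meta_bits / group_size


# One group of 64 toy weights (g=64).
group = [math.sin(i) for i in range(64)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, restored))

print(f"max abs error: {max_err:.4f} (half a step is {scale / 2:.4f})")
print(f"effective bits per weight: {effective_bits_per_weight()}")
```

Under these assumptions the cost works out to 4.5 bits per weight, or about 28% of the 16 bits a bf16 weight takes. That is roughly in line with the backbone shrinking from ~1.2 GB to 329 MB in the Sizes table, although the exact figure depends on which layers are actually quantized.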