---
license: apache-2.0
library_name: mlx
pipeline_tag: text-to-speech
tags:
- mlx
- tts
- text-to-speech
- omnivoice
- quantized
base_model: k2-fsa/OmniVoice
---

# OmniVoice: int4 g=64 (MLX)

A 4-bit, group-size-64 affine quantization of [k2-fsa/OmniVoice](https://huggingface.co/k2-fsa/OmniVoice),
produced with `mlx-audio` for Apple Silicon.

## Sizes

| Variant | Backbone | Total |
|---|---|---|
| original (bf16; this repo's `audio_tokenizer/` is unchanged) | ~1.2 GB | ~1.6 GB |
| **this repo (int4 g=64 backbone, bf16 tokenizer)** | **329 MB** | **724 MB** |

Quantization applies only to the Qwen3 backbone `Linear` layers (and the tied
audio embedding/head matmuls). The Higgs Audio V2 acoustic tokenizer
(decoder, RVQ, semantic) is left at bfloat16 to preserve audio fidelity.
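
To make "4-bit, group-size-64 affine" concrete, here is a minimal pure-Python sketch of affine quantization on one group of 64 weights. It illustrates the general technique only; the exact rounding and bit packing used by MLX may differ, and the 16-bit scale and bias per group in `bits_per_weight` is an assumption, not something stated in this card.

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats to unsigned `bits`-bit codes."""
    levels = (1 << bits) - 1               # 15 distinct steps for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0      # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo                # lo acts as the per-group bias


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


def bits_per_weight(bits=4, group_size=64, param_bits=16):
    """Effective storage cost, assuming one scale and one bias per group."""
    return bits + 2 * param_bits / group_size


# One illustrative group of 64 evenly spaced weights in [-0.5, 0.5].
group = [i / 63 - 0.5 for i in range(64)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, restored))
```

Under those assumptions the cost is 4.5 bits per weight, so a ~1.2 GB bf16 backbone would shrink to roughly 1.2 GB × 4.5 / 16 ≈ 0.34 GB, which is in the same range as the 329 MB listed in the table.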

## Performance (M-series, mlx-audio 0.x)

| Prompt | RTF (bf16) | RTF (this) |
|---|---|---|
| "Voice synthesis on Apple Silicon has come a long way. We can now generate full sentences in real time." | 3.68× | **4.59×** (+25%) |

Whisper-small round-trip: identical transcript to bf16 on the long prompt.
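
The figures above read as multiples of real time (seconds of audio produced per second of wall-clock time, so higher is faster); that interpretation is an assumption based on the "+25%" annotation. Under it, the numbers can be reproduced with a sketch like the following, where `speed_factor` and `relative_gain` are hypothetical helper names:

```python
def speed_factor(audio_seconds, wall_seconds):
    """Seconds of audio generated per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def relative_gain(new, old):
    """Percentage improvement of `new` over `old`."""
    return (new / old - 1.0) * 100.0


# Values taken from the table above.
gain = relative_gain(4.59, 3.68)
```

Rounded to the nearest percent, `gain` matches the +25% shown in the table.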

## Usage (mlx-audio Python)

```python
import json

import mlx.core as mx
import mlx.nn as nn
from huggingface_hub import snapshot_download
from mlx_audio.tts.models.omnivoice.config import OmniVoiceConfig
from mlx_audio.tts.models.omnivoice.omnivoice import Model

# Download the checkpoint from the Hub (or reuse the local cache).
path = snapshot_download("lightsofapollo/omnivoice-mlx-q4-g64")
with open(f"{path}/config.json") as f:
    cfg_dict = json.load(f)

# Pass only the keys that OmniVoiceConfig declares as dataclass fields.
known = OmniVoiceConfig.__dataclass_fields__
model = Model(OmniVoiceConfig(**{k: v for k, v in cfg_dict.items() if k in known}))

# IMPORTANT: quantize the model shape *before* loading weights.
q = cfg_dict["quantization"]
nn.quantize(
    model,
    group_size=q["group_size"],
    bits=q["bits"],
    mode=q.get("mode", "affine"),
    class_predicate=lambda _p, m: hasattr(m, "to_quantized"),
)

# Load the quantized weights and force evaluation of the lazy arrays.
raw = dict(mx.load(f"{path}/model.safetensors"))
model.load_weights(list(model.sanitize(raw).items()))
mx.eval(model.parameters())
```
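
The config-filtering step in the snippet above can be seen in isolation with a small stand-in. `ToyConfig` and `from_dict` are hypothetical names used only for illustration and are not part of `mlx-audio`; the point is that keys the dataclass does not declare, such as `quantization`, are dropped before construction, which is why the usage code reads that block from the raw dictionary instead.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyConfig:  # hypothetical stand-in for OmniVoiceConfig
    hidden_size: int = 1024
    num_layers: int = 28


def from_dict(cls, raw):
    """Build a dataclass from a dict, ignoring keys it does not declare."""
    allowed = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in raw.items() if k in allowed})


raw = {"hidden_size": 2048, "quantization": {"group_size": 64, "bits": 4}}
cfg = from_dict(ToyConfig, raw)        # "quantization" is silently skipped
quant = raw.get("quantization", {})    # read separately from the raw dict
```

Without the filter, `ToyConfig(**raw)` would raise a `TypeError` for the unexpected `quantization` keyword.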

## How this was made

```bash
python -m mlx_audio.tts.models.omnivoice.convert \
  --model k2-fsa/OmniVoice --output omnivoice-bf16 --dtype bfloat16
python -m mlx_audio.convert \
  --hf-path omnivoice-bf16 --mlx-path omnivoice-q4-g64 \
  --quantize --q-bits 4 --q-group-size 64
```
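
After conversion, the output directory can be compared against the Sizes table with a short stdlib sketch. `dir_size_mb` is a hypothetical helper, and it assumes decimal megabytes (1 MB = 10^6 bytes); the table may have been measured differently.

```python
from pathlib import Path


def dir_size_mb(root, subdir=None):
    """Total size of all files under root (or root/subdir), in decimal MB."""
    base = Path(root) / subdir if subdir else Path(root)
    total_bytes = sum(p.stat().st_size for p in base.rglob("*") if p.is_file())
    return total_bytes / 1e6
```

For example, `dir_size_mb("omnivoice-q4-g64")` gives the total for the directory written by the second command, and `dir_size_mb("omnivoice-q4-g64", "audio_tokenizer")` gives the share taken by the unquantized tokenizer.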

## License

Apache-2.0 (inherited from `k2-fsa/OmniVoice`).