CosyVoice3 0.5B — MLX 8-bit-full (LLM + flow)

Re-quantization of FunAudioLLM/Fun-CosyVoice3-0.5B to MLX 8-bit weights for the LLM and flow-matching components, packaged for soniqo/speech-swift's CosyVoiceTTSModel.
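
For readers new to MLX quantization: MLX's documented affine scheme splits each weight row into groups of group_size consecutive values. For each group it stores low-bit integers plus one scale and one bias. The pure-Python sketch below illustrates that idea. It is a simplification, not the conversion script, and the function names are hypothetical. group_size=64 is MLX's default, not a value confirmed for this bundle; the real value is recorded in weights_signature.json.

```python
# Simplified sketch of group-wise affine quantization (the idea behind MLX's
# mx.quantize); hypothetical helper names, not the bundle's conversion script.
from typing import List, Tuple


def quantize_group(values: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Map one group of floats to unsigned ints sharing a scale and a bias."""
    lo, hi = min(values), max(values)
    max_code = (1 << bits) - 1                        # largest integer code: 255 for 8-bit
    scale = (hi - lo) / max_code if hi > lo else 1.0  # avoid dividing by zero on flat groups
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo                               # bias is the group minimum


def dequantize_group(q: List[int], scale: float, bias: float) -> List[float]:
    """Approximate reconstruction: w ~= scale * q + bias."""
    return [scale * x + bias for x in q]


def quantize_row(row: List[float], group_size: int = 64, bits: int = 8):
    """Quantize a flat weight row group by group (group_size=64 is an assumption)."""
    if len(row) % group_size:
        raise ValueError("row length must be a multiple of group_size")
    return [quantize_group(row[i:i + group_size], bits)
            for i in range(0, len(row), group_size)]
```

Each reconstructed weight lands within half a scale step of the original. This is the sense in which 8-bit (255 steps per group) is finer-grained than 4-bit (15 steps per group).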

The HiFi-GAN vocoder ships as fp32, byte-identical to the upstream checkpoint. Quantizing the vocoder audibly degrades waveform quality, and keeping it at full precision costs little (~83 MB).
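
The "small size cost" can be checked with back-of-envelope arithmetic. The sketch below rests on three assumptions, none of them measured from the published files. First, the ~83 MB is entirely fp32 weights (4 bytes each). Second, "0.5B" is the parameter count of the quantized LLM and flow components. Third, 8-bit affine storage uses group_size=64 with 16-bit scales and biases.

```python
# Back-of-envelope storage estimate. Every input is an assumption taken from
# the card text, not a measurement of the published files.

def bytes_per_param(bits: int, group_size: int = 64, meta_bits: int = 16) -> float:
    """Payload bits plus one scale and one bias (meta_bits each) per group, in bytes."""
    return (bits + 2 * meta_bits / group_size) / 8


FP32_BYTES = 4.0
VOCODER_MB = 83.0        # fp32 vocoder size quoted in the card
MAIN_PARAMS = 0.5e9      # assumption: "0.5B" = quantized LLM + flow parameters

vocoder_params = VOCODER_MB * 1e6 / FP32_BYTES            # ~20.75M if all fp32 weights
vocoder_8bit_mb = vocoder_params * bytes_per_param(8) / 1e6
main_8bit_mb = MAIN_PARAMS * bytes_per_param(8) / 1e6

print(f"vocoder: ~{vocoder_params / 1e6:.2f}M params")
print(f"vocoder fp32 {VOCODER_MB:.0f} MB vs 8-bit ~{vocoder_8bit_mb:.1f} MB "
      f"(saves ~{VOCODER_MB - vocoder_8bit_mb:.1f} MB)")
print(f"LLM + flow at 8-bit: ~{main_8bit_mb:.0f} MB")
```

Under these assumptions, quantizing the vocoder would save only about 61 MB, set against a roughly 530 MB 8-bit LLM + flow payload.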

Why not 4-bit?

In subjective listening tests (5 Chinese and 5 English prompts, each rendered with both the native and a cloned voice), the 4-bit MLX bundle of the same model produced audible "AI artefact" noise on Chinese, especially on short greetings; the 8-bit bundle largely removes it. See the originating repo's design doc for the full listening matrix.
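
Read literally, the prompt set "5 ZH + 5 EN prompts × native + cloned voice" gives 20 clips per bundle. Comparing the 4-bit and 8-bit bundles side by side doubles that to 40. The enumeration below spells that out. The bundle labels and tuple layout are hypothetical; the authoritative matrix is the one in the design doc.

```python
from collections import Counter
from itertools import product
from typing import Iterator, Tuple

PROMPTS_PER_LANGUAGE = {"zh": 5, "en": 5}   # from the card
VOICES = ("native", "cloned")               # from the card
BUNDLES = ("mlx-4bit", "mlx-8bit")          # hypothetical labels for the two builds


def listening_matrix() -> Iterator[Tuple[str, str, int, str]]:
    """Yield (bundle, language, prompt index, voice) for every clip to audition."""
    for bundle, (lang, n_prompts), voice in product(BUNDLES, PROMPTS_PER_LANGUAGE.items(), VOICES):
        for idx in range(n_prompts):
            yield bundle, lang, idx, voice


if __name__ == "__main__":
    clips = list(listening_matrix())
    print(len(clips), Counter(bundle for bundle, *_ in clips))
```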

How to use

# end-user (downstream consumer); os.path.expanduser is needed because Python does not expand '~' inside a string
python -c "import os; from huggingface_hub import snapshot_download; \
    snapshot_download(repo_id='aimason/CosyVoice3-0.5B-MLX-8bit-full', revision='v1', \
                      local_dir=os.path.expanduser('~/Library/Caches/qwen3-speech/models/aimason/CosyVoice3-0.5B-MLX-8bit-full'))"

# then run audio-server (built from the speech-swift fork with our patches)
audio-server --cosyvoice-model-id aimason/CosyVoice3-0.5B-MLX-8bit-full
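
Because the download target must match what the server expects, a small helper can compute where a given --cosyvoice-model-id should live and check that the download landed there. This is a sketch under an assumption inferred from the download command, namely that audio-server resolves models under ~/Library/Caches/qwen3-speech/models/<org>/<name>. The helper names are hypothetical and not part of speech-swift.

```python
from pathlib import Path

# Assumption (inferred from the download command, not from speech-swift docs):
# models are resolved under this cache root, keyed by the Hugging Face repo id.
CACHE_ROOT = Path.home() / "Library" / "Caches" / "qwen3-speech" / "models"


def local_model_dir(model_id: str, root: Path = CACHE_ROOT) -> Path:
    """Return the directory a given --cosyvoice-model-id is expected to live in."""
    org, sep, name = model_id.partition("/")
    if not sep or not org or not name or "/" in name:
        raise ValueError(f"expected '<org>/<name>', got {model_id!r}")
    return root / org / name


def is_downloaded(model_id: str, root: Path = CACHE_ROOT) -> bool:
    """Cheap presence check: the directory exists and is non-empty."""
    path = local_model_dir(model_id, root)
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    mid = "aimason/CosyVoice3-0.5B-MLX-8bit-full"
    if not is_downloaded(mid):
        print("bundle not found, run snapshot_download into:", local_model_dir(mid))
```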

License

Apache-2.0, inherited from the upstream FunAudioLLM CosyVoice3 weights. The LICENSE file in this repo is a verbatim copy of the upstream license. Conversion adds nothing copyrightable beyond a deterministic re-quantization recipe.

Reproducibility

The conversion recipe is recorded in weights_signature.json (PyTorch checkpoint SHA, MLX version, bits/group_size, and script git SHA). Re-running scripts/cosyvoice3-mlx-quantize.py with matching inputs reproduces these weights bit-for-bit.
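
Bit-for-bit reproduction only holds when every recorded input matches, so before re-running the script it helps to diff the published signature against your own environment. The key names below are hypothetical placeholders: the card lists what weights_signature.json records but not its exact schema.

```python
import json
from typing import Dict, List

# Hypothetical key names; the card names the recorded inputs but not the schema.
REQUIRED_KEYS = ("checkpoint_sha256", "mlx_version", "bits", "group_size", "script_git_sha")


def load_signature(path: str) -> Dict:
    """Read weights_signature.json (or any JSON file with the same layout)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def signature_mismatches(published: Dict, local: Dict) -> List[str]:
    """List every recipe input where a planned re-run differs from the published one."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in published:
            problems.append(f"{key}: missing from published signature")
        elif published[key] != local.get(key):
            problems.append(f"{key}: published={published[key]!r} local={local.get(key)!r}")
    return problems
```

An empty list means the inputs line up and a re-run should be expected to match; any entry points at the input to fix first.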

PUBLISH_MANIFEST.json pins a SHA-256 digest for every file; downstream consumers should verify integrity against this manifest after snapshot_download.
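
A minimal verification pass streams each file through SHA-256 and compares it with the pinned digest. The sketch assumes, hypothetically, that PUBLISH_MANIFEST.json holds a top-level "files" object mapping relative paths to hex digests; adapt load_manifest to the real schema.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large weight files never sit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_manifest(bundle_dir: Path) -> Dict[str, str]:
    """Assumed schema: {"files": {"<relative path>": "<hex sha256>", ...}}."""
    data = json.loads((bundle_dir / "PUBLISH_MANIFEST.json").read_text(encoding="utf-8"))
    return data["files"]


def verify_manifest(bundle_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing or whose digest differs."""
    bad = []
    for rel_path, expected in sorted(manifest.items()):
        target = bundle_dir / rel_path
        if not target.is_file() or sha256_file(target) != expected.lower():
            bad.append(rel_path)
    return bad
```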

Attribution

Upstream model & training: Alibaba FunAudioLLM team — https://github.com/FunAudioLLM/CosyVoice
speech-swift MLX runtime: soniqo — https://github.com/soniqo/speech-swift
This re-quantization: aimason. The only changes are bit-width and packing to match the MLX runtime; no fine-tuning or distillation.
