CosyVoice3 0.5B: MLX 8-bit-full (LLM + flow)

Re-quantization of FunAudioLLM/Fun-CosyVoice3-0.5B to MLX 8-bit weights for the LLM and flow-matching components, packaged for soniqo/speech-swift's CosyVoiceTTSModel.

The HiFi-GAN vocoder ships as fp32, byte-identical to the upstream checkpoint, because vocoder quantization audibly degrades waveform quality and the size cost is small (~83 MB).

Why not 4-bit?

Subjective listening tests (5 ZH + 5 EN prompts × native + cloned voice) show that the 4-bit MLX bundle of the same model produces audible "AI artefact" noise on Chinese speech, especially short greetings, which the 8-bit bundle largely removes. See the originating repo's design doc for the listening matrix.
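As a rough numerical intuition for the gap between the two bit widths, the sketch below round-trips synthetic weights through group-wise affine quantization at 4 and 8 bits and compares the RMS error. It is an illustration only, not the MLX kernel or the conversion script: the Gaussian weights, the group size of 64, and the helper names are all assumptions, and numeric error does not by itself predict what listeners hear.

```python
import random

def quantize_group(values, bits):
    """Affine-quantize one group of floats to `bits`-bit codes, then dequantize."""
    levels = (1 << bits) - 1            # 15 for 4-bit, 255 for 8-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0   # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]  # integers in [0, levels]
    return [lo + c * scale for c in codes]

def roundtrip_rms_error(weights, bits, group_size=64):
    """RMS error after quantizing `weights` one group at a time."""
    squared = 0.0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        restored = quantize_group(group, bits)
        squared += sum((a - b) ** 2 for a, b in zip(group, restored))
    return (squared / len(weights)) ** 0.5

# Synthetic stand-in for a weight tensor (assumption: small Gaussian values).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

err4 = roundtrip_rms_error(weights, bits=4)
err8 = roundtrip_rms_error(weights, bits=8)
print(f"4-bit RMS error: {err4:.6f}")
print(f"8-bit RMS error: {err8:.6f}")
```

With the same group size, the 8-bit grid has 255 steps per group instead of 15, so its rounding error is correspondingly smaller, at the cost of roughly twice the storage for the quantized tensors.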

How to use

# end-user (downstream consumer)
# the shell does not expand '~' inside quotes, so expand it in Python
python -c "import os; from huggingface_hub import snapshot_download; \
    snapshot_download(repo_id='aimason/CosyVoice3-0.5B-MLX-8bit-full', revision='v1', \
                      local_dir=os.path.expanduser('~/Library/Caches/qwen3-speech/models/aimason/CosyVoice3-0.5B-MLX-8bit-full'))"

# then run audio-server (built from the speech-swift fork with our patches)
audio-server --cosyvoice-model-id aimason/CosyVoice3-0.5B-MLX-8bit-full

License

Apache-2.0, inherited from the upstream FunAudioLLM CosyVoice3 weights. The LICENSE file in this repo is a verbatim copy of the upstream license. Conversion adds nothing copyrightable beyond a deterministic re-quantization recipe.

Reproducibility

The conversion recipe lives in weights_signature.json (PyTorch checkpoint sha + mlx version + bits/group_size + script git sha). Re-running scripts/cosyvoice3-mlx-quantize.py with matching inputs reproduces these weights bit-for-bit.

PUBLISH_MANIFEST.json pins a per-file sha256; downstream consumers verify integrity against this manifest after snapshot_download.
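The integrity check described above could look roughly like the following. This is a minimal sketch, not the repo's own tooling: the manifest layout (a top-level "files" object mapping relative paths to hex digests) is an assumption, as are the function names, so adjust the lookup to whatever PUBLISH_MANIFEST.json actually contains.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_snapshot(model_dir, manifest_name="PUBLISH_MANIFEST.json"):
    """Return a list of (relative_path, problem) for files that fail the check."""
    model_dir = Path(model_dir).expanduser()
    manifest = json.loads((model_dir / manifest_name).read_text())
    problems = []
    # ASSUMPTION: the manifest looks like {"files": {"<relative path>": "<sha256 hex>"}}.
    for rel_path, expected in manifest["files"].items():
        target = model_dir / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif sha256_of(target) != expected.lower():
            problems.append((rel_path, "sha256 mismatch"))
    return problems
```

Calling verify_snapshot on the local_dir used for snapshot_download returns an empty list when every pinned file is present and matches; any other result names the files to re-download.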

Attribution
