CosyVoice3 0.5B: MLX 8-bit-full (LLM + flow)
Re-quantization of FunAudioLLM/Fun-CosyVoice3-0.5B
to MLX 8-bit weights for the LLM and flow-matching components, packaged
for soniqo/speech-swift's
CosyVoiceTTSModel.
The HiFi-GAN vocoder ships as fp32, byte-identical to the upstream checkpoint, because vocoder quantization audibly degrades waveform quality and the size cost is small (~83 MB).
Why not 4-bit?
Subjective listening (5 ZH + 5 EN prompts × native + cloned voice) shows that the 4-bit MLX bundle of the same model produces audible "AI artefact" noise on Chinese prompts, especially short greetings, which the 8-bit bundle largely removes. See the originating repo's design doc for the listening matrix.
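As a rough illustration of why bit-width matters, the sketch below round-trips synthetic weights through a generic group-wise affine quantizer at 4 and 8 bits and compares the worst-case reconstruction error. It is not the conversion script from this repo: the function names, the `group_size=64` default, and the Gaussian test weights are all assumptions made for the example, and numeric weight error is only a loose proxy for the audible artefacts described above.

```python
import random


def quantize_group(values, bits):
    """Affine-quantize one group of floats to `bits`-bit codes, then dequantize."""
    levels = (1 << bits) - 1              # number of steps between min and max
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return [lo + c * scale for c in codes]


def max_roundtrip_error(weights, bits, group_size=64):
    """Largest absolute error after quantizing each group independently.

    group_size=64 is an assumed value, not taken from weights_signature.json.
    """
    worst = 0.0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        restored = quantize_group(group, bits)
        worst = max(worst, max(abs(a - b) for a, b in zip(group, restored)))
    return worst


# Synthetic stand-in for a weight tensor (assumption: roughly Gaussian values).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

err4 = max_roundtrip_error(weights, bits=4)
err8 = max_roundtrip_error(weights, bits=8)
print(f"4-bit max round-trip error: {err4:.6f}")
print(f"8-bit max round-trip error: {err8:.6f}")
```

With 8 bits each group has 255 quantization steps instead of 15, so the per-weight error bound shrinks by roughly a factor of 17 for the same group range.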
How to use
# end-user (downstream consumer)
python -c "import os; from huggingface_hub import snapshot_download; \
snapshot_download(repo_id='aimason/CosyVoice3-0.5B-MLX-8bit-full', revision='v1', \
local_dir=os.path.expanduser('~/Library/Caches/qwen3-speech/models/aimason/CosyVoice3-0.5B-MLX-8bit-full'))"
# then run audio-server (built from the speech-swift fork with our patches)
audio-server --cosyvoice-model-id aimason/CosyVoice3-0.5B-MLX-8bit-full
License
Apache-2.0, inherited from the upstream FunAudioLLM CosyVoice3 weights.
The LICENSE file in this repo is a verbatim copy of the upstream
license. Conversion adds nothing copyrightable beyond a deterministic
re-quantization recipe.
Reproducibility
The conversion recipe lives in weights_signature.json (PyTorch
checkpoint sha + mlx version + bits/group_size + script git sha).
Re-running scripts/cosyvoice3-mlx-quantize.py
with matching inputs reproduces these weights bit-for-bit.
PUBLISH_MANIFEST.json pins per-file sha256; downstream consumers
verify integrity against this manifest after snapshot_download.
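The integrity check described above could look roughly like the following. This is a minimal sketch, not the consumer code from speech-swift: the helper names are hypothetical, and the manifest layout (a top-level `"files"` object mapping relative paths to hex digests) is an assumption, so adjust the lookup to the actual structure of PUBLISH_MANIFEST.json.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(model_dir, manifest_name="PUBLISH_MANIFEST.json"):
    """Return a list of (relative_path, problem) tuples; empty means all files match."""
    model_dir = Path(model_dir).expanduser()
    manifest = json.loads((model_dir / manifest_name).read_text())
    problems = []
    # ASSUMPTION: the manifest looks like {"files": {"<relative path>": "<sha256 hex>"}}.
    for rel_path, expected in manifest["files"].items():
        target = model_dir / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif sha256_of(target) != expected.lower():
            problems.append((rel_path, "sha256 mismatch"))
    return problems
```

Calling `verify_manifest` on the directory passed as `local_dir` to snapshot_download returns an empty list when every pinned file is present and matches its digest.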
Attribution
- Upstream model & training: Alibaba FunAudioLLM team, https://github.com/FunAudioLLM/CosyVoice
- speech-swift MLX runtime: soniqo, https://github.com/soniqo/speech-swift
- This re-quantization: aimason. The only changes are bit-width and packing to match the MLX runtime; no fine-tuning or distillation.