Part of the SongGeneration MLX conversion set. Collection: https://huggingface.co/collections/mlx-community/songgeneration-v2-mlx-6a1bf9342dd0806419737229

SongGeneration-v2-large-4bit

Apple MLX weights for the autoregressive audiolm token generator from Tencent SongGeneration v2-large.

This is not yet a pure-MLX, end-to-end audio pipeline: token generation runs in MLX, while decoding the tokens to FLAC audio currently goes through the official PyTorch Flow1dVAE / separate-tokenizer bridge in ailuntx/SongGeneration-MLX.

TL;DR

Variant: v2-large
Precision: 4bit
Converted component: SongGeneration audiolm token generator
Runtime: ailuntx/SongGeneration-MLX
Official model: tencent/SongGeneration
Official code: tencent-ailab/songgeneration

Quick Start

git clone https://github.com/ailuntx/SongGeneration-MLX.git
cd SongGeneration-MLX
python -m venv .venv
.venv/bin/pip install -e .
.venv/bin/pip install -U huggingface_hub hf_transfer

HF_HUB_ENABLE_HF_TRANSFER=1 .venv/bin/hf download mlx-community/SongGeneration-v2-large-4bit --local-dir ./models/SongGeneration-v2-large-4bit

.venv/bin/python -m songgeneration_mlx.cli \
  --model ./models/SongGeneration-v2-large-4bit \
  --lyrics '[verse] hello from mlx [chorus] sing it again' \
  --description 'Pop, female vocal, bright production, [Musicality-medium].' \
  --duration 2 \
  --top-k 50 \
  --temperature 0.9 \
  --output tokens_2s.npz
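
The --top-k and --temperature flags above control how each token is drawn. The following is a minimal sketch of the conventional meaning of those two settings, written in plain Python on a toy vocabulary. It is an illustration only: the function name `sample_top_k` is hypothetical, and the sampler inside songgeneration_mlx may differ in its details.

```python
import math
import random


def sample_top_k(logits, top_k=50, temperature=0.9, rng=random):
    """Sample one token id from `logits` using temperature and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the indices of the top_k largest logits.
    k = max(1, min(top_k, len(logits)))
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature scaling: values below 1.0 sharpen the distribution.
    scaled = [logits[i] / temperature for i in kept]
    # Numerically stable softmax weights over the kept candidates.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # Draw one candidate in proportion to its weight.
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [2.0, 0.5, -1.0, 3.0, 0.0]  # toy 5-token vocabulary
token = sample_top_k(toy_logits, top_k=2, temperature=0.9, rng=random.Random(0))
print(token)  # always 0 or 3, the two highest-scoring ids
```

With top_k=1 the function reduces to greedy decoding. A larger top_k or a higher temperature spreads the probability over more candidates.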

Decode Tokens

The MLX runtime writes discrete song tokens. To create FLAC audio, use the official decoder bridge in ailuntx/SongGeneration-MLX. The bridge needs the official SongGeneration runtime/ assets, but it does not need the original SongGeneration LM model.pt when --mlx-model is used.

python -m venv .venv-decoder
.venv-decoder/bin/pip install -U pip
.venv-decoder/bin/pip install \
  -r third_party/SongGeneration/requirements.txt \
  -r third_party/SongGeneration/requirements_nodeps.txt \
  soundfile

HF_HUB_ENABLE_HF_TRANSFER=1 .venv/bin/hf download tencent/SongGeneration \
  --include "runtime/*" \
  --local-dir ./third_party/SongGeneration

PYTORCH_ENABLE_MPS_FALLBACK=1 SONGGEN_DEVICE=mps \
.venv-decoder/bin/python scripts/decode_tokens_official.py \
  --mlx-model ./models/SongGeneration-v2-large-4bit \
  --tokens ./tokens_2s.npz \
  --output ./output_2s.flac \
  --device mps

Variants

| Variant | Disk | Notes |
|---|---|---|
| SongGeneration-v2-medium-fp32 | 10G | high-precision medium baseline |
| SongGeneration-v2-medium-bf16 | 5.2G | recommended medium bf16 quality baseline |
| SongGeneration-v2-medium-8bit | 2.8G | smaller medium checkpoint |
| SongGeneration-v2-medium-4bit | 1.5G | smallest medium checkpoint |
| SongGeneration-v2-large-fp32 | 19G | high-precision large baseline |
| SongGeneration-v2-large-bf16 | 9.5G | large bf16 quality baseline |
| SongGeneration-v2-large-8bit | 5.0G | smaller large checkpoint |
| SongGeneration-v2-large-4bit | 2.7G | smallest large checkpoint |
| SongGeneration-v2-fast-* | pending | upstream fast weights were not publicly available when checked on 2026-05-31 |
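
As a rough aid for choosing among these checkpoints, the sketch below picks the highest-precision variant that fits a disk budget and estimates the effective bits per weight. The sizes are copied from the table. The helper names are hypothetical, and the bits estimate assumes the fp32 checkpoint stores exactly 32 bits per weight, which the card does not state.

```python
# Disk sizes as listed in the Variants table (in "G" as given there).
VARIANTS = {
    "medium": {"fp32": 10.0, "bf16": 5.2, "8bit": 2.8, "4bit": 1.5},
    "large": {"fp32": 19.0, "bf16": 9.5, "8bit": 5.0, "4bit": 2.7},
}
PRECISION_ORDER = ["fp32", "bf16", "8bit", "4bit"]  # highest precision first


def pick_variant(family, budget_gb):
    """Return the highest-precision repo name that fits the budget, or None."""
    for precision in PRECISION_ORDER:
        if VARIANTS[family][precision] <= budget_gb:
            return f"SongGeneration-v2-{family}-{precision}"
    return None


def effective_bits(family, precision):
    """Rough bits per weight, assuming the fp32 checkpoint is 32 bits per weight."""
    sizes = VARIANTS[family]
    return 32.0 * sizes[precision] / sizes["fp32"]


print(pick_variant("large", 6.0))                 # SongGeneration-v2-large-8bit
print(round(effective_bits("large", "4bit"), 2))  # 4.55
```

The 4bit estimate comes out above 4 bits per weight. That would be consistent with quantized checkpoints carrying extra scale data or some unquantized tensors, but the card does not break the sizes down.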

Layout

SongGeneration-v2-large-4bit/
|-- model-00001-of-000xx.safetensors
|-- model.safetensors.index.json
|-- config.json
|-- mlx_manifest.json
|-- config.official.yaml
|-- vocab.yaml
`-- qwen2_tokenizer/

Validation

Local Apple Silicon validation was run on the medium bf16 path, not on this large 4bit checkpoint:

| Test | Result |
|---|---|
| 12s MLX token generation | 550 pattern steps, about 1 minute wall time |
| 12s official decoder bridge | 73.27s wall time |
| 12s FLAC | 48kHz stereo, 12.000s, RMS about 0.163 |
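
The duration and RMS figures can be reproduced in spirit with a few lines of arithmetic. The sketch below shows one way to compute them for samples scaled to [-1.0, 1.0]. How the reported RMS was actually measured (for example per channel or over both channels) is not stated, so this is an assumption about the method, and both function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of samples scaled to the range [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def duration_seconds(num_frames, sample_rate):
    """Clip length in seconds; one frame holds one sample per channel."""
    return num_frames / sample_rate


# A 12 s clip at 48 kHz has 12 * 48000 = 576000 frames per channel.
print(duration_seconds(576_000, 48_000))  # 12.0

# Reference point: a full-scale sine wave has RMS 1/sqrt(2).
sine = [math.sin(2 * math.pi * 440 * n / 48_000) for n in range(48_000)]
print(round(rms(sine), 3))  # 0.707
```

An RMS near zero is a quick sign of a near-silent decode. To apply this to a real file, the samples could be read with `soundfile.read`, which returns the audio data and the sample rate.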

The MLX runtime implements the official recent-token repetition penalty. Without it, 12s generations collapse into repeated tokens and decode to near silence.
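
For readers unfamiliar with the idea, here is a minimal sketch of a recent-token repetition penalty in its common form: logits of tokens seen in a recent window are scaled so they become less likely. The window size, the penalty value, and the function name `penalize_recent` are all hypothetical, and the official penalty may use a different formula.

```python
def penalize_recent(logits, history, window=32, penalty=1.2):
    """Return a copy of `logits` with recently emitted token ids made less likely."""
    out = list(logits)
    # Only the last `window` tokens count as recent.
    recent = set(history[-window:]) if window > 0 else set()
    for token_id in recent:
        if 0 <= token_id < len(out):
            value = out[token_id]
            # Shrink positive scores and push negative scores further down.
            out[token_id] = value / penalty if value > 0 else value * penalty
    return out


logits = [2.0, -1.0, 0.5, 3.0]
history = [3, 3, 3, 1]  # token 3 has been repeating
print(penalize_recent(logits, history, window=4, penalty=2.0))
# [2.0, -2.0, 0.5, 1.5]
```

Applied before sampling at every step, a penalty of this kind lowers the chance that the generator keeps emitting the same token.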

License

The license follows the upstream SongGeneration release. Check the official model card and repository for the authoritative license terms.

Citation

@misc{songgeneration-mlx,
  title  = {SongGeneration-MLX: Apple MLX port of SongGeneration},
  author = {ailuntx},
  year   = {2026},
  url    = {https://github.com/ailuntx/SongGeneration-MLX},
}

@article{lei2025levo,
  title   = {LeVo: High-Quality Song Generation with Multi-Preference Alignment},
  author  = {Lei, Shun and Xu, Yaoxun and Lin, Zhiwei and Zhang, Huaicheng and Tan, Wei and Chen, Hangting and Yu, Jianwei and Zhang, Yixuan and Yang, Chenyu and Zhu, Haina and Wang, Shuai and Wu, Zhiyong and Yu, Dong},
  journal = {arXiv preprint arXiv:2506.07520},
  year    = {2025},
}