---
license: other
library_name: mlx
pipeline_tag: text-to-audio
base_model:
- tencent/SongGeneration
tags:
- mlx
- apple-silicon
- music-generation
- song-generation
- audiolm
---
Part of the SongGeneration MLX conversion set. Collection: https://huggingface.co/collections/mlx-community/songgeneration-v2-mlx-6a1bf9342dd0806419737229
# SongGeneration-v2-medium-4bit
Apple MLX weights for the autoregressive `audiolm` token generator from Tencent SongGeneration v2-medium.
This is not yet an end-to-end pure-MLX audio pipeline: token generation runs in MLX, while decoding the tokens to FLAC audio currently relies on the official PyTorch Flow1dVAE / separate-tokenizer bridge in [`ailuntx/SongGeneration-MLX`](https://github.com/ailuntx/SongGeneration-MLX).
## TL;DR
| | |
|---|---|
| **Variant** | `v2-medium` |
| **Precision** | `4bit` |
| **Converted component** | SongGeneration `audiolm` token generator |
| **Runtime** | [`ailuntx/SongGeneration-MLX`](https://github.com/ailuntx/SongGeneration-MLX) |
| **Official model** | [`tencent/SongGeneration`](https://huggingface.co/tencent/SongGeneration) |
| **Official code** | [`tencent-ailab/songgeneration`](https://github.com/tencent-ailab/songgeneration) |
## Quick Start
```bash
# Clone the MLX runtime and install it into a fresh virtual environment
git clone https://github.com/ailuntx/SongGeneration-MLX.git
cd SongGeneration-MLX
python -m venv .venv
.venv/bin/pip install -e .
.venv/bin/pip install -U huggingface_hub hf_transfer
# Download the 4-bit weights from the Hub
HF_HUB_ENABLE_HF_TRANSFER=1 .venv/bin/hf download mlx-community/SongGeneration-v2-medium-4bit --local-dir ./models/SongGeneration-v2-medium-4bit
# Generate song tokens (no audio yet) and save them to tokens_2s.npz
.venv/bin/python -m songgeneration_mlx.cli \
  --model ./models/SongGeneration-v2-medium-4bit \
  --lyrics '[verse] hello from mlx [chorus] sing it again' \
  --description 'Pop, female vocal, bright production, [Musicality-medium].' \
  --duration 2 \
  --top-k 50 \
  --temperature 0.9 \
  --output tokens_2s.npz
```
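The `--top-k` and `--temperature` flags control how each token is drawn from the model's output distribution. The following is a minimal pure-Python sketch of temperature-scaled top-k sampling in general; `sample_top_k` is a hypothetical helper written for illustration and is not taken from the MLX runtime, whose actual sampling code may differ.

```python
import math
import random


def sample_top_k(logits, top_k=50, temperature=0.9, rng=random):
    """Sample one token id from `logits` using temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the indices of the k largest logits.
    k = max(1, min(top_k, len(logits)))
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # A temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    # Draw one of the kept token ids in proportion to its softmax weight.
    return rng.choices(kept, weights=weights, k=1)[0]
```

With `top_k=1` this reduces to greedy decoding; larger values of `top_k` or `temperature` allow more variety between runs.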
## Decode Tokens
The MLX runtime writes discrete song tokens. To create FLAC audio, use the official decoder bridge in [`ailuntx/SongGeneration-MLX`](https://github.com/ailuntx/SongGeneration-MLX). The bridge needs the official SongGeneration `runtime/` assets, but it does not need the original SongGeneration LM `model.pt` when `--mlx-model` is used.
```bash
# Create a separate environment for the PyTorch decoder bridge
python -m venv .venv-decoder
.venv-decoder/bin/pip install -U pip
.venv-decoder/bin/pip install \
  -r third_party/SongGeneration/requirements.txt \
  -r third_party/SongGeneration/requirements_nodeps.txt \
  soundfile
# Fetch only the official runtime/ assets
HF_HUB_ENABLE_HF_TRANSFER=1 .venv/bin/hf download tencent/SongGeneration \
  --include "runtime/*" \
  --local-dir ./third_party/SongGeneration
# Decode the MLX-generated tokens to FLAC on the MPS backend
PYTORCH_ENABLE_MPS_FALLBACK=1 SONGGEN_DEVICE=mps \
  .venv-decoder/bin/python scripts/decode_tokens_official.py \
  --mlx-model ./models/SongGeneration-v2-medium-4bit \
  --tokens ./tokens_2s.npz \
  --output ./output_2s.flac \
  --device mps
```
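If the decoder bridge rejects a token file, it can help to check what the `.npz` actually contains. The sketch below lists array names, shapes, and dtypes using only the standard library, relying on the fact that an `.npz` file is a zip archive of `.npy` payloads. It makes no assumption about the key names the MLX runtime writes, and `describe_npz` is a hypothetical helper, not part of the runtime.

```python
import ast
import struct
import zipfile


def describe_npz(path):
    """Return {array_name: (shape, dtype_descr)} for every array in an .npz file."""
    summary = {}
    with zipfile.ZipFile(path) as archive:
        for member in archive.namelist():
            if not member.endswith(".npy"):
                continue
            with archive.open(member) as handle:
                if handle.read(6) != b"\x93NUMPY":
                    raise ValueError(f"{member} is not a .npy payload")
                major, _minor = handle.read(2)
                # Format 1.x stores the header length in 2 bytes, later versions in 4.
                if major == 1:
                    (header_len,) = struct.unpack("<H", handle.read(2))
                else:
                    (header_len,) = struct.unpack("<I", handle.read(4))
                # The header is a Python dict literal with descr, fortran_order, shape.
                header = ast.literal_eval(handle.read(header_len).decode("latin1"))
            summary[member[:-4]] = (header["shape"], header["descr"])
    return summary
```

Calling `describe_npz("tokens_2s.npz")` returns a plain dictionary that can be printed or compared between runs. With NumPy available, `numpy.load` gives access to the array contents as well.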
## Variants
| Variant | Disk | Notes |
|---|---:|---|
| `SongGeneration-v2-medium-fp32` | 10G | high-precision medium baseline |
| `SongGeneration-v2-medium-bf16` | 5.2G | recommended medium bf16 quality baseline |
| `SongGeneration-v2-medium-8bit` | 2.8G | smaller medium checkpoint |
| `SongGeneration-v2-medium-4bit` | 1.5G | smallest medium checkpoint |
| `SongGeneration-v2-large-fp32` | 19G | high-precision large baseline |
| `SongGeneration-v2-large-bf16` | 9.5G | large bf16 quality baseline |
| `SongGeneration-v2-large-8bit` | 5.0G | smaller large checkpoint |
| `SongGeneration-v2-large-4bit` | 2.7G | smallest large checkpoint |
| `SongGeneration-v2-fast-*` | pending | upstream fast weights were not publicly available when checked on 2026-05-31 |
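The disk sizes scale roughly with bits per weight. As a back-of-the-envelope sketch, the function below rescales an fp32 size to another precision. It assumes group-wise quantization with 64 weights per group and a 16-bit scale plus a 16-bit bias per group; these are assumptions for illustration, since the card does not state the quantization settings used for this conversion.

```python
def estimated_size_gb(fp32_size_gb, bits, group_size=64, per_group_overhead_bits=32):
    """Rescale an fp32 checkpoint size to `bits` per weight.

    `per_group_overhead_bits` models one scale and one bias per quantization
    group (assumed 16 bits each). Pass 0 for unquantized formats such as bf16.
    """
    bits_per_weight = bits + per_group_overhead_bits / group_size
    return fp32_size_gb * bits_per_weight / 32


# Medium model, starting from the 10G fp32 figure in the table.
four_bit = estimated_size_gb(10, 4)
eight_bit = estimated_size_gb(10, 8)
bf16 = estimated_size_gb(10, 16, per_group_overhead_bits=0)
```

Under these assumptions the medium model comes out at roughly 1.4G for 4-bit and 2.7G for 8-bit, in the same range as the listed 1.5G and 2.8G. The remaining gap may come from tensors kept at higher precision or from metadata.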
## Layout
```text
SongGeneration-v2-medium-4bit/
|-- model-00001-of-000xx.safetensors
|-- model.safetensors.index.json
|-- config.json
|-- mlx_manifest.json
|-- config.official.yaml
|-- vocab.yaml
`-- qwen2_tokenizer/
```
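After downloading, a quick check that every shard named in the index is present can catch an interrupted transfer. This is a sketch that assumes `model.safetensors.index.json` follows the common sharded-safetensors convention of a `weight_map` object mapping tensor names to shard files; the card does not confirm the index schema, and `shard_summary` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def shard_summary(model_dir):
    """Return (tensors per shard, shard files missing on disk) for a model directory."""
    model_dir = Path(model_dir)
    index_path = model_dir / "model.safetensors.index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))
    # Assumption: "weight_map" maps each tensor name to the shard that stores it.
    weight_map = index.get("weight_map", {})
    counts = Counter(weight_map.values())
    missing = sorted(name for name in counts if not (model_dir / name).is_file())
    return dict(counts), missing
```

For example, `shard_summary("./models/SongGeneration-v2-medium-4bit")` should return an empty `missing` list when the download completed.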
## Validation
Local validation on Apple Silicon was run on the medium bf16 checkpoint, not on this 4-bit checkpoint:
| Test | Result |
|---|---|
| 12s MLX token generation | 550 pattern steps, about 1 minute wall time |
| 12s official decoder bridge | 73.27s wall time |
| 12s FLAC | 48kHz stereo, 12.000s, RMS about `0.163` |
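The duration and RMS figures in the table can be reproduced from decoded samples with a few lines of arithmetic. The sketch below works on a flat list of float samples in the range -1 to 1; how the card's RMS was computed (for example, per channel or over both channels) is not stated, and the silence threshold is a hypothetical value chosen for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a flat sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def duration_seconds(frame_count, sample_rate):
    """Length of a clip given frames per channel and sample rate in Hz."""
    return frame_count / sample_rate


def looks_silent(samples, threshold=1e-3):
    """Flag output whose RMS is below a (hypothetical) silence threshold."""
    return rms(samples) < threshold
```

Samples can be obtained with `soundfile.read`, which returns the audio data and the sample rate; a 12-second clip at 48 kHz has 576,000 frames per channel.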
The official recent-token repetition penalty is implemented in the MLX runtime. Without it, 12-second generations collapse into repeated tokens and decode to near silence.
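To illustrate the general idea of a recent-token repetition penalty, here is a minimal sketch using the widely used rule of dividing positive logits and multiplying negative ones by a penalty factor. This is not the official SongGeneration rule or the MLX runtime's code, whose details the card does not give; the `window` and `penalty` defaults are hypothetical.

```python
def penalize_recent_tokens(logits, history, window=32, penalty=1.1):
    """Return a copy of `logits` with recently emitted token ids made less likely."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    adjusted = list(logits)
    # Only the last `window` generated tokens are penalized.
    recent = set(history[-window:]) if window > 0 else set()
    for token_id in recent:
        value = adjusted[token_id]
        # With penalty > 1, dividing a positive logit or multiplying a
        # negative one both lower it.
        adjusted[token_id] = value / penalty if value > 0 else value * penalty
    return adjusted
```

The penalty is applied to the logits before sampling at each step, so a token that was just emitted has to be clearly preferred by the model to be chosen again.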
## License
License follows the upstream SongGeneration release. Check the official model card and repository for the authoritative model license.
## Citation
```bibtex
@misc{songgeneration-mlx,
title = {SongGeneration-MLX: Apple MLX port of SongGeneration},
author = {ailuntx},
year = {2026},
url = {https://github.com/ailuntx/SongGeneration-MLX},
}
@article{lei2025levo,
title = {LeVo: High-Quality Song Generation with Multi-Preference Alignment},
author = {Lei, Shun and Xu, Yaoxun and Lin, Zhiwei and Zhang, Huaicheng and Tan, Wei and Chen, Hangting and Yu, Jianwei and Zhang, Yixuan and Yang, Chenyu and Zhu, Haina and Wang, Shuai and Wu, Zhiyong and Yu, Dong},
journal = {arXiv preprint arXiv:2506.07520},
year = {2025},
}
```