---
license: apache-2.0
tags:
- music-generation
- heartmula
- 4bit
- quantized
- bitsandbytes
- nf4
- comfyui
base_model: HeartMuLa/HeartMuLa-oss-3B
library_name: transformers
---
# HeartMuLa 3B - 4-bit NF4 Quantized
Pre-quantized 4-bit (NF4) checkpoint of [HeartMuLa-oss-3B](https://huggingface.co/HeartMuLa/HeartMuLa-oss-3B) for **16 GB VRAM GPUs** (RTX 4060 Ti, RTX 5070 Ti, etc.).
## Demo Songs
All songs generated with this checkpoint on an RTX 5070 Ti (16 GB) using our [ForgeAI ComfyUI Node](https://github.com/PavonicAI/ForgeAI-HeartMuLa):
| Song | Genre | Duration | CFG |
|---|---|---|---|
| [Codigo del Alma (CFG 2)](demos/Codigo_del_Alma_cfg2.mp3) | Spanish Pop, Emotional | 3:00 | 2.0 |
| [Codigo del Alma (CFG 3)](demos/Codigo_del_Alma_cfg3.mp3) | Spanish Pop, Emotional | 3:00 | 3.0 |
| [Codigo del Alma (60s)](demos/Codigo_del_Alma_60s.mp3) | Spanish Pop | 1:00 | 2.0 |
| [Codigo del Alma (Latin)](demos/Codigo_del_Alma_Latin.mp3) | Latin Pop | 1:00 | 2.0 |
| [Runtime](demos/Runtime.mp3) | Chill, R&B | 3:00 | 2.0 |
| [Forged in Code](demos/Forged_in_Code.mp3) | Country Pop | 2:00 | 2.0 |
| [Digital Rain](demos/Digital_Rain.mp3) | Electronic | 1:00 | 2.0 |
| [Pixel Life](demos/Pixel_Life.mp3) | Pop | 1:00 | 2.0 |
## The Problem
The original HeartMuLa 3B model requires ~15 GB VRAM in bfloat16. Together with HeartCodec (~1.5 GB), it exceeds 16 GB VRAM, so it cannot run on 16 GB consumer GPUs such as the RTX 4060 Ti or RTX 5070 Ti.
On top of that, the original code has several compatibility issues with modern PyTorch/transformers/torchtune versions (see fixes below).
## What This Checkpoint Does
- **4-bit NF4 quantized** HeartMuLa 3B (~4.9 GB checkpoint vs. ~6 GB in bfloat16)
- Fits on **16 GB VRAM** together with HeartCodec
- Works with **PyTorch 2.4+**, **transformers 4.57+/5.x**, **torchtune 0.4+**
## ComfyUI Usage (Recommended)
Use our **[ForgeAI HeartMuLa ComfyUI Node](https://github.com/PavonicAI/ForgeAI-HeartMuLa)** for the easiest setup. All compatibility fixes are applied automatically.
Also available on the [ComfyUI Registry](https://registry.comfy.org/publishers/forgeai/nodes/forgeai-heartmula).
### Setup
1. Install via ComfyUI Manager or clone into custom_nodes:
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/PavonicAI/ForgeAI-HeartMuLa.git
pip install -r ForgeAI-HeartMuLa/requirements.txt
```
2. Download this checkpoint into your ComfyUI models folder:
```
ComfyUI/models/HeartMuLa/HeartMuLa-oss-3B/
```
3. You also need [HeartCodec](https://huggingface.co/HeartMuLa/HeartMuLa-oss-3B) and the tokenizer/config files from the original repo:
```
ComfyUI/models/HeartMuLa/
├── HeartMuLa-oss-3B/    ← this checkpoint
├── HeartCodec-oss/      ← from original repo
├── tokenizer.json       ← from original repo
└── gen_config.json      ← from original repo
```
## Tag Guide
HeartMuLa uses comma-separated tags to control style. **Genre is the most important tag**; always put it first.
```
genre:pop, emotional, synth, warm, female voice
```
### CFG Scale
| CFG | Best For | Notes |
|---|---|---|
| **2.0** | Pop, Ballads, Emotional | Sweet spot for clean vocals |
| **3.0** | Rock, Latin, Uptempo | More energy |
| **4.0+** | Electronic, Dance | May introduce artifacts |
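For intuition, the CFG scale behaves like standard classifier-free guidance: at each step the model's tag-conditioned and unconditional predictions are blended, and the scale controls how hard sampling is pushed toward the tags. A minimal sketch of the generic blending rule (not HeartMuLa's internal code):

```python
def apply_cfg(cond_logits, uncond_logits, cfg_scale):
    """Generic classifier-free guidance blend: at scale 1.0 the
    unconditional term cancels out; higher scales amplify the gap
    between the tag-conditioned and unconditional predictions."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]

# A scale of 2.0 doubles the conditional/unconditional gap:
print(apply_cfg([1.0, 2.0], [0.0, 1.0], 2.0))  # [2.0, 3.0]
```

This is also why very high scales (4.0+) can introduce artifacts: the blended scores drift far from anything the model saw during training.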
### Structure Tags (in Lyrics)
```
[intro]
[verse]
Your lyrics here...
[chorus]
Chorus lyrics...
[outro]
```
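If you assemble lyrics programmatically, a small helper keeps the section tags consistent. This is a hypothetical convenience function (`build_lyrics` is not part of HeartMuLa; the model simply consumes the tagged text):

```python
def build_lyrics(sections):
    """Assemble [section]-tagged lyrics from (tag, text) pairs.
    Empty text (e.g. for [intro]/[outro]) emits just the tag line."""
    lines = []
    for tag, text in sections:
        lines.append(f"[{tag}]")
        if text:
            lines.append(text)
    return "\n".join(lines)

lyrics = build_lyrics([
    ("intro", ""),
    ("verse", "Your lyrics here..."),
    ("chorus", "Chorus lyrics..."),
    ("outro", ""),
])
print(lyrics)
```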
## Manual Setup (Without ComfyUI)
If you want to use this checkpoint without ComfyUI, you need to apply several code fixes manually. See the sections below.
### Required Code Fixes
#### 1. ignore_mismatched_sizes Error (transformers 5.x)
Add `ignore_mismatched_sizes=True` to ALL `from_pretrained()` calls:
```python
HeartCodec.from_pretrained(..., ignore_mismatched_sizes=True)
HeartMuLa.from_pretrained(..., ignore_mismatched_sizes=True)
```
#### 2. RoPE cache is not built Error (torchtune >= 0.5)
In `modeling_heartmula.py`, add RoPE init to `setup_caches()`:
```python
def setup_caches(self, ...):
    # ... existing cache setup ...
    # newer torchtune no longer builds RoPE buffers lazily, so do it here
    for m in self.modules():
        if hasattr(m, "rope_init"):
            m.rope_init()
            m.to(device)
```
#### 3. OOM at Codec Decode (16 GB GPUs)
Offload model to CPU before codec decode:
```python
# free the LM's VRAM before the codec allocates its decode buffers
self.model.cpu()
torch.cuda.empty_cache()
wav = self.audio_codec.detokenize(frames)
```
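The offload-then-restore pattern above can be wrapped in a small context manager so the model always comes back to the GPU, even if decoding raises. A framework-agnostic sketch (the `cpu_offloaded` name is ours; with PyTorch, call `torch.cuda.empty_cache()` after `model.cpu()`):

```python
from contextlib import contextmanager

@contextmanager
def cpu_offloaded(model, device="cuda:0"):
    """Move `model` to CPU for the duration of the block, then restore
    it to `device`; works with any object exposing .cpu() and .to()."""
    model.cpu()
    try:
        yield
    finally:
        model.to(device)

# Usage (hypothetical names mirroring the snippet above):
# with cpu_offloaded(self.model):
#     wav = self.audio_codec.detokenize(frames)
```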
#### 4. torchcodec Missing (torchaudio >= 2.10)
Replace torchaudio with soundfile:
```python
import soundfile as sf

# torchaudio >= 2.10 delegates saving to torchcodec; write directly instead
sf.write(save_path, wav_np, 48000)
```
#### 5. 4-bit Quantization Loading
```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)
model = HeartMuLa.from_pretrained(
    "PavonicAI/HeartMuLa-3B-4bit",
    quantization_config=bnb_config,
    device_map="cuda:0",
    ignore_mismatched_sizes=True,
)
```
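A back-of-envelope size estimate shows why NF4 makes the difference. The ~4.9 GB checkpoint sits between the two bounds below, which we take to mean some tensors (typically embeddings and non-Linear layers, as is conventional with bitsandbytes) stay in higher precision; that split is our assumption, not a documented fact about this checkpoint:

```python
def weight_gb(n_params, bits_per_weight):
    """Weights-only size in decimal GB; ignores activations and KV cache."""
    return n_params * bits_per_weight / 8 / 1e9

print(weight_gb(3e9, 16))  # 6.0, matching the ~6 GB bfloat16 checkpoint
print(weight_gb(3e9, 4))   # 1.5, the lower bound if every weight were NF4
```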
## Hardware Tested
- NVIDIA RTX 5070 Ti (16 GB) with 4-bit quantization
- ~13 GB VRAM during generation, ~8 GB during encoding
- Stable for hours of continuous generation
- Output: 48 kHz stereo audio
## Credits
- Original model by [HeartMuLa Team](https://heartmula.github.io/) (Apache-2.0)
- Quantization, compatibility fixes & ComfyUI node by [ForgeAI / PavonicAI](https://github.com/PavonicAI)
## License
Apache-2.0 (same as original)