---

license: apache-2.0
tags:
  - music-generation
  - heartmula
  - 4bit
  - quantized
  - bitsandbytes
  - nf4
  - comfyui
base_model: HeartMuLa/HeartMuLa-oss-3B
library_name: transformers
---


# HeartMuLa 3B - 4-bit NF4 Quantized

Pre-quantized 4-bit (NF4) checkpoint of [HeartMuLa-oss-3B](https://huggingface.co/HeartMuLa/HeartMuLa-oss-3B) for **16 GB VRAM GPUs** (RTX 4060 Ti, RTX 5070 Ti, etc.).

## Demo Songs

All songs generated with this checkpoint on an RTX 5070 Ti (16 GB) using our [ForgeAI ComfyUI Node](https://github.com/PavonicAI/ForgeAI-HeartMuLa):

| Song | Genre | Duration | CFG |
|---|---|---|---|
| [Codigo del Alma (CFG 2)](demos/Codigo_del_Alma_cfg2.mp3) | Spanish Pop, Emotional | 3:00 | 2.0 |
| [Codigo del Alma (CFG 3)](demos/Codigo_del_Alma_cfg3.mp3) | Spanish Pop, Emotional | 3:00 | 3.0 |
| [Codigo del Alma (60s)](demos/Codigo_del_Alma_60s.mp3) | Spanish Pop | 1:00 | 2.0 |
| [Codigo del Alma (Latin)](demos/Codigo_del_Alma_Latin.mp3) | Latin Pop | 1:00 | 2.0 |
| [Runtime](demos/Runtime.mp3) | Chill, R&B | 3:00 | 2.0 |
| [Forged in Code](demos/Forged_in_Code.mp3) | Country Pop | 2:00 | 2.0 |
| [Digital Rain](demos/Digital_Rain.mp3) | Electronic | 1:00 | 2.0 |
| [Pixel Life](demos/Pixel_Life.mp3) | Pop | 1:00 | 2.0 |

## The Problem

The original HeartMuLa 3B model needs ~15 GB of VRAM in bfloat16. Together with HeartCodec (~1.5 GB) it exceeds 16 GB, so it cannot run unmodified on 16 GB consumer GPUs such as the RTX 4060 Ti or RTX 5070 Ti.

On top of that, the original code has several compatibility issues with modern PyTorch/transformers/torchtune versions (see fixes below).

## What This Checkpoint Does

- **4-bit NF4 quantized** HeartMuLa 3B (~4.9 GB instead of ~6 GB)
- Fits on **16 GB VRAM** together with HeartCodec
- Works with **PyTorch 2.4+**, **transformers 4.57+/5.x**, **torchtune 0.4+**

## ComfyUI Usage (Recommended)

Use our **[ForgeAI HeartMuLa ComfyUI Node](https://github.com/PavonicAI/ForgeAI-HeartMuLa)** for the easiest setup. All compatibility fixes are applied automatically.

Also available on the [ComfyUI Registry](https://registry.comfy.org/publishers/forgeai/nodes/forgeai-heartmula).

### Setup

1. Install via ComfyUI Manager or clone into `custom_nodes`:

   ```bash
   cd ComfyUI/custom_nodes
   git clone https://github.com/PavonicAI/ForgeAI-HeartMuLa.git
   pip install -r ForgeAI-HeartMuLa/requirements.txt
   ```

2. Download this checkpoint into your ComfyUI models folder:

   ```
   ComfyUI/models/HeartMuLa/HeartMuLa-oss-3B/
   ```

3. You still need HeartCodec and the tokenizer from the [original repo](https://huggingface.co/HeartMuLa/HeartMuLa-oss-3B):

   ```
   ComfyUI/models/HeartMuLa/
     ├── HeartMuLa-oss-3B/     ← this checkpoint
     ├── HeartCodec-oss/       ← from original repo
     ├── tokenizer.json        ← from original repo
     └── gen_config.json       ← from original repo
   ```
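Step 2 can also be scripted. A minimal sketch using `huggingface_hub` (the repo id `PavonicAI/HeartMuLa-3B-4bit` matches the loading example later in this card; the path helper itself is hypothetical):

```python
from pathlib import Path


def checkpoint_dir(comfy_root: str) -> Path:
    # Folder layout expected by the ComfyUI node (see the tree in step 3).
    return Path(comfy_root) / "models" / "HeartMuLa" / "HeartMuLa-oss-3B"


def download_checkpoint(comfy_root: str = "ComfyUI") -> None:
    # Requires: pip install huggingface_hub
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="PavonicAI/HeartMuLa-3B-4bit",
        local_dir=str(checkpoint_dir(comfy_root)),
    )
```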


## Tag Guide

HeartMuLa uses comma-separated tags to control style. **Genre is the most important tag**; always put it first.

```
genre:pop, emotional, synth, warm, female voice
```
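If you assemble prompts programmatically, a tiny helper (hypothetical, not part of any HeartMuLa API) keeps the genre tag first:

```python
def build_tags(genre: str, *extra: str) -> str:
    # The genre tag always leads, per the guide above.
    return ", ".join([f"genre:{genre}", *extra])


build_tags("pop", "emotional", "synth", "warm", "female voice")
# → "genre:pop, emotional, synth, warm, female voice"
```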

### CFG Scale

| CFG | Best For | Notes |
|---|---|---|
| **2.0** | Pop, Ballads, Emotional | Sweet spot for clean vocals |
| **3.0** | Rock, Latin, Uptempo | More energy |
| **4.0+** | Electronic, Dance | May introduce artifacts |

### Structure Tags (in Lyrics)

```
[intro]
[verse]
Your lyrics here...
[chorus]
Chorus lyrics...
[outro]
```
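A small (hypothetical) check can catch a misspelled structure tag before a long generation; the set below contains only the tags listed in this card, and the model may accept others:

```python
import re

# Structure tags documented in this card (assumed complete here).
STRUCTURE_TAGS = {"[intro]", "[verse]", "[chorus]", "[outro]"}


def unknown_structure_tags(lyrics: str) -> list[str]:
    # Return bracketed tags not in the known set, e.g. a typo
    # like [chorrus], so it can be fixed before generating.
    return [t for t in re.findall(r"\[[^\]]+\]", lyrics) if t not in STRUCTURE_TAGS]
```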

## Manual Setup (Without ComfyUI)

If you want to use this checkpoint without ComfyUI, you need to apply several code fixes manually. See the sections below.

### Required Code Fixes

#### 1. `ignore_mismatched_sizes` Error (transformers 5.x)

Add `ignore_mismatched_sizes=True` to ALL `from_pretrained()` calls:

```python
HeartCodec.from_pretrained(..., ignore_mismatched_sizes=True)
HeartMuLa.from_pretrained(..., ignore_mismatched_sizes=True)
```

#### 2. "RoPE cache is not built" Error (torchtune >= 0.5)

In `modeling_heartmula.py`, add RoPE init to `setup_caches()`:

```python
def setup_caches(self, ...):
    # ... existing cache setup ...
    for m in self.modules():
        if hasattr(m, "rope_init"):
            m.rope_init()
            m.to(device)
```

#### 3. OOM at Codec Decode (16 GB GPUs)

Offload model to CPU before codec decode:

```python
self.model.cpu()
torch.cuda.empty_cache()
wav = self.audio_codec.detokenize(frames)
```
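If you generate repeatedly, a try/finally wrapper (a sketch; the attribute names follow the snippet above, and the device handling is an assumption) restores the model to the GPU after decoding, so the next run does not start on the CPU:

```python
import torch


def decode_with_offload(model, audio_codec, frames, device="cuda:0"):
    # Free VRAM for the codec decode, then restore the language model
    # so subsequent generations do not pay a cold CPU start.
    model.cpu()
    torch.cuda.empty_cache()  # safe no-op when CUDA is unavailable
    try:
        return audio_codec.detokenize(frames)
    finally:
        model.to(device)
```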

#### 4. `torchcodec` Missing (torchaudio >= 2.10)

Replace `torchaudio` with `soundfile` for saving audio:

```python
import soundfile as sf

sf.write(save_path, wav_np, 48000)
```

#### 5. 4-bit Quantization Loading

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

model = HeartMuLa.from_pretrained(
    "PavonicAI/HeartMuLa-3B-4bit",
    quantization_config=bnb_config,
    device_map="cuda:0",
    ignore_mismatched_sizes=True,
)
```

## Hardware Tested

- NVIDIA RTX 5070 Ti (16 GB) with 4-bit quantization
- ~13 GB VRAM during generation, ~8 GB during encoding
- Stable for hours of continuous generation
- Output: 48 kHz stereo audio

## Credits

- Original model by [HeartMuLa Team](https://heartmula.github.io/) (Apache-2.0)
- Quantization, compatibility fixes & ComfyUI node by [ForgeAI / PavonicAI](https://github.com/PavonicAI)

## License

Apache-2.0 (same as original)