PavonicDev committed on
Commit
d0f0a2a
·
verified ·
1 Parent(s): f2eca9a

Upload README.md with huggingface_hub

Browse files
Files changed (1)
  1. README.md +85 -58
README.md CHANGED
@@ -12,10 +12,25 @@ base_model: HeartMuLa/HeartMuLa-oss-3B
  library_name: transformers
  ---
 
- # HeartMuLa 3B — 4-bit NF4 Quantized
 
  Pre-quantized 4-bit (NF4) checkpoint of [HeartMuLa-oss-3B](https://huggingface.co/HeartMuLa/HeartMuLa-oss-3B) for **16 GB VRAM GPUs** (RTX 4060 Ti, RTX 5070 Ti, etc.).
 
  ## The Problem
 
  The original HeartMuLa 3B model requires ~15 GB VRAM in bfloat16. Together with HeartCodec (~1.5 GB), it exceeds 16 GB VRAM, making it impossible to run on consumer GPUs like the RTX 4060 Ti, RTX 5070 Ti, etc.
@@ -28,91 +43,110 @@ On top of that, the original code has several compatibility issues with modern P
  - Fits on **16 GB VRAM** together with HeartCodec
  - Works with **PyTorch 2.4+**, **transformers 4.57+/5.x**, **torchtune 0.4+**
 
- ## ComfyUI Usage
 
- This checkpoint works with the [HeartMuLa ComfyUI custom nodes](https://github.com/BenjaminBurworworworton/HeartMuLa_ComfyUI), but you need to apply the code fixes listed below to make it work with modern package versions.
 
  ### Setup
 
- 1. Download this checkpoint into your ComfyUI models folder:
- ```
- ComfyUI/models/HeartMuLa/HeartMuLa-4bit-3B/
  ```
 
- 2. You still need the original [HeartCodec](https://huggingface.co/HeartMuLa/HeartMuLa-oss-3B) and tokenizer from the original repo.
 
- 3. Install required packages in ComfyUI's Python:
- ```bash
- pip install bitsandbytes soundfile
  ```
 
- ## Required Code Fixes
 
- If you're using modern package versions (PyTorch 2.4+, transformers 5.x, torchtune 0.5+), you need these fixes in your heartlib code:
 
- ### 1. `ignore_mismatched_sizes` Error (transformers 5.x)
 
- Add `ignore_mismatched_sizes=True` to ALL `from_pretrained()` calls in `music_generation.py` and `lyrics_transcription.py`:
 
  ```python
- # In music_generation.py - HeartCodec loading
  HeartCodec.from_pretrained(..., ignore_mismatched_sizes=True)
-
- # In music_generation.py - HeartMuLa loading
  HeartMuLa.from_pretrained(..., ignore_mismatched_sizes=True)
-
- # In lyrics_transcription.py - Whisper loading
- WhisperForConditionalGeneration.from_pretrained(..., ignore_mismatched_sizes=True)
  ```
 
- ### 2. `RoPE cache is not built` Error (torchtune >= 0.5)
 
- In `modeling_heartmula.py`, add this to the `setup_caches()` method after the cache setup:
 
  ```python
  def setup_caches(self, ...):
-     # ... existing cache setup code ...
-
-     # ADD THIS: initialize RoPE caches (required for torchtune >= 0.5)
      for m in self.modules():
-         if hasattr(m, 'rope_init'):
              m.rope_init()
              m.to(device)
  ```
 
- ### 3. OOM at Codec Decode (16 GB GPUs)
 
- In `music_generation.py`, offload the model to CPU before running HeartCodec:
 
  ```python
- # After generating frames, BEFORE codec decode:
- frames = torch.stack(frames).permute(1, 2, 0).squeeze(0)
- self.model.reset_caches()
- self.model.cpu()           # <-- ADD THIS
- torch.cuda.empty_cache()   # <-- ADD THIS
  wav = self.audio_codec.detokenize(frames)
  ```
 
- ### 4. `torchcodec` Missing (torchaudio >= 2.10)
 
- Replace `torchaudio.save()` and `torchaudio.load()` with `soundfile`:
 
  ```python
- # Instead of torchaudio.save():
  import soundfile as sf
- wav_np = wav.cpu().float().numpy()
- if wav_np.ndim == 2:
-     wav_np = wav_np.T
  sf.write(save_path, wav_np, 48000)
-
- # Instead of torchaudio.load():
- audio_data, sample_rate = sf.read(path, dtype='float32')
- waveform = torch.from_numpy(audio_data)
  ```
 
- ### 5. 4-bit Quantization Loading
-
- When loading this checkpoint, use `device_map="cuda:0"`:
 
  ```python
  from transformers import BitsAndBytesConfig
@@ -131,24 +165,17 @@ model = HeartMuLa.from_pretrained(
  )
  ```
 
- ## Requirements
-
- - `torch >= 2.4` with CUDA
- - `bitsandbytes >= 0.43`
- - `transformers >= 4.57`
- - `torchtune >= 0.4`
- - `soundfile`
- - HeartCodec + tokenizer weights from the [original HeartMuLa repo](https://huggingface.co/HeartMuLa/HeartMuLa-oss-3B)
-
  ## Hardware Tested
 
- - NVIDIA RTX 5070 Ti (16 GB): works with 4-bit quantization + CPU offload during codec decode
- - Output: 48 kHz WAV audio
 
  ## Credits
 
  - Original model by [HeartMuLa Team](https://heartmula.github.io/) (Apache-2.0)
- - Quantization & compatibility fixes by ForgeAI / PavonicAI
 
  ## License
 
  library_name: transformers
  ---
 
+ # HeartMuLa 3B - 4-bit NF4 Quantized
 
  Pre-quantized 4-bit (NF4) checkpoint of [HeartMuLa-oss-3B](https://huggingface.co/HeartMuLa/HeartMuLa-oss-3B) for **16 GB VRAM GPUs** (RTX 4060 Ti, RTX 5070 Ti, etc.).
 
+ ## Demo Songs
+
+ All songs were generated with this checkpoint on an RTX 5070 Ti (16 GB) using our [ForgeAI ComfyUI Node](https://github.com/PavonicAI/ForgeAI-HeartMuLa):
+
+ | Song | Genre | Duration | CFG |
+ |---|---|---|---|
+ | [Codigo del Alma (CFG 2)](demos/Codigo_del_Alma_cfg2.mp3) | Spanish Pop, Emotional | 3:00 | 2.0 |
+ | [Codigo del Alma (CFG 3)](demos/Codigo_del_Alma_cfg3.mp3) | Spanish Pop, Emotional | 3:00 | 3.0 |
+ | [Codigo del Alma (60s)](demos/Codigo_del_Alma_60s.mp3) | Spanish Pop | 1:00 | 2.0 |
+ | [Codigo del Alma (Latin)](demos/Codigo_del_Alma_Latin.mp3) | Latin Pop | 1:00 | 2.0 |
+ | [Runtime](demos/Runtime.mp3) | Chill, R&B | 3:00 | 2.0 |
+ | [Forged in Code](demos/Forged_in_Code.mp3) | Country Pop | 2:00 | 2.0 |
+ | [Digital Rain](demos/Digital_Rain.mp3) | Electronic | 1:00 | 2.0 |
+ | [Pixel Life](demos/Pixel_Life.mp3) | Pop | 1:00 | 2.0 |
+
  ## The Problem
 
  The original HeartMuLa 3B model requires ~15 GB VRAM in bfloat16. Together with HeartCodec (~1.5 GB), it exceeds 16 GB VRAM, making it impossible to run on consumer GPUs like the RTX 4060 Ti, RTX 5070 Ti, etc.
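For intuition, here is the weights-only arithmetic behind the roughly 4x saving (a back-of-the-envelope sketch; activations, KV caches, and CUDA overhead account for the rest of the ~15 GB bfloat16 footprint):

```python
# Rough weights-only memory estimate; runtime buffers add several GB on top.
params = 3e9                   # ~3B parameters

bf16_gb = params * 2 / 1e9     # bfloat16: 2 bytes per parameter
nf4_gb = params * 0.5 / 1e9    # NF4: 4 bits (0.5 bytes) per parameter,
                               # ignoring small per-block scale constants

print(f"bf16 weights: ~{bf16_gb:.1f} GB")  # ~6.0 GB
print(f"NF4 weights:  ~{nf4_gb:.1f} GB")   # ~1.5 GB
```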
 
  - Fits on **16 GB VRAM** together with HeartCodec
  - Works with **PyTorch 2.4+**, **transformers 4.57+/5.x**, **torchtune 0.4+**
 
+ ## ComfyUI Usage (Recommended)
+
+ Use our **[ForgeAI HeartMuLa ComfyUI Node](https://github.com/PavonicAI/ForgeAI-HeartMuLa)** for the easiest setup. All compatibility fixes are applied automatically.
+
+ Also available on the [ComfyUI Registry](https://registry.comfy.org/publishers/forgeai/nodes/forgeai-heartmula).
 
  ### Setup
 
+ 1. Install via ComfyUI Manager, or clone into `custom_nodes`:
+ ```bash
+ cd ComfyUI/custom_nodes
+ git clone https://github.com/PavonicAI/ForgeAI-HeartMuLa.git
+ pip install -r ForgeAI-HeartMuLa/requirements.txt
  ```
 
+ 2. Download this checkpoint into your ComfyUI models folder:
+ ```
+ ComfyUI/models/HeartMuLa/HeartMuLa-oss-3B/
+ ```
 
+ 3. You still need the original [HeartCodec](https://huggingface.co/HeartMuLa/HeartMuLa-oss-3B) and tokenizer from the original repo:
+ ```
+ ComfyUI/models/HeartMuLa/
+ ├── HeartMuLa-oss-3B/   ← this checkpoint
+ ├── HeartCodec-oss/     ← from original repo
+ ├── tokenizer.json      ← from original repo
+ └── gen_config.json     ← from original repo
  ```
 
+ ## Tag Guide
+
+ HeartMuLa uses comma-separated tags to control style. **Genre is the most important tag**, so always put it first.
+
+ ```
+ genre:pop, emotional, synth, warm, female voice
+ ```
 
+ ### CFG Scale
+
+ | CFG | Best For | Notes |
+ |---|---|---|
+ | **2.0** | Pop, Ballads, Emotional | Sweet spot for clean vocals |
+ | **3.0** | Rock, Latin, Uptempo | More energy |
+ | **4.0+** | Electronic, Dance | May introduce artifacts |
+
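Higher CFG pushes generation harder toward the tag/lyrics conditioning. As a rough sketch of why large values can introduce artifacts (this is standard classifier-free guidance, not necessarily HeartMuLa's exact implementation), the guided logits extrapolate from the unconditional toward the conditional prediction:

```python
# Standard classifier-free guidance blend (illustrative sketch only).
# cfg = 1.0 reproduces the conditional logits; larger values extrapolate
# past them, adding energy but risking out-of-distribution artifacts.
def apply_cfg(cond_logit: float, uncond_logit: float, cfg: float) -> float:
    return uncond_logit + cfg * (cond_logit - uncond_logit)

print(apply_cfg(1.0, 0.0, 2.0))  # 2.0: pushed past the conditional value
```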
 
+ ### Structure Tags (in Lyrics)
+
+ ```
+ [intro]
+ [verse]
+ Your lyrics here...
+ [chorus]
+ Chorus lyrics...
+ [outro]
+ ```
+
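The bracketed markers simply partition the lyrics into named sections. A hypothetical helper (not part of heartlib) illustrating that structure:

```python
import re

# Hypothetical helper (not part of heartlib): split a lyrics string into
# (section, text) pairs based on [bracketed] structure tags.
def split_sections(lyrics: str):
    sections = []
    current, buf = None, []
    for line in lyrics.splitlines():
        m = re.fullmatch(r"\[(\w+)\]", line.strip())
        if m:
            if current is not None:
                sections.append((current, "\n".join(buf).strip()))
            current, buf = m.group(1), []
        elif current is not None:
            buf.append(line)
    if current is not None:
        sections.append((current, "\n".join(buf).strip()))
    return sections

print(split_sections("[verse]\nla la\n[chorus]\nhey"))
# [('verse', 'la la'), ('chorus', 'hey')]
```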
+ ## Manual Setup (Without ComfyUI)
+
+ If you want to use this checkpoint without ComfyUI, you need to apply several code fixes manually. See the sections below.
+
+ ### Required Code Fixes
+
+ #### 1. `ignore_mismatched_sizes` Error (transformers 5.x)
+
+ Add `ignore_mismatched_sizes=True` to ALL `from_pretrained()` calls:
 
  ```python
  HeartCodec.from_pretrained(..., ignore_mismatched_sizes=True)
  HeartMuLa.from_pretrained(..., ignore_mismatched_sizes=True)
  ```
 
+ #### 2. `RoPE cache is not built` Error (torchtune >= 0.5)
 
+ In `modeling_heartmula.py`, add RoPE init to `setup_caches()`:
 
  ```python
  def setup_caches(self, ...):
+     # ... existing cache setup ...
      for m in self.modules():
+         if hasattr(m, "rope_init"):
              m.rope_init()
              m.to(device)
  ```
 
+ #### 3. OOM at Codec Decode (16 GB GPUs)
 
+ Offload the model to CPU before codec decode:
 
  ```python
+ self.model.cpu()
+ torch.cuda.empty_cache()
  wav = self.audio_codec.detokenize(frames)
  ```
 
+ #### 4. `torchcodec` Missing (torchaudio >= 2.10)
 
+ Replace `torchaudio` with `soundfile`:
 
  ```python
  import soundfile as sf
  sf.write(save_path, wav_np, 48000)
  ```
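The array layout matters here: `soundfile` expects `(frames, channels)`, while the model's output tensor is channels-first, which is why the full fix transposes 2-D arrays before writing. A minimal, dependency-light sketch of that conversion:

```python
import numpy as np

# soundfile expects audio as (frames, channels); the model's output is
# (channels, frames), so 2-D arrays must be transposed before sf.write().
def to_soundfile_layout(wav_np):
    if wav_np.ndim == 2:
        return wav_np.T  # (channels, frames) -> (frames, channels)
    return wav_np        # 1-D mono can be written as-is

stereo = np.zeros((2, 48000), dtype=np.float32)  # 1 s of 48 kHz stereo
print(to_soundfile_layout(stereo).shape)  # (48000, 2)
```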
 
+ #### 5. 4-bit Quantization Loading
 
  ```python
  from transformers import BitsAndBytesConfig
  # ...
  )
  ```
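Since the diff elides the full call, here is a hedged sketch of what an NF4 load typically looks like with transformers' bitsandbytes integration. The repo id is a placeholder and the config values are illustrative, not necessarily this checkpoint's exact settings:

```python
import torch
from transformers import BitsAndBytesConfig

# Illustrative NF4 config; this checkpoint's exact values may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 weight format
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
)

# HeartMuLa comes from heartlib (see the fixes above).
# device_map="cuda:0" keeps all quantized weights on one GPU.
model = HeartMuLa.from_pretrained(
    "PavonicAI/HeartMuLa-4bit-3B",          # placeholder repo id
    quantization_config=bnb_config,
    device_map="cuda:0",
    ignore_mismatched_sizes=True,
)
```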
 
  ## Hardware Tested
 
+ - NVIDIA RTX 5070 Ti (16 GB) with 4-bit quantization
+ - ~13 GB VRAM during generation, ~8 GB during encoding
+ - Stable for hours of continuous generation
+ - Output: 48 kHz stereo audio
 
  ## Credits
 
  - Original model by [HeartMuLa Team](https://heartmula.github.io/) (Apache-2.0)
+ - Quantization, compatibility fixes & ComfyUI node by [ForgeAI / PavonicAI](https://github.com/PavonicAI)
 
  ## License