---
license: apache-2.0
tags:
- music-generation
- heartmula
- 4bit
- quantized
- bitsandbytes
- nf4
- comfyui
base_model: HeartMuLa/HeartMuLa-oss-3B
library_name: transformers
---

# HeartMuLa 3B - 4-bit NF4 Quantized

Pre-quantized 4-bit (NF4) checkpoint of [HeartMuLa-oss-3B](https://huggingface.co/HeartMuLa/HeartMuLa-oss-3B) for **16 GB VRAM GPUs** (RTX 4060 Ti, RTX 5070 Ti, etc.).

## Demo Songs

All songs were generated with this checkpoint on an RTX 5070 Ti (16 GB) using our [ForgeAI ComfyUI Node](https://github.com/PavonicAI/ForgeAI-HeartMuLa):

| Song | Genre | Duration | CFG |
|---|---|---|---|
| [Codigo del Alma (CFG 2)](demos/Codigo_del_Alma_cfg2.mp3) | Spanish Pop, Emotional | 3:00 | 2.0 |
| [Codigo del Alma (CFG 3)](demos/Codigo_del_Alma_cfg3.mp3) | Spanish Pop, Emotional | 3:00 | 3.0 |
| [Codigo del Alma (60s)](demos/Codigo_del_Alma_60s.mp3) | Spanish Pop | 1:00 | 2.0 |
| [Codigo del Alma (Latin)](demos/Codigo_del_Alma_Latin.mp3) | Latin Pop | 1:00 | 2.0 |
| [Runtime](demos/Runtime.mp3) | Chill, R&B | 3:00 | 2.0 |
| [Forged in Code](demos/Forged_in_Code.mp3) | Country Pop | 2:00 | 2.0 |
| [Digital Rain](demos/Digital_Rain.mp3) | Electronic | 1:00 | 2.0 |
| [Pixel Life](demos/Pixel_Life.mp3) | Pop | 1:00 | 2.0 |

## The Problem

The original HeartMuLa 3B model requires ~15 GB of VRAM in bfloat16. Together with HeartCodec (~1.5 GB), that exceeds 16 GB, so it cannot run on consumer GPUs such as the RTX 4060 Ti or RTX 5070 Ti.

On top of that, the original code has several compatibility issues with recent PyTorch, transformers, and torchtune versions (see the fixes below).

## What This Checkpoint Does

- **4-bit NF4 quantized** HeartMuLa 3B (~4.9 GB instead of ~6 GB)
- Fits in **16 GB VRAM** together with HeartCodec
- Works with **PyTorch 2.4+**, **transformers 4.57+/5.x**, **torchtune 0.4+**

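The headline sizes follow from simple parameter arithmetic. A rough sketch (the 3-billion parameter count and per-parameter byte sizes are round assumptions, not exact tensor counts from the checkpoint):

```python
# Rough size arithmetic for a ~3B-parameter model (round numbers only).
params = 3_000_000_000

bf16_gb = params * 2 / 1e9   # bfloat16: 2 bytes per parameter
nf4_gb = params * 0.5 / 1e9  # ideal NF4: 4 bits (0.5 bytes) per parameter

print(f"bf16 weights:  ~{bf16_gb:.0f} GB")   # ~6 GB, matching the original checkpoint
print(f"ideal NF4:     ~{nf4_gb:.1f} GB")
```

The shipped file is ~4.9 GB rather than the ideal ~1.5 GB, presumably because not every module is quantized and NF4 stores per-block scaling metadata alongside the packed weights.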
## ComfyUI Usage (Recommended)

Use our **[ForgeAI HeartMuLa ComfyUI Node](https://github.com/PavonicAI/ForgeAI-HeartMuLa)** for the easiest setup. All compatibility fixes are applied automatically.

Also available on the [ComfyUI Registry](https://registry.comfy.org/publishers/forgeai/nodes/forgeai-heartmula).

### Setup

1. Install via ComfyUI Manager, or clone into `custom_nodes`:
   ```bash
   cd ComfyUI/custom_nodes
   git clone https://github.com/PavonicAI/ForgeAI-HeartMuLa.git
   pip install -r ForgeAI-HeartMuLa/requirements.txt
   ```

2. Download this checkpoint into your ComfyUI models folder:
   ```
   ComfyUI/models/HeartMuLa/HeartMuLa-oss-3B/
   ```

3. You still need [HeartCodec](https://huggingface.co/HeartMuLa/HeartMuLa-oss-3B) and the tokenizer from the original repo:
   ```
   ComfyUI/models/HeartMuLa/
   ├── HeartMuLa-oss-3B/   ← this checkpoint
   ├── HeartCodec-oss/     ← from original repo
   ├── tokenizer.json      ← from original repo
   └── gen_config.json     ← from original repo
   ```

## Tag Guide

HeartMuLa uses comma-separated tags to control style. **Genre is the most important tag**; always put it first.

```
genre:pop, emotional, synth, warm, female voice
```

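If you build prompts programmatically, a small helper can enforce the genre-first convention (a hypothetical helper for illustration, not part of the HeartMuLa API):

```python
def build_tags(genre: str, *extra: str) -> str:
    """Build a HeartMuLa tag string with the genre tag first."""
    return ", ".join([f"genre:{genre}", *extra])

tags = build_tags("pop", "emotional", "synth", "warm", "female voice")
print(tags)  # genre:pop, emotional, synth, warm, female voice
```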
### CFG Scale

| CFG | Best For | Notes |
|---|---|---|
| **2.0** | Pop, Ballads, Emotional | Sweet spot for clean vocals |
| **3.0** | Rock, Latin, Uptempo | More energy |
| **4.0+** | Electronic, Dance | May introduce artifacts |

### Structure Tags (in Lyrics)

```
[intro]
[verse]
Your lyrics here...
[chorus]
Chorus lyrics...
[outro]
```

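A lyrics string with these markers can be assembled from plain section pairs (an illustrative helper; the bracketed tag names are the ones from the example above):

```python
def build_lyrics(sections: list[tuple[str, str]]) -> str:
    """Join (tag, text) pairs into a lyrics string with [section] markers."""
    lines = []
    for tag, text in sections:
        lines.append(f"[{tag}]")
        if text:  # instrumental sections like [intro] carry no lyrics
            lines.append(text)
    return "\n".join(lines)

lyrics = build_lyrics([
    ("intro", ""),
    ("verse", "Your lyrics here..."),
    ("chorus", "Chorus lyrics..."),
    ("outro", ""),
])
print(lyrics)
```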
## Manual Setup (Without ComfyUI)

If you want to use this checkpoint without ComfyUI, you need to apply several code fixes manually. See the sections below.

### Required Code Fixes

#### 1. `ignore_mismatched_sizes` Error (transformers 5.x)

Add `ignore_mismatched_sizes=True` to **all** `from_pretrained()` calls:

```python
HeartCodec.from_pretrained(..., ignore_mismatched_sizes=True)
HeartMuLa.from_pretrained(..., ignore_mismatched_sizes=True)
```

#### 2. "RoPE cache is not built" Error (torchtune >= 0.5)

In `modeling_heartmula.py`, add RoPE initialization to `setup_caches()`:

```python
def setup_caches(self, ...):
    # ... existing cache setup ...
    for m in self.modules():
        if hasattr(m, "rope_init"):
            m.rope_init()
            m.to(device)  # move the freshly built RoPE buffers to the model's device
```

#### 3. OOM at Codec Decode (16 GB GPUs)

Offload the language model to CPU before the codec decode:

```python
self.model.cpu()          # free the VRAM held by the language model
torch.cuda.empty_cache()  # return cached blocks to the driver
wav = self.audio_codec.detokenize(frames)
```

#### 4. torchcodec Missing (torchaudio >= 2.10)

Replace the torchaudio save call with soundfile:

```python
import soundfile as sf

# wav_np is the decoded audio as a NumPy array; 48000 is the output sample rate
sf.write(save_path, wav_np, 48000)
```

#### 5. 4-bit Quantization Loading

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

# HeartMuLa is the model class from the repo's modeling_heartmula.py
model = HeartMuLa.from_pretrained(
    "PavonicAI/HeartMuLa-3B-4bit",
    quantization_config=bnb_config,
    device_map="cuda:0",
    ignore_mismatched_sizes=True,
)
```

## Hardware Tested

- NVIDIA RTX 5070 Ti (16 GB) with 4-bit quantization
- ~13 GB VRAM during generation, ~8 GB during encoding
- Stable for hours of continuous generation
- Output: 48 kHz stereo audio

## Credits

- Original model by the [HeartMuLa Team](https://heartmula.github.io/) (Apache-2.0)
- Quantization, compatibility fixes & ComfyUI node by [ForgeAI / PavonicAI](https://github.com/PavonicAI)

## License

Apache-2.0 (same as the original model)