HeartMuLa 3B - 4-bit NF4 Quantized
Pre-quantized 4-bit (NF4) checkpoint of HeartMuLa-oss-3B for 16 GB VRAM GPUs (RTX 4060 Ti, RTX 5070 Ti, etc.).
The Problem
The original HeartMuLa 3B model needs about 15 GB of VRAM in bfloat16. Together with HeartCodec (~1.5 GB), that exceeds 16 GB, so it cannot run on consumer GPUs such as the RTX 4060 Ti or RTX 5070 Ti.
On top of that, the original code has several compatibility issues with recent PyTorch/transformers/torchtune versions (see the fixes below).
What This Checkpoint Does
- 4-bit NF4 quantized HeartMuLa 3B (~4.9 GB instead of ~6 GB)
- Fits on 16 GB VRAM together with HeartCodec
- Works with PyTorch 2.4+, transformers 4.57+/5.x, torchtune 0.4+
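A back-of-the-envelope budget using the sizes quoted on this card (activations and KV caches are extra, so treat this as a lower bound on usage, not a guarantee):

```python
# Weight-only VRAM budget, using the sizes stated above.
heartmula_4bit_gb = 4.9   # this 4-bit checkpoint
heartcodec_gb = 1.5       # HeartCodec
budget_gb = 16.0          # e.g. RTX 4060 Ti / 5070 Ti

total = heartmula_4bit_gb + heartcodec_gb
print(f"weights: {total:.1f} GB of {budget_gb} GB")  # weights: 6.4 GB of 16.0 GB
```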
ComfyUI Usage
This checkpoint works with the HeartMuLa ComfyUI custom nodes, but you need to apply the code fixes listed below to make it work with modern package versions.
Setup
Download this checkpoint into your ComfyUI models folder:
ComfyUI/models/HeartMuLa/HeartMuLa-4bit-3B/
You still need the original HeartCodec and tokenizer from the original HeartMuLa repo.
Install required packages in ComfyUI's Python:
pip install bitsandbytes soundfile
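A quick way to confirm the packages landed in the interpreter ComfyUI actually uses (a hypothetical helper, not part of heartlib; `importlib.util.find_spec` checks availability without importing the heavy modules):

```python
import importlib.util

def check_deps(names=("bitsandbytes", "soundfile", "torch")):
    """Return {package_name: available?} without importing anything heavy."""
    return {n: importlib.util.find_spec(n) is not None for n in names}

print(check_deps())
```

Run it with ComfyUI's own Python (e.g. the embedded interpreter on portable installs); a `False` entry means the `pip install` above went into a different environment.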
Required Code Fixes
If you're using modern package versions (PyTorch 2.4+, transformers 5.x, torchtune 0.5+), you need these fixes in your heartlib code:
1. ignore_mismatched_sizes Error (transformers 5.x)
Add ignore_mismatched_sizes=True to ALL from_pretrained() calls in music_generation.py and lyrics_transcription.py:
# In music_generation.py - HeartCodec loading
HeartCodec.from_pretrained(..., ignore_mismatched_sizes=True)
# In music_generation.py - HeartMuLa loading
HeartMuLa.from_pretrained(..., ignore_mismatched_sizes=True)
# In lyrics_transcription.py - Whisper loading
WhisperForConditionalGeneration.from_pretrained(..., ignore_mismatched_sizes=True)
2. RoPE cache is not built Error (torchtune >= 0.5)
In modeling_heartmula.py, add this to the setup_caches() method after the cache setup:
def setup_caches(self, ...):
    # ... existing cache setup code ...

    # ADD THIS: initialize RoPE caches (required for torchtune >= 0.5)
    for m in self.modules():
        if hasattr(m, 'rope_init'):
            m.rope_init()
            m.to(device)
3. OOM at Codec Decode (16 GB GPUs)
In music_generation.py, offload the model to CPU before running HeartCodec:
# After generating frames, BEFORE codec decode:
frames = torch.stack(frames).permute(1, 2, 0).squeeze(0)
self.model.reset_caches()
self.model.cpu() # <-- ADD THIS
torch.cuda.empty_cache() # <-- ADD THIS
wav = self.audio_codec.detokenize(frames)
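The offload pattern above can be sketched in isolation (toy stand-ins below; the real code uses HeartMuLa as the model and HeartCodec's `detokenize` as the decoder). Note that if you generate more than one track per session, you will likely want to move the model back to the GPU afterwards:

```python
import torch
import torch.nn as nn

def offloaded_decode(model, decoder, frames):
    """Free the generator's VRAM before the codec decodes the frames."""
    model.cpu()                      # move generator weights to system RAM
    if torch.cuda.is_available():
        torch.cuda.empty_cache()     # release PyTorch's cached GPU blocks
    return decoder(frames)           # the codec now has the GPU to itself

# Toy stand-ins to demonstrate the pattern:
gen = nn.Linear(4, 4)
wav = offloaded_decode(gen, lambda f: f * 2, torch.ones(2, 4))
```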
4. torchcodec Missing (torchaudio >= 2.10)
Replace torchaudio.save() and torchaudio.load() with soundfile:
# Instead of torchaudio.save():
import soundfile as sf

wav_np = wav.cpu().float().numpy()
if wav_np.ndim == 2:
    wav_np = wav_np.T  # soundfile expects (frames, channels)
sf.write(save_path, wav_np, 48000)

# Instead of torchaudio.load():
audio_data, sample_rate = sf.read(path, dtype='float32')
waveform = torch.from_numpy(audio_data)
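The transpose matters because torchaudio tensors are channels-first `(channels, frames)` while soundfile expects channels-last `(frames, channels)`. The conversion in isolation (NumPy-only sketch):

```python
import numpy as np

def to_soundfile_layout(wav_np):
    """Convert a (channels, frames) array to soundfile's (frames, channels)."""
    if wav_np.ndim == 2:
        return wav_np.T
    return wav_np  # mono 1-D arrays pass through unchanged

stereo = np.zeros((2, 48000), dtype=np.float32)  # 1 s of stereo at 48 kHz
print(to_soundfile_layout(stereo).shape)  # (48000, 2)
```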
5. 4-bit Quantization Loading
When loading this checkpoint, use device_map="cuda:0":
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)
model = HeartMuLa.from_pretrained(
    "PavonicAI/HeartMuLa-3B-4bit",
    quantization_config=bnb_config,
    device_map="cuda:0",
    ignore_mismatched_sizes=True,
)
Requirements
- torch >= 2.4 with CUDA
- bitsandbytes >= 0.43
- transformers >= 4.57
- torchtune >= 0.4
- soundfile
- HeartCodec + tokenizer weights from the original HeartMuLa repo
Hardware Tested
- NVIDIA RTX 5070 Ti (16 GB) - works with 4-bit quantization + CPU offload during codec decode
- Output: 48kHz WAV audio
Credits
- Original model by HeartMuLa Team (Apache-2.0)
- Quantization & compatibility fixes by ForgeAI / PavonicAI
License
Apache-2.0 (same as original)
Model tree for PavonicAI/HeartMuLa-3B-4bit
- Base model: HeartMuLa/HeartMuLa-oss-3B