HeartMuLa 3B – 4-bit NF4 Quantized

Pre-quantized 4-bit (NF4) checkpoint of HeartMuLa-oss-3B for 16 GB VRAM GPUs (RTX 4060 Ti, RTX 5070 Ti, etc.).

The Problem

The original HeartMuLa 3B model needs about 15 GB of VRAM in bfloat16. Together with HeartCodec (~1.5 GB), that exceeds 16 GB, so it cannot run on consumer GPUs such as the RTX 4060 Ti or RTX 5070 Ti.

On top of that, the original code has several compatibility issues with modern PyTorch/transformers/torchtune versions (see fixes below).

What This Checkpoint Does

  • 4-bit NF4 quantized HeartMuLa 3B (~4.9 GB instead of ~6 GB)
  • Fits on 16 GB VRAM together with HeartCodec
  • Works with PyTorch 2.4+, transformers 4.57+/5.x, torchtune 0.4+
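
The headline numbers follow from simple weight-size arithmetic; a rough sketch (parameter count taken from the model name, per-weight sizes from the dtypes, ignoring activation and KV-cache memory):

```python
# Rough weight-memory arithmetic for a 3B-parameter model (illustrative only;
# real checkpoints add quantization constants, embeddings kept in higher
# precision, and framework overhead).
PARAMS = 3e9

bf16_gb = PARAMS * 2 / 1e9    # bfloat16: 2 bytes per weight -> ~6 GB
nf4_gb = PARAMS * 0.5 / 1e9   # NF4: 4 bits = 0.5 bytes per weight -> ~1.5 GB

print(f"bf16 weights: ~{bf16_gb:.1f} GB")
print(f"nf4 weights:  ~{nf4_gb:.1f} GB")
```

The published ~4.9 GB checkpoint is larger than the raw 4-bit figure because parts of the model (embeddings, norms, quantization state) are stored in higher precision.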

ComfyUI Usage

This checkpoint works with the HeartMuLa ComfyUI custom nodes, but you need to apply the code fixes listed below to make it work with modern package versions.

Setup

  1. Download this checkpoint into your ComfyUI models folder:

    ComfyUI/models/HeartMuLa/HeartMuLa-4bit-3B/
    
  2. You still need HeartCodec and the tokenizer from the original HeartMuLa repo

  3. Install required packages in ComfyUI's Python:

    pip install bitsandbytes soundfile
    

Required Code Fixes

If you're using modern package versions (PyTorch 2.4+, transformers 5.x, torchtune 0.5+), you need these fixes in your heartlib code:

1. ignore_mismatched_sizes Error (transformers 5.x)

Add ignore_mismatched_sizes=True to ALL from_pretrained() calls in music_generation.py and lyrics_transcription.py:

# In music_generation.py - HeartCodec loading
HeartCodec.from_pretrained(..., ignore_mismatched_sizes=True)

# In music_generation.py - HeartMuLa loading
HeartMuLa.from_pretrained(..., ignore_mismatched_sizes=True)

# In lyrics_transcription.py - Whisper loading
WhisperForConditionalGeneration.from_pretrained(..., ignore_mismatched_sizes=True)
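
Rather than editing each call site by hand, the keyword can be pinned once with `functools.partial`; a minimal sketch using a stand-in loader (in the real code the targets are the `from_pretrained` methods above):

```python
from functools import partial

# Stand-in for a from_pretrained-style loader; in the real code this would
# be HeartCodec.from_pretrained, HeartMuLa.from_pretrained, etc.
def fake_from_pretrained(path, **kwargs):
    return {"path": path, "kwargs": kwargs}

# Pin the keyword once instead of repeating it at every call site.
load = partial(fake_from_pretrained, ignore_mismatched_sizes=True)

result = load("PavonicAI/HeartMuLa-3B-4bit")
print(result["kwargs"])  # {'ignore_mismatched_sizes': True}
```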

2. RoPE cache is not built Error (torchtune >= 0.5)

In modeling_heartmula.py, add this to the setup_caches() method after the cache setup:

def setup_caches(self, ...):
    # ... existing cache setup code ...

    # ADD THIS: Initialize RoPE caches (required for torchtune >= 0.5).
    # 'device' is the device already available inside setup_caches.
    for m in self.modules():
        if hasattr(m, 'rope_init'):
            m.rope_init()
            m.to(device)
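
The fix relies on torchtune attention layers exposing a `rope_init()` method; the duck-typed module walk can be illustrated without torch using stand-in classes:

```python
# Illustrates the hasattr-based module walk from the fix above, without torch.
class RoPEAttention:
    def __init__(self):
        self.cache_built = False
    def rope_init(self):            # mirrors torchtune's RoPE cache builder
        self.cache_built = True

class MLP:
    pass                            # no rope_init -> skipped by the walk

class ToyModel:
    def __init__(self):
        self.layers = [RoPEAttention(), MLP(), RoPEAttention()]
    def modules(self):              # stand-in for nn.Module.modules()
        return self.layers

model = ToyModel()
for m in model.modules():
    if hasattr(m, "rope_init"):
        m.rope_init()

print([getattr(m, "cache_built", None) for m in model.modules()])
# [True, None, True]
```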

3. OOM at Codec Decode (16 GB GPUs)

In music_generation.py, offload the model to CPU before running HeartCodec:

# After generating frames, BEFORE codec decode:
frames = torch.stack(frames).permute(1, 2, 0).squeeze(0)
self.model.reset_caches()
self.model.cpu()           # <-- ADD THIS
torch.cuda.empty_cache()   # <-- ADD THIS
wav = self.audio_codec.detokenize(frames)
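
If you generate repeatedly, the offload-then-decode dance can be wrapped in a context manager so the model is restored afterwards; a sketch with a dummy model (restoring the device is my addition, not part of the original snippet, and no torch is required here):

```python
from contextlib import contextmanager

class DummyModel:
    """Stand-in for the HeartMuLa model; tracks which device it is 'on'."""
    def __init__(self):
        self.device = "cuda:0"
    def cpu(self):
        self.device = "cpu"
    def to(self, device):
        self.device = device

@contextmanager
def offloaded(model, device="cuda:0"):
    # Move the model off the GPU for the duration of the block
    # (in the real code, also call torch.cuda.empty_cache() here).
    model.cpu()
    try:
        yield
    finally:
        model.to(device)  # restore after codec decode

model = DummyModel()
with offloaded(model):
    assert model.device == "cpu"   # room freed for codec decode
print(model.device)  # cuda:0
```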

4. torchcodec Missing (torchaudio >= 2.10)

Replace torchaudio.save() and torchaudio.load() with soundfile:

# Instead of torchaudio.save():
import soundfile as sf
wav_np = wav.cpu().float().numpy()
if wav_np.ndim == 2:
    wav_np = wav_np.T
sf.write(save_path, wav_np, 48000)

# Instead of torchaudio.load():
audio_data, sample_rate = sf.read(path, dtype='float32')
waveform = torch.from_numpy(audio_data)
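
The transpose exists because torchaudio uses a (channels, frames) layout while soundfile expects (frames, channels); the reshape can be checked with plain lists:

```python
# torchaudio-style layout: (channels, frames).
stereo_cf = [
    [0.1, 0.2, 0.3],   # left channel, 3 frames
    [0.4, 0.5, 0.6],   # right channel, 3 frames
]

# Equivalent of wav_np.T for a 2-D array: soundfile wants (frames, channels).
stereo_fc = [list(frame) for frame in zip(*stereo_cf)]
print(stereo_fc)  # [[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]]

# Mono stays 1-D: sf.write accepts a flat (frames,) array directly,
# which is why the snippet above only transposes when ndim == 2.
```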

5. 4-bit Quantization Loading

When loading this checkpoint, use device_map="cuda:0":

from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

model = HeartMuLa.from_pretrained(
    "PavonicAI/HeartMuLa-3B-4bit",
    quantization_config=bnb_config,
    device_map="cuda:0",
    ignore_mismatched_sizes=True,
)

Requirements

  • torch >= 2.4 with CUDA
  • bitsandbytes >= 0.43
  • transformers >= 4.57
  • torchtune >= 0.4
  • soundfile
  • HeartCodec + tokenizer weights from original HeartMuLa repo
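
A quick pre-flight check that the installed versions meet these floors can be done with stdlib `importlib.metadata` (package names as published on PyPI; the comparison is a simplified numeric-tuple compare, not full PEP 440):

```python
from importlib import metadata

def version_tuple(v):
    """Simplified version parse: keep leading numeric components only."""
    parts = []
    for piece in v.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

REQUIRED = {"torch": "2.4", "bitsandbytes": "0.43",
            "transformers": "4.57", "torchtune": "0.4"}

for pkg, floor in REQUIRED.items():
    try:
        installed = metadata.version(pkg)
    except metadata.PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED (need >= {floor})")
        continue
    ok = version_tuple(installed) >= version_tuple(floor)
    print(f"{pkg}: {installed} ({'ok' if ok else f'need >= {floor}'})")
```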

Hardware Tested

  • NVIDIA RTX 5070 Ti (16 GB) – works with 4-bit quantization + CPU offload during codec decode
  • Output: 48 kHz WAV audio

Credits

  • Original model by HeartMuLa Team (Apache-2.0)
  • Quantization & compatibility fixes by ForgeAI / PavonicAI

License

Apache-2.0 (same as original)
