Pre-quantized 4-bit checkpoint + ComfyUI Node for 16GB GPUs
#6 · by PavonicDev · opened
Hey everyone! 👋
We got HeartMuLa-oss-3B running on 16 GB consumer GPUs (tested on an RTX 5070 Ti) using bitsandbytes 4-bit NF4 quantization.
Along the way we fixed several compatibility issues with current library versions and packaged everything into a ready-to-use solution:
🔥 Pre-quantized 4-bit Checkpoint
- Pre-quantized (NF4), loads in seconds instead of quantizing on-the-fly
- ~4.87 GB instead of ~12 GB
- Runs on 16 GB VRAM GPUs
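
A pre-quantized checkpoint normally ships its quantization settings inside its `config.json`, so `from_pretrained` can pick them up directly. For reference, here is a minimal sketch of the NF4 settings such a checkpoint implies, expressed as the kwargs you would pass to `transformers.BitsAndBytesConfig`. The helper name and the `<quantized-repo>` placeholder are illustrative, not the node's actual code:

```python
# Sketch of NF4 quantization settings (hypothetical helper; the node's
# actual loading code may differ).

def nf4_config_kwargs():
    """Kwargs for transformers.BitsAndBytesConfig matching 4-bit NF4."""
    return {
        "load_in_4bit": True,                  # store weights in 4 bits
        "bnb_4bit_quant_type": "nf4",          # NormalFloat4 data type
        "bnb_4bit_compute_dtype": "bfloat16",  # matmuls still run in bf16
        "bnb_4bit_use_double_quant": True,     # also quantize the quant constants
    }

# Usage (requires transformers, bitsandbytes, and a CUDA GPU):
#   from transformers import AutoModelForCausalLM, BitsAndBytesConfig
#   cfg = BitsAndBytesConfig(**nf4_config_kwargs())
#   model = AutoModelForCausalLM.from_pretrained(
#       "<quantized-repo>", quantization_config=cfg, device_map="auto")
```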
🎛️ ComfyUI Custom Node (All-in-One)
GitHub: PavonicAI/ForgeAI-HeartMuLa
- All-in-one music generation node (lyrics + tags → audio file)
- WAV and MP3 export
- Built-in quantization selection (4bit / 8bit / none)
- Lyrics transcriber node included
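
For readers new to ComfyUI custom nodes, here is a sketch of the interface such an all-in-one node typically exposes (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `CATEGORY` are the standard ComfyUI node hooks; the class name and field names below are illustrative, not the repo's actual code):

```python
# Illustrative ComfyUI node skeleton; names are hypothetical.
class HeartMuLaGenerate:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "lyrics": ("STRING", {"multiline": True}),
                "tags": ("STRING", {"default": "pop, female vocal"}),
                "quantization": (["4bit", "8bit", "none"],),
                "duration_s": ("INT", {"default": 60, "min": 10, "max": 240}),
            }
        }

    RETURN_TYPES = ("AUDIO",)
    FUNCTION = "generate"
    CATEGORY = "audio/ForgeAI"

    def generate(self, lyrics, tags, quantization, duration_s):
        # Real node: load the (optionally quantized) model, run generation,
        # decode tokens with the codec, and return the waveform.
        raise NotImplementedError
```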
🔧 Compatibility Fixes Included
All fixes are baked into the ComfyUI node; no manual patching needed:
- transformers 5.x: compatibility fix for the current API
- torchtune >= 0.5: RoPE fix via a monkey-patch
- OOM fix: the model is offloaded before codec decoding
- torchaudio/torchcodec: audio export reworked to drop the torchcodec dependency
- bitsandbytes 4-bit: Full NF4 quantization support
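
The OOM fix follows a common pattern: move the language model off the GPU before the audio codec decodes, so both never occupy VRAM at once. A minimal sketch (function and names hypothetical; the real node would also call `torch.cuda.empty_cache()` after offloading):

```python
# Offload-before-decode pattern behind the OOM fix (illustrative sketch).

def decode_with_offload(model, codec, tokens, device="cuda"):
    model.to("cpu")               # move LM weights out of VRAM first
    # torch.cuda.empty_cache()    # with torch: release cached VRAM blocks
    audio = codec.decode(tokens)  # codec now has (almost) the whole GPU
    model.to(device)              # bring the LM back for the next run
    return audio
```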
Hardware Tested
- RTX 5070 Ti (16 GB): works perfectly at 4-bit
- ~10 it/s generation speed; ~76 seconds to generate 60 s of audio
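
As a quick sanity check, those two figures are consistent (assuming the it/s rate holds for the whole run):

```python
# Cross-checking the reported throughput numbers.
wall_s, audio_s, it_per_s = 76.0, 60.0, 10.0

rtf = wall_s / audio_s     # real-time factor; >1 means slower than real time
iters = wall_s * it_per_s  # implied total decoding iterations

print(f"RTF ~{rtf:.2f}, ~{iters:.0f} iterations")
```

So a 60 s clip takes roughly 1.27x real time on this card.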
Hope this helps others who want to run HeartMuLa on consumer hardware! 🎉
Made with ❤️ by ForgeAI / PavonicAI