Pre-quantized 4-bit checkpoint + ComfyUI Node for 16GB GPUs
#7 · opened by PavonicDev
Hey everyone!
We got HeartMuLa-oss-3B running on 16 GB consumer GPUs (tested on RTX 5070 Ti) with bitsandbytes 4-bit NF4 quantization.
Along the way we fixed several compatibility issues with current library versions and packaged everything into a ready-to-use solution:
Pre-quantized 4-bit Checkpoint
- Pre-quantized (NF4), loads in seconds instead of quantizing on-the-fly
- ~4.87 GB instead of ~12 GB
- Runs on 16 GB VRAM GPUs
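Loading the checkpoint follows the standard transformers + bitsandbytes path. A minimal sketch (the repo id shown is a placeholder, not the actual checkpoint name; substitute the real one):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 quantization config — matches the 4-bit NF4 setup described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "PavonicAI/heartmula-oss-3b-nf4",   # placeholder repo id
    quantization_config=bnb_config,
    device_map="auto",
    ignore_mismatched_sizes=True,       # needed on transformers 5.x (see fixes below)
)
```

Since the checkpoint is already quantized, this loads the 4-bit weights directly instead of quantizing the full-precision model on the fly, which is where the seconds-vs-minutes load-time difference comes from.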
ComfyUI Custom Node (All-in-One)
GitHub: PavonicAI/ForgeAI-HeartMuLa
- All-in-one music generation node (lyrics + tags -> audio file)
- WAV and MP3 export
- Built-in quantization selection (4bit / 8bit / none)
- Lyrics transcriber node included
Compatibility Fixes Included
All fixes are baked into the ComfyUI node; no manual patching is needed:
- transformers 5.x: `ignore_mismatched_sizes=True` for `from_pretrained()`
- torchtune >= 0.5 RoPE fix: monkey-patch `setup_caches()` to call `rope_init()`
- OOM fix: `model.cpu()` offload before codec decoding
- torchaudio/torchcodec: replaced with `soundfile` (no torchcodec dependency)
- bitsandbytes 4-bit: full NF4 quantization support
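The RoPE monkey-patch follows the usual wrap-and-reassign pattern. A self-contained sketch of that pattern using a stand-in class (the node applies the same idea to torchtune's decoder; `FakeDecoder` and `patch_setup_caches` are illustrative names, not torchtune API):

```python
import functools

def patch_setup_caches(cls):
    """Wrap cls.setup_caches so rope_init() runs after cache setup."""
    original = cls.setup_caches

    @functools.wraps(original)
    def patched(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        self.rope_init()  # re-initialize RoPE buffers after caches are built
        return result

    cls.setup_caches = patched
    return cls

# Stand-in class so the pattern is runnable without torchtune installed
class FakeDecoder:
    def __init__(self):
        self.calls = []
    def setup_caches(self, batch_size):
        self.calls.append(("setup_caches", batch_size))
    def rope_init(self):
        self.calls.append(("rope_init",))

patch_setup_caches(FakeDecoder)
d = FakeDecoder()
d.setup_caches(2)
```

Patching at the class level means every decoder instance created afterwards picks up the fix, which is why no manual per-model patching is needed.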
Hardware Tested
- RTX 5070 Ti (16 GB) - works perfectly at 4-bit
- ~10 it/s generation speed, ~76 seconds for 60s of audio
Hope this helps others who want to run HeartMuLa on consumer hardware!
Made by ForgeAI / PavonicAI
PavonicDev changed discussion status to closed