VibeVoice-1.5B - NF4 Quantized

4-bit NF4 quantization of microsoft/VibeVoice-1.5B.

Strategy

Backbone extraction approach:

  1. Downloaded raw safetensors (bypassed from_pretrained)
  2. Separated Qwen2.5-1.5B backbone from audio heads
  3. Quantized backbone as standard Qwen2ForCausalLM with NF4 + double quant
  4. Packaged quantized backbone + BF16 audio heads
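As a rough illustration of steps 2 and 3, the sketch below splits a flat state dict into backbone and audio-head tensors by key prefix and builds an NF4 + double-quant config. The prefix `model.language_model.`, the renaming to `model.`, and the helper names `split_state_dict` and `make_nf4_config` are assumptions for illustration, not taken from this repository. The `BitsAndBytesConfig` arguments are the standard transformers options for NF4 with double quantization.

```python
def split_state_dict(state_dict, backbone_prefix="model.language_model."):
    """Separate backbone tensors from audio-head tensors by key prefix.

    The prefix is a hypothetical default; inspect the real checkpoint keys
    before relying on it. Backbone keys are renamed to start with "model."
    (an assumption about the layout a plain Qwen2ForCausalLM expects).
    """
    backbone, heads = {}, {}
    for name, tensor in state_dict.items():
        if name.startswith(backbone_prefix):
            backbone["model." + name[len(backbone_prefix):]] = tensor
        else:
            heads[name] = tensor
    return backbone, heads


def make_nf4_config():
    """Build a 4-bit NF4 config with double quantization.

    Imports are local so the splitting helper above works without
    torch/transformers installed.
    """
    import torch
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
    )
```

The backbone dict could then be loaded into a standard Qwen2ForCausalLM with this config, while the heads dict stays in BF16.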

| Component | Method | Size |
|---|---|---|
| LLM backbone (Qwen2.5-1.5B) | NF4 + double quant | ~0.8-1.0 GB |
| Audio heads (tokenizers, diffusion, connectors) | BF16 | ~1.8 GB |
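The backbone size can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes the block sizes described for NF4 with double quantization (64 weights per block, 8-bit block scales, one 32-bit constant per 256 blocks) and a round parameter count of 1.5e9 inferred from the model name. Both are assumptions, not measurements of this checkpoint.

```python
def nf4_bits_per_param(block_size=64, double_quant=True, dq_block_size=256):
    """Approximate storage cost per quantized weight, in bits."""
    if double_quant:
        # 8-bit scale per block, plus one 32-bit constant per dq_block_size blocks
        overhead = 8 / block_size + 32 / (block_size * dq_block_size)
    else:
        # one 32-bit scale per block
        overhead = 32 / block_size
    return 4 + overhead


def estimate_size_gb(num_params, bits_per_param):
    """Convert a parameter count and per-parameter bit cost to gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# Assumed parameter count (from the "1.5B" in the model name).
backbone_gb = estimate_size_gb(1.5e9, nf4_bits_per_param())
print(f"Estimated NF4 backbone size: {backbone_gb:.2f} GB")
```

Tensors that are not quantized (typically embeddings and norms) stay in higher precision, so a real checkpoint would be expected to land somewhat above this estimate.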

Source

Quantized from microsoft/VibeVoice-1.5B (MIT license).
