---
license: mit
base_model: vibevoice/VibeVoice-7B
language:
  - 'no'
  - nb
tags:
  - tts
  - text-to-speech
  - speech-synthesis
  - norwegian
  - bokmal
  - bitsandbytes
  - 4bit
  - quantized
pipeline_tag: text-to-speech
---

# Prat-9B-NF4 (preview)

See also the full-precision Prat 9B model.

## Quantization Details

- **Method:** bitsandbytes NF4 (4-bit NormalFloat)
- **Double quantization:** enabled
- **Compute dtype:** bfloat16
- **Model size:** ~6.2 GB (vs. ~19 GB for bf16)
- **VRAM usage:** ~7 GB
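The ~3× size reduction follows from simple arithmetic. The sketch below assumes ~9B parameters and a 64-weight quantization block size (an estimate, not an exact layer-by-layer breakdown):

```python
# Back-of-envelope footprint: 4-bit weights plus per-block scale overhead.
params = 9e9
bits_per_weight = 4 + 8 / 64          # NF4 weight + ~8-bit absmax per 64-weight block
nf4_gb = params * bits_per_weight / 8 / 1e9
bf16_gb = params * 16 / 8 / 1e9
print(f"NF4 ≈ {nf4_gb:.1f} GB, bf16 ≈ {bf16_gb:.1f} GB")
# → NF4 ≈ 4.6 GB, bf16 ≈ 18.0 GB
```

The quantized weights alone come to roughly 4.6 GB; modules that bitsandbytes keeps in higher precision (e.g. embeddings and norms) account for the rest of the listed ~6.2 GB on disk.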

## Usage

```python
import torch
from transformers import BitsAndBytesConfig
from vibevoice.modular.modeling_vibevoice_inference import VibeVoiceForConditionalGenerationInference
from vibevoice.processor.vibevoice_processor import VibeVoiceProcessor

# Load with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    "heiertech/Prat-9B-NF4",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model.eval()
model.set_ddpm_inference_steps(num_steps=10)

processor = VibeVoiceProcessor.from_pretrained("heiertech/Prat-9B-NF4")

# Generate Norwegian speech
text = "Speaker 0: Hei, jeg heter Maria og jeg kommer fra Norge."
inputs = processor(text=[text], padding=True, return_tensors="pt", return_attention_mask=True)

# Move tensor inputs to the model's device
inputs = {k: v.to(model.device) for k, v in inputs.items() if torch.is_tensor(v)}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        cfg_scale=1.3,
        tokenizer=processor.tokenizer,
        generation_config={"do_sample": False},
    )

audio = outputs.speech_outputs[0]  # 24 kHz audio waveform
```
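`generate` returns a float waveform at 24 kHz. A minimal stdlib sketch for writing it to disk, assuming the tensor has been flattened to a list of floats in [-1.0, 1.0] (`write_wav` is a hypothetical helper, not part of the VibeVoice API):

```python
import struct
import wave

def write_wav(path, samples, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV."""
    pcm = b"".join(
        struct.pack("<h", max(-32768, min(32767, int(s * 32767.0))))
        for s in samples
    )
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 16-bit samples
        f.setframerate(sample_rate)  # 24 kHz output from the model
        f.writeframes(pcm)

# With the snippet above, something like:
#   write_wav("output.wav", audio.squeeze().float().cpu().numpy().tolist())
```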