---
license: mit
base_model: vibevoice/VibeVoice-7B
language:
- "no"
- nb
tags:
- tts
- text-to-speech
- speech-synthesis
- norwegian
- bokmal
- bitsandbytes
- 4bit
- quantized
pipeline_tag: text-to-speech
---

# Prat-9B-NF4 (preview)

Also see [Prat 9B](https://huggingface.co/heiertech/Prat-9B)

## Quantization Details

- **Method**: bitsandbytes NF4 (4-bit NormalFloat)
- **Double quantization**: Enabled
- **Compute dtype**: bfloat16
- **Model size**: ~6.2 GB (vs. ~19 GB for bf16)
- **VRAM usage**: ~7 GB

## Usage

```python
import torch
from transformers import BitsAndBytesConfig
from vibevoice.modular.modeling_vibevoice_inference import VibeVoiceForConditionalGenerationInference
from vibevoice.processor.vibevoice_processor import VibeVoiceProcessor

# Load with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    "heiertech/Prat-9B-NF4",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model.eval()
model.set_ddpm_inference_steps(num_steps=10)

processor = VibeVoiceProcessor.from_pretrained("heiertech/Prat-9B-NF4")

# Generate Norwegian speech
text = "Speaker 0: Hei, jeg heter Maria og jeg kommer fra Norge."
inputs = processor(text=[text], padding=True, return_tensors="pt", return_attention_mask=True)
inputs = {k: v.to(model.device) for k, v in inputs.items() if torch.is_tensor(v)}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        cfg_scale=1.3,
        tokenizer=processor.tokenizer,
        generation_config={"do_sample": False},
    )

audio = outputs.speech_outputs[0]  # 24kHz audio
```
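The snippet above ends with a 24 kHz waveform in `audio` but does not show how to save it. Below is a minimal, standard-library-only sketch of writing such a waveform to a 16-bit PCM WAV file. The placeholder sine wave stands in for `outputs.speech_outputs[0]`; the exact shape and dtype of the model's output tensor is an assumption here, so for the real output you would first convert it with something like `audio.cpu().float().numpy().squeeze()`.

```python
import math
import struct
import wave

SAMPLE_RATE = 24000  # matches the model's 24 kHz output

# Placeholder waveform: a 1-second 440 Hz tone standing in for the
# float samples (assumed to lie in [-1.0, 1.0]) produced by the model.
samples = [
    0.1 * math.sin(2 * math.pi * 440.0 * i / SAMPLE_RATE)
    for i in range(SAMPLE_RATE)
]

with wave.open("output.wav", "wb") as wf:
    wf.setnchannels(1)            # mono
    wf.setsampwidth(2)            # 16-bit PCM
    wf.setframerate(SAMPLE_RATE)
    # Clamp to [-1, 1] and scale to int16 before packing.
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    wf.writeframes(frames)
```

Libraries such as `soundfile` or `torchaudio` can write the file more directly, but the `wave` module avoids adding a dependency just for export.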