---
license: mit
base_model: vibevoice/VibeVoice-7B
language:
- "no"
- nb
tags:
- tts
- text-to-speech
- speech-synthesis
- norwegian
- bokmal
- bitsandbytes
- 4bit
- quantized
pipeline_tag: text-to-speech
---
# Prat-9B-NF4 (preview)
Also see [Prat 9B](https://huggingface.co/heiertech/Prat-9B)
## Quantization Details
- **Method**: bitsandbytes NF4 (4-bit NormalFloat)
- **Double quantization**: Enabled
- **Compute dtype**: bfloat16
- **Model size**: ~6.2 GB (vs ~19 GB for bf16)
- **VRAM usage**: ~7 GB
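
The sizes above can be sanity-checked with a rough back-of-the-envelope estimate. The sketch below is illustrative only. It assumes roughly 9.5 billion weights (inferred from the ~19 GB bf16 figure at 2 bytes per weight, not stated on this card), and it assumes the block sizes described in the QLoRA paper: 64 weights per NF4 block, with 256 first-level constants per second-level block when double quantization is on. The name `weight_gigabytes` is hypothetical.

```python
def weight_gigabytes(num_params, bits_per_param):
    """Size in GB (10**9 bytes) of num_params weights at a given bit width."""
    return num_params * bits_per_param / 8 / 1e9


BLOCK = 64         # weights sharing one quantization constant (assumed)
SUPER_BLOCK = 256  # first-level constants per second-level block (assumed)

# Extra bits per weight spent on quantization constants.
overhead_single = 32 / BLOCK                              # one fp32 constant per block
overhead_double = 8 / BLOCK + 32 / (BLOCK * SUPER_BLOCK)  # 8-bit constants plus fp32 second level

NUM_PARAMS = 9.5e9  # assumption: ~19 GB in bf16 divided by 2 bytes per weight

bf16_gb = weight_gigabytes(NUM_PARAMS, 16)
nf4_gb = weight_gigabytes(NUM_PARAMS, 4 + overhead_single)
nf4_dq_gb = weight_gigabytes(NUM_PARAMS, 4 + overhead_double)

print(f"bf16:               {bf16_gb:.1f} GB")
print(f"NF4:                {nf4_gb:.1f} GB")
print(f"NF4 + double quant: {nf4_dq_gb:.1f} GB")
```

This idealized estimate treats every weight as quantized, so it comes out below the ~6.2 GB listed above. That gap would be consistent with some weights being stored at higher precision, though this card does not say which, so treat the numbers as a rough guide rather than a breakdown of the checkpoint.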
## Usage
```python
import torch
from transformers import BitsAndBytesConfig
from vibevoice.modular.modeling_vibevoice_inference import VibeVoiceForConditionalGenerationInference
from vibevoice.processor.vibevoice_processor import VibeVoiceProcessor

# Load with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    "heiertech/Prat-9B-NF4",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model.eval()
model.set_ddpm_inference_steps(num_steps=10)

processor = VibeVoiceProcessor.from_pretrained("heiertech/Prat-9B-NF4")

# Generate Norwegian speech
text = "Speaker 0: Hei, jeg heter Maria og jeg kommer fra Norge."
inputs = processor(text=[text], padding=True, return_tensors="pt", return_attention_mask=True)
# Move only the tensor inputs to the model's device
inputs = {k: v.to(model.device) for k, v in inputs.items() if torch.is_tensor(v)}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        cfg_scale=1.3,
        tokenizer=processor.tokenizer,
        generation_config={"do_sample": False},
    )

audio = outputs.speech_outputs[0]  # 24kHz audio
```
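
The usage example stops at the `audio` tensor. As a minimal sketch of one way to save it, the helper below writes mono float samples to a 16-bit PCM WAV file using only the Python standard library. The function name `write_wav` and the file name are hypothetical, and the code assumes the samples are mono and lie in the range -1.0 to 1.0, which this card does not confirm. A synthetic tone stands in for real model output so the block runs on its own.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # matches the "24kHz audio" comment in the usage example


def write_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM WAV."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono (assumption)
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)


# Stand-in signal: one second of a 440 Hz tone instead of real model output.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
frames_written = write_wav(tone, "prat_demo.wav")
print(f"wrote {frames_written} frames")
```

With real output, something like `write_wav(audio.float().cpu().flatten().tolist(), "output.wav")` should work, since the `float()` call converts a bfloat16 tensor to a type that maps cleanly onto Python floats.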