---
license: mit
base_model: vibevoice/VibeVoice-7B
language:
- "no"
- nb
tags:
- tts
- text-to-speech
- speech-synthesis
- norwegian
- bokmal
- bitsandbytes
- 4bit
- quantized
pipeline_tag: text-to-speech
---

# Prat-9B-NF4 (preview)

Also see [Prat 9B](https://huggingface.co/heiertech/Prat-9B).

## Quantization Details

- **Method**: bitsandbytes NF4 (4-bit NormalFloat)
- **Double quantization**: Enabled
- **Compute dtype**: bfloat16
- **Model size**: ~6.2 GB (vs ~19 GB for bf16)
- **VRAM usage**: ~7 GB
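
As a rough sanity check on the sizes listed above, the following sketch estimates the storage cost of block-wise 4-bit quantization with and without double quantization. The block sizes (64 weights per block, 256 scales per second-level group) are the values described in the QLoRA paper and are assumed here, not read from this checkpoint. The parameter count of about 9.5 billion is inferred from the ~19 GB bf16 figure at 2 bytes per weight. The function names are hypothetical.

```python
def nf4_bits_per_param(block_size=64, double_quant=True, dq_block_size=256):
    """Effective bits per weight for block-wise 4-bit quantization.

    Assumed scheme: each block of `block_size` weights stores one scale.
    Without double quantization the scale is fp32; with it, the scale is
    8-bit and one extra fp32 constant is shared by `dq_block_size` scales.
    """
    if double_quant:
        overhead = 8 / block_size + 32 / (block_size * dq_block_size)
    else:
        overhead = 32 / block_size
    return 4 + overhead


def estimate_gb(num_params, bits_per_param):
    """Storage in GB (10^9 bytes) for `num_params` weights."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: ~19 GB in bf16 at 2 bytes per weight implies ~9.5e9 weights.
NUM_PARAMS = 9.5e9

print(f"bits/param with double quant:    {nf4_bits_per_param():.3f}")
print(f"bits/param without double quant: {nf4_bits_per_param(double_quant=False):.3f}")
print(f"bf16 size:            {estimate_gb(NUM_PARAMS, 16):.1f} GB")
print(f"fully 4-bit estimate: {estimate_gb(NUM_PARAMS, nf4_bits_per_param()):.1f} GB")
```

The fully quantized estimate comes out below the ~6.2 GB listed above. A plausible explanation is that bitsandbytes only quantizes linear layers, so other parts of the model stay in higher precision, but which modules those are in this checkpoint is not stated here.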

## Usage

```python
import torch
from transformers import BitsAndBytesConfig
from vibevoice.modular.modeling_vibevoice_inference import VibeVoiceForConditionalGenerationInference
from vibevoice.processor.vibevoice_processor import VibeVoiceProcessor

# Load with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    "heiertech/Prat-9B-NF4",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model.eval()
# Number of diffusion (DDPM) inference steps
model.set_ddpm_inference_steps(num_steps=10)

processor = VibeVoiceProcessor.from_pretrained("heiertech/Prat-9B-NF4")

# Generate Norwegian speech
text = "Speaker 0: Hei, jeg heter Maria og jeg kommer fra Norge."
inputs = processor(text=[text], padding=True, return_tensors="pt", return_attention_mask=True)
# Move tensor inputs to the model's device; non-tensor entries are dropped
inputs = {k: v.to(model.device) for k, v in inputs.items() if torch.is_tensor(v)}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        cfg_scale=1.3,
        tokenizer=processor.tokenizer,
        generation_config={"do_sample": False},  # deterministic decoding, no sampling
    )

audio = outputs.speech_outputs[0]  # 24 kHz audio
```
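
To listen to the result, the generated samples need to be written to an audio file. The helper below is a minimal sketch using only the Python standard library. `save_wav` is a hypothetical name, not part of VibeVoice, and it assumes the audio is mono with float samples in the range [-1, 1] at the 24 kHz rate noted above.

```python
import struct
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file.

    Returns the number of samples written.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono (assumption)
        wf.setsampwidth(2)   # 2 bytes = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
    return len(frames) // 2
```

Assuming `audio` from the example above is a PyTorch tensor, a call might look like `save_wav(audio.detach().float().cpu().flatten().tolist(), "prat_output.wav")`. The per-sample loop is slow for long clips; a library such as `soundfile` would be a reasonable replacement if it is available.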