---
license: mit
base_model: vibevoice/VibeVoice-7B
tags:
- tts
- text-to-speech
- speech-synthesis
- norwegian
- bokmal
language:
- "no"
- nb
---
# Prat-9B (preview)
A Norwegian (Bokmål) text-to-speech model fine-tuned for the Østnorsk/Oslo dialect.
This model is currently in preview, so you can expect issues such as odd audio artefacts.
In our informal, qualitative testing, however, it generally outperforms VibeVoice-7B.
## Usage
```python
import torch
from transformers import AutoModel, AutoProcessor

# Load the processor and the model weights from the Hugging Face Hub
processor = AutoProcessor.from_pretrained("heiertech/Prat-9B")
model = AutoModel.from_pretrained("heiertech/Prat-9B", torch_dtype=torch.bfloat16)

# Generate speech (the text means "Hi, this is a test of the Norwegian voice.")
text = "Hei, dette er en test av den norske stemmen."
inputs = processor(text=text, return_tensors="pt")
outputs = model.generate(**inputs)
```
## Base Model
This model is based on [VibeVoice-7B](https://huggingface.co/vibevoice/VibeVoice-7B).
Note that despite the name, VibeVoice-7B is actually a 9B-parameter model.
The "7B" refers only to the size of the LLM backbone, which is based on Qwen2.5 7B.
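
If you want to check the parameter count yourself, the following is a minimal sketch that tallies parameters per top-level component. The helper name `count_parameters_by_component` is hypothetical and not part of this repository. It assumes only that the model exposes `named_parameters()`, as any `torch.nn.Module` does.

```python
from collections import defaultdict


def count_parameters_by_component(model):
    """Return (total, per_component) parameter counts.

    Hypothetical helper for illustration. Works with any object that
    exposes named_parameters(), as torch.nn.Module does. Parameters are
    grouped by the first segment of their dotted name.
    """
    per_component = defaultdict(int)
    for name, param in model.named_parameters():
        component = name.split(".", 1)[0]
        per_component[component] += param.numel()
    total = sum(per_component.values())
    return total, dict(per_component)
```

With `model` loaded as in the Usage section, `total, parts = count_parameters_by_component(model)` gives the overall count and a per-component breakdown. The component names depend on the model implementation and are not documented here, so inspect `parts` rather than assuming specific keys.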
## Acknowledgments
- Base model: [vibevoice/VibeVoice-7B](https://huggingface.co/vibevoice/VibeVoice-7B)
- Training data: Mozilla Common Voice Norwegian