---
license: mit
base_model: vibevoice/VibeVoice-7B
tags:
- tts
- text-to-speech
- speech-synthesis
- norwegian
- bokmal
language:
- "no"
- nb
---

# Prat-9B (preview)

A Norwegian (Bokmål) text-to-speech model fine-tuned for the Østnorsk/Oslo dialect.
This model is currently in preview, so you can expect occasional issues such as audio artefacts.
That said, in our informal, qualitative testing it generally outperforms VibeVoice-7B.

## Usage

```python
from transformers import AutoProcessor, AutoModel
import torch

# Load the processor and the model weights in bfloat16
processor = AutoProcessor.from_pretrained("heiertech/Prat-9B")
model = AutoModel.from_pretrained("heiertech/Prat-9B", torch_dtype=torch.bfloat16)

# Generate speech
# The text means: "Hi, this is a test of the Norwegian voice."
text = "Hei, dette er en test av den norske stemmen."
inputs = processor(text=text, return_tensors="pt")
outputs = model.generate(**inputs)
```

## Base Model

This model is based on [VibeVoice-7B](https://huggingface.co/vibevoice/VibeVoice-7B).
Note that, despite its name, VibeVoice-7B is actually a 9B-parameter model.
The "7B" refers only to the size of the LLM backbone, which is based on Qwen2.5 7B.

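
To check the parameter arithmetic yourself, one option is to sum parameter counts per top-level component. The following is a minimal sketch, not part of this model's API: the helper names `params_by_component` and `format_billions` are hypothetical, and the toy component names and sizes are round numbers chosen only to mirror the 7B and 9B figures mentioned in this card, not measurements of the real model.

```python
from collections import defaultdict


def params_by_component(named_sizes):
    """Sum parameter counts per top-level component.

    `named_sizes` is an iterable of (parameter_name, element_count) pairs.
    """
    totals = defaultdict(int)
    for name, numel in named_sizes:
        component = name.split(".", 1)[0]  # text before the first dot
        totals[component] += numel
    return dict(totals)


def format_billions(count):
    """Render a raw parameter count as a string such as '7.0B'."""
    return f"{count / 1e9:.1f}B"


# Made-up names and sizes for illustration only; this is NOT the real
# VibeVoice-7B or Prat-9B layout.
toy = [
    ("backbone.layers.0.weight", 4_000_000_000),
    ("backbone.layers.1.weight", 3_000_000_000),
    ("other.head.weight", 2_000_000_000),
]
per_component = params_by_component(toy)
total = sum(per_component.values())
print({k: format_billions(v) for k, v in per_component.items()}, format_billions(total))
```

With a loaded PyTorch model, the same helper could be fed real values via `params_by_component((n, p.numel()) for n, p in model.named_parameters())`; which component names appear depends on how the checkpoint is structured.
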
## Acknowledgments

- Base model: [vibevoice/VibeVoice-7B](https://huggingface.co/vibevoice/VibeVoice-7B)
- Training data: Mozilla Common Voice Norwegian