---
license: mit
base_model: vibevoice/VibeVoice-7B
tags:
- tts
- text-to-speech
- speech-synthesis
- norwegian
- bokmal
language:
- "no"
- nb
---

# Prat-9B (preview)

A Norwegian (Bokmål) text-to-speech model fine-tuned for the Østnorsk/Oslo dialect.
This model is currently in preview, so you can expect occasional audio artefacts.
In our informal, qualitative testing it generally outperforms VibeVoice-7B.

## Usage

```python
from transformers import AutoProcessor, AutoModel
import torch

processor = AutoProcessor.from_pretrained("heiertech/Prat-9B")
model = AutoModel.from_pretrained("heiertech/Prat-9B", torch_dtype=torch.bfloat16)

# Generate speech
text = "Hei, dette er en test av den norske stemmen."
inputs = processor(text=text, return_tensors="pt")
outputs = model.generate(**inputs)
```
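
The snippet above stops at `outputs` and does not write any audio to disk. As a minimal sketch of that last step, the helper below saves a mono waveform as a 16-bit PCM WAV file using only the standard library. It rests on assumptions that are not confirmed by this model card: that the generated audio can be converted to a flat sequence of floats in the range [-1, 1], and that the sample rate is 24 kHz. The name `save_waveform` is hypothetical, and a synthetic tone stands in for real model output.

```python
import math
import os
import struct
import tempfile
import wave


def save_waveform(path, samples, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    sample_rate=24000 is an assumption; use the rate your model actually emits.
    Returns the number of frames written.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the valid range
        frames += struct.pack("<h", int(s * 32767))  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return len(frames) // 2


# Stand-in data: one second of a 440 Hz tone instead of real model output.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(24000)]
out_path = os.path.join(tempfile.gettempdir(), "prat_example.wav")
n_written = save_waveform(out_path, tone)
```

With real output, you would first move the generated waveform to the CPU and flatten it to a 1-D list or array of floats before passing it to the helper. How to extract that waveform from `outputs` depends on the model's return type, which is not specified here.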

## Base Model

This model is based on [VibeVoice-7B](https://huggingface.co/vibevoice/VibeVoice-7B).
Note that despite the name, VibeVoice-7B is actually a 9B-parameter model.
The 7B refers only to the size of the LLM backbone, which is based on Qwen2.5 7B.

## Acknowledgments

- Base model: [vibevoice/VibeVoice-7B](https://huggingface.co/vibevoice/VibeVoice-7B)
- Training data: Mozilla Common Voice Norwegian