marksverdhei committed on
Commit 454573a · verified · 1 Parent(s): 94be6ff

docs: update README - rebrand to Prat-9B-NF4, MIT license, remove LoRA details, fix links

Files changed (1)
  1. README.md +26 -55
README.md CHANGED
@@ -1,32 +1,26 @@
1
  ---
2
- license: apache-2.0
 
3
  language:
4
- - 'no'
5
- - en
6
  tags:
7
- - text-to-speech
8
  - tts
 
9
  - speech-synthesis
10
  - norwegian
11
- - vibevoice
12
  - bitsandbytes
13
  - 4bit
14
  - quantized
15
- datasets:
16
- - heiertech/vibevoice-norwegian-mcv
17
  pipeline_tag: text-to-speech
18
  ---
19
 
20
- # Prat-9b-nob (4-bit Quantized)
21
-
22
- A 4-bit quantized version of Prat-9b-nob fine-tuned for Norwegian text-to-speech synthesis.
23
 
24
- ## Model Description
25
 
26
- This model is a bitsandbytes 4-bit (NF4) quantized version of [heiertech/Prat-9b-nob](https://huggingface.co/heiertech/Prat-9b-nob),
27
- which was fine-tuned from [vibevoice/VibeVoice-7b](https://huggingface.co/aoi-ot/VibeVoice-Large) on Norwegian speech data.
28
-
29
- ### Quantization Details
30
 
31
  - **Method**: bitsandbytes NF4 (4-bit NormalFloat)
32
  - **Double quantization**: Enabled
@@ -36,43 +30,11 @@ which was fine-tuned from [vibevoice/VibeVoice-7b](https://huggingface.co/aoi-ot
36
 
37
  ## Training Details
38
 
39
- | Parameter | Value |
40
- |-----------|-------|
41
- | Base model | aoi-ot/VibeVoice-Large |
42
- | Dataset | heiertech/vibevoice-norwegian-mcv |
43
- | Training samples | 1,784 (43 speakers) |
44
- | Validation samples | 216 |
45
- | Training steps | 1,000 |
46
- | Epochs | ~2.24 |
47
- | Effective batch size | 4 (1 x 4 gradient accumulation) |
48
- | Optimizer | Adafactor |
49
- | Learning rate | 2.5e-4 |
50
- | LR scheduler | Cosine |
51
- | Warmup ratio | 3% |
52
- | Training time | ~33 minutes (RTX 3090) |
53
-
54
- ### LoRA Configuration
55
-
56
- | Parameter | Value |
57
- |-----------|-------|
58
- | Rank (r) | 32 |
59
- | Alpha | 128 |
60
- | Dropout | 0.05 |
61
- | Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
62
-
63
- ### Loss Weights
64
-
65
- | Loss | Weight |
66
- |------|--------|
67
- | Diffusion loss | 1.4 |
68
- | Cross-entropy loss | 0.04 |
69
- | Voice prompt drop rate | 0.2 |
70
-
71
- ### Training Metrics
72
-
73
- - **Initial loss**: 4.97 (step 10)
74
- - **Final loss**: 4.72
75
- - **Final train loss (avg)**: 5.33
76
 
77
  ## Usage
78
 
@@ -91,7 +53,7 @@ bnb_config = BitsAndBytesConfig(
91
  )
92
 
93
  model = VibeVoiceForConditionalGenerationInference.from_pretrained(
94
- "heiertech/Prat-9b-nob-bnb-4bit",
95
  quantization_config=bnb_config,
96
  device_map="auto",
97
  torch_dtype=torch.bfloat16,
@@ -99,7 +61,7 @@ model = VibeVoiceForConditionalGenerationInference.from_pretrained(
99
  model.eval()
100
  model.set_ddpm_inference_steps(num_steps=10)
101
 
102
- processor = VibeVoiceProcessor.from_pretrained("heiertech/vibevoice-7b-nob-bnb-4bit")
103
 
104
  # Generate Norwegian speech
105
  text = "Speaker 0: Hei, jeg heter Maria og jeg kommer fra Norge."
@@ -115,4 +77,13 @@ with torch.no_grad():
115
  )
116
 
117
  audio = outputs.speech_outputs[0] # 24kHz audio
118
- ```

1
  ---
2
+ license: mit
3
+ base_model: vibevoice/VibeVoice-7B
4
  language:
5
+ - "no"
6
+ - nb
7
  tags:
 
8
  - tts
9
+ - text-to-speech
10
  - speech-synthesis
11
  - norwegian
12
+ - bokmal
13
  - bitsandbytes
14
  - 4bit
15
  - quantized
 
 
16
  pipeline_tag: text-to-speech
17
  ---
18
 
19
+ # Prat-9B-NF4
 
 
20
 
21
+ A 4-bit (NF4) quantized Norwegian (Bokmal) text-to-speech model fine-tuned for the Ostnorsk/Oslo dialect.
22
 
23
+ ## Quantization Details
 
 
 
24
 
25
  - **Method**: bitsandbytes NF4 (4-bit NormalFloat)
26
  - **Double quantization**: Enabled
 
30
 
31
  ## Training Details
32
 
33
+ This model was trained using a progressive 3-stage fine-tuning approach:
34
+
35
+ 1. **Stage 1**: Initial Norwegian (Bokmal) training on Mozilla Common Voice
36
+ 2. **Stage 2**: Continued training on broader Norwegian data
37
+ 3. **Stage 3**: Dialect-specific fine-tuning for Ostnorsk/Oslo dialect

38
 
39
  ## Usage
40
 
 
53
  )
54
 
55
  model = VibeVoiceForConditionalGenerationInference.from_pretrained(
56
+ "heiertech/Prat-9B-NF4",
57
  quantization_config=bnb_config,
58
  device_map="auto",
59
  torch_dtype=torch.bfloat16,
 
61
  model.eval()
62
  model.set_ddpm_inference_steps(num_steps=10)
63
 
64
+ processor = VibeVoiceProcessor.from_pretrained("heiertech/Prat-9B-NF4")
65
 
66
  # Generate Norwegian speech
67
  text = "Speaker 0: Hei, jeg heter Maria og jeg kommer fra Norge."
 
77
  )
78
 
79
  audio = outputs.speech_outputs[0] # 24kHz audio
80
+ ```
81
+
82
+ ## Base Model
83
+
84
+ This model is a fine-tune of [VibeVoice-7B](https://huggingface.co/vibevoice/VibeVoice-7B). Note that despite the name, VibeVoice-7B is actually a 9B parameter model.
85
+
86
+ ## Acknowledgments
87
+
88
+ - Base model: [vibevoice/VibeVoice-7B](https://huggingface.co/vibevoice/VibeVoice-7B)
89
+ - Training data: Mozilla Common Voice Norwegian
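
Note on the usage snippet: it ends at `audio = outputs.speech_outputs[0]  # 24kHz audio` and does not show how to save the result. The sketch below is one possible way to write such audio to disk using only the Python standard library. It is not part of the README or of the VibeVoice API. The helper name `write_wav_mono16` is hypothetical, and the code assumes the audio has already been converted to a flat sequence of floats in the range [-1.0, 1.0]. The 24 kHz sample rate is taken from the comment in the snippet.

```python
import struct
import wave

# Sample rate taken from the "# 24kHz audio" comment in the README usage snippet.
SAMPLE_RATE_HZ = 24_000


def write_wav_mono16(path, samples, sample_rate=SAMPLE_RATE_HZ):
    """Write float samples in [-1.0, 1.0] as a mono 16-bit PCM WAV file.

    Hypothetical helper for illustration; returns the number of frames written.
    """
    frames = bytearray()
    for sample in samples:
        # Clip out-of-range values so they cannot overflow the 16-bit range.
        clipped = max(-1.0, min(1.0, float(sample)))
        # Scale to a signed 16-bit integer and pack as little-endian.
        frames += struct.pack("<h", int(round(clipped * 32767)))

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))

    return len(frames) // 2
```

If `audio` is a PyTorch tensor, which the README does not state explicitly, something like `write_wav_mono16("out.wav", audio.float().cpu().flatten().tolist())` would be one way to call it. For long clips, a vectorized writer would be faster than this per-sample loop.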