docs: update README - rebrand to Prat-9B-NF4, MIT license, remove LoRA details, fix links
README.md
CHANGED
@@ -1,32 +1,26 @@
 ---
-license:
 language:
-
-
 tags:
-- text-to-speech
 - tts
 - speech-synthesis
 - norwegian
-
 - bitsandbytes
 - 4bit
 - quantized
-datasets:
-- heiertech/vibevoice-norwegian-mcv
 pipeline_tag: text-to-speech
 ---

-# Prat-
-
-A 4-bit quantized version of Prat-9b-nob fine-tuned for Norwegian text-to-speech synthesis.

-
-which was fine-tuned from [vibevoice/VibeVoice-7b](https://huggingface.co/aoi-ot/VibeVoice-Large) on Norwegian speech data.
-
-### Quantization Details

 - **Method**: bitsandbytes NF4 (4-bit NormalFloat)
 - **Double quantization**: Enabled
@@ -36,43 +30,11 @@ which was fine-tuned from [vibevoice/VibeVoice-7b](https://huggingface.co/aoi-ot

 ## Training Details

-
-
-
-
-
-| Validation samples | 216 |
-| Training steps | 1,000 |
-| Epochs | ~2.24 |
-| Effective batch size | 4 (1 x 4 gradient accumulation) |
-| Optimizer | Adafactor |
-| Learning rate | 2.5e-4 |
-| LR scheduler | Cosine |
-| Warmup ratio | 3% |
-| Training time | ~33 minutes (RTX 3090) |
-
-### LoRA Configuration
-
-| Parameter | Value |
-|-----------|-------|
-| Rank (r) | 32 |
-| Alpha | 128 |
-| Dropout | 0.05 |
-| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
-
-### Loss Weights
-
-| Loss | Weight |
-|------|--------|
-| Diffusion loss | 1.4 |
-| Cross-entropy loss | 0.04 |
-| Voice prompt drop rate | 0.2 |
-
-### Training Metrics
-
-- **Initial loss**: 4.97 (step 10)
-- **Final loss**: 4.72
-- **Final train loss (avg)**: 5.33

 ## Usage

@@ -91,7 +53,7 @@ bnb_config = BitsAndBytesConfig(
 )

 model = VibeVoiceForConditionalGenerationInference.from_pretrained(
-    "heiertech/Prat-
     quantization_config=bnb_config,
     device_map="auto",
     torch_dtype=torch.bfloat16,

@@ -99,7 +61,7 @@ model = VibeVoiceForConditionalGenerationInference.from_pretrained(
 model.eval()
 model.set_ddpm_inference_steps(num_steps=10)

-processor = VibeVoiceProcessor.from_pretrained("heiertech/

 # Generate Norwegian speech
 text = "Speaker 0: Hei, jeg heter Maria og jeg kommer fra Norge."

@@ -115,4 +77,13 @@ with torch.no_grad():
 )

 audio = outputs.speech_outputs[0]  # 24kHz audio
-```
@@ -1,32 +1,26 @@
 ---
+license: mit
+base_model: vibevoice/VibeVoice-7B
 language:
+- "no"
+- nb
 tags:
 - tts
+- text-to-speech
 - speech-synthesis
 - norwegian
+- bokmal
 - bitsandbytes
 - 4bit
 - quantized
 pipeline_tag: text-to-speech
 ---

+# Prat-9B-NF4

+A 4-bit (NF4) quantized Norwegian (Bokmål) text-to-speech model fine-tuned for the Østnorsk/Oslo dialect.

+## Quantization Details

 - **Method**: bitsandbytes NF4 (4-bit NormalFloat)
 - **Double quantization**: Enabled
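The two quantization bullets above correspond to a `transformers` `BitsAndBytesConfig` along the following lines. This is a sketch, not taken from the card itself; the compute dtype is an assumption, chosen to match the `torch.bfloat16` used in the card's usage snippet.

```python
import torch
from transformers import BitsAndBytesConfig

# NF4 (4-bit NormalFloat) with double quantization enabled,
# matching the bullets above. The compute dtype is an assumption.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weight quantization
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,        # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16, # dtype used for matmuls at runtime
)
```

An object like this is what the usage section passes to `from_pretrained` as `quantization_config`.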
@@ -36,43 +30,11 @@

 ## Training Details

+This model was trained using a progressive 3-stage fine-tuning approach:
+
+1. **Stage 1**: Initial Norwegian (Bokmål) training on Mozilla Common Voice
+2. **Stage 2**: Continued training on broader Norwegian data
+3. **Stage 3**: Dialect-specific fine-tuning for the Østnorsk/Oslo dialect

 ## Usage

@@ -91,7 +53,7 @@
 )

 model = VibeVoiceForConditionalGenerationInference.from_pretrained(
+    "heiertech/Prat-9B-NF4",
     quantization_config=bnb_config,
     device_map="auto",
     torch_dtype=torch.bfloat16,

@@ -99,7 +61,7 @@
 model.eval()
 model.set_ddpm_inference_steps(num_steps=10)

+processor = VibeVoiceProcessor.from_pretrained("heiertech/Prat-9B-NF4")

 # Generate Norwegian speech
 text = "Speaker 0: Hei, jeg heter Maria og jeg kommer fra Norge."
@@ -115,4 +77,13 @@
 )

 audio = outputs.speech_outputs[0]  # 24kHz audio
+```
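The updated usage section ends with `audio`, a 24 kHz waveform, and does not cover saving it to disk. A dependency-free sketch using only the standard library, assuming `audio` can be converted to a flat Python sequence of floats in [-1, 1] (for a torch tensor, something like `audio.cpu().float().numpy().tolist()`):

```python
import struct
import wave

def save_wav(samples, path, rate=24000):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)     # mono
        wf.setsampwidth(2)     # 16-bit samples
        wf.setframerate(rate)  # the model outputs 24 kHz audio
        # Clamp to [-1, 1] and scale to signed 16-bit range.
        wf.writeframes(b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        ))

# Example: write 0.1 s of silence at 24 kHz
save_wav([0.0] * 2400, "out.wav")
```

`save_wav` is a hypothetical helper, not part of the VibeVoice API; libraries such as soundfile or torchaudio would do the same job in one call.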
+
+## Base Model
+
+This model is a fine-tune of [VibeVoice-7B](https://huggingface.co/vibevoice/VibeVoice-7B). Note that, despite the name, VibeVoice-7B is actually a 9B-parameter model.
+
+## Acknowledgments
+
+- Base model: [vibevoice/VibeVoice-7B](https://huggingface.co/vibevoice/VibeVoice-7B)
+- Training data: Mozilla Common Voice Norwegian