Update README.md
README.md
CHANGED
@@ -1,3 +1,27 @@
----
-license: mit
-
+---
+license: mit
+language:
+- en
+base_model:
+- microsoft/VibeVoice-1.5B
+tags:
+- lora
+---
+
+LoRA for VibeVoice 1.5B
+
+### Source:
+
+[Elizabeth Klett's narration of The House of the Vampire](https://librivox.org/the-house-of-the-vampire-by-george-sylvester-viereck/) (public domain) (MP3 128k)
+
+### Dataset prep/process
+
+Source audio was segmented and transcribed with [tts-dataset-generator](https://github.com/gokhaneraslan/tts-dataset-generator) (silence segmentation threshold 400 ms; target sample rate 24 kHz for VibeVoice). More than a few clips were split mid-sentence ("intra-sentence" segmentation). Audio clips were normalized to -3 dB. Cumulative duration: 1h53m.
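The -3 dB normalization step mentioned above can be illustrated with a small sketch. The README does not say whether the level refers to peak or loudness, so this assumes peak normalization relative to full scale; the function names are hypothetical and are not taken from tts-dataset-generator.

```python
TARGET_DB = -3.0      # level stated in the README
TARGET_SR = 24_000    # sample rate stated in the README (24 kHz)


def db_to_amplitude(db: float) -> float:
    """Convert a dB value relative to full scale into a linear amplitude."""
    return 10.0 ** (db / 20.0)


def peak_normalize(samples: list[float], target_db: float = TARGET_DB) -> list[float]:
    """Scale samples so the largest absolute value sits at target_db.

    Assumption: peak normalization. The README does not state which kind
    of normalization was actually used.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent or empty clip: nothing to scale
    gain = db_to_amplitude(target_db) / peak
    return [s * gain for s in samples]


def clip_seconds(num_samples: int, sample_rate: int = TARGET_SR) -> float:
    """Duration of a clip in seconds at the given sample rate."""
    return num_samples / sample_rate


if __name__ == "__main__":
    clip = [0.0, 0.25, -0.5, 0.1]  # made-up samples in the range [-1.0, 1.0]
    normalized = peak_normalize(clip)
    print("peak after normalization:", max(abs(s) for s in normalized))
    print("duration of 48000 samples:", clip_seconds(48_000), "s")
```

Summing `clip_seconds` over all clips is one way to arrive at a cumulative duration figure like the one quoted above.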
+
+### Training details
+
+[VibeVoice-finetuning](https://github.com/voicepowered-ai/VibeVoice-finetuning)
+
+```
+python -m src.finetune_vibevoice_lora --model_name_or_path microsoft/VibeVoice-1.5B --train_jsonl "path\to\metadata.jsonl" --text_column_name text --audio_column_name audio --output_dir "path\to\elizabeth_klett\lora" --per_device_train_batch_size 8 --gradient_accumulation_steps 4 --learning_rate 2.5e-5 --num_train_epochs 60 --logging_steps 10 --save_steps 200 --remove_unused_columns False --bf16 True --do_train --gradient_clipping --gradient_checkpointing False --ddpm_batch_mul 4 --diffusion_loss_weight 1.4 --train_diffusion_head True --ce_loss_weight 0.04 --voice_prompt_drop_rate 1 --lora_target_modules q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj --lr_scheduler_type cosine --warmup_ratio 0.03 --max_grad_norm 0.8 --report_to tensorboard
+```
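To make the batch and schedule flags in the training command easier to read, here is a rough sketch of the arithmetic they imply. The batch size, accumulation steps, epoch count, and warmup ratio are copied from the command; the clip count and the single-device default are assumptions, since the README states neither. The rounding used by the training script may differ, so treat the step counts as estimates.

```python
import math

# Values copied from the training command.
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
NUM_EPOCHS = 60
WARMUP_RATIO = 0.03


def effective_batch_size(num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (num_devices is an assumption)."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices


def estimate_schedule(num_clips: int, num_devices: int = 1) -> dict:
    """Estimate optimizer steps and warmup steps for a dataset of num_clips.

    This is a back-of-the-envelope estimate; the exact rounding inside the
    training script is not reproduced here.
    """
    batch = effective_batch_size(num_devices)
    steps_per_epoch = math.ceil(num_clips / batch)
    total_steps = steps_per_epoch * NUM_EPOCHS
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return {
        "effective_batch_size": batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


if __name__ == "__main__":
    # 1000 is a made-up clip count; substitute the real number of rows
    # in metadata.jsonl.
    print(estimate_schedule(num_clips=1000))
```

With `--save_steps 200`, dividing the estimated total by 200 gives an idea of how many checkpoints a run would write.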