zeropointnine committed on
Commit a7a054e · verified · 1 Parent(s): bed8214

Update README.md

  1. README.md +27 -3
README.md CHANGED
@@ -1,3 +1,27 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ base_model:
+ - microsoft/VibeVoice-1.5B
+ tags:
+ - lora
+ ---
+
+ LoRA for VibeVoice 1.5B
+
+ ### Source:
+
+ [Elizabeth Klett's narration of The House of the Vampire](https://librivox.org/the-house-of-the-vampire-by-george-sylvester-viereck/) (public domain) (MP3 128k)
+
+ ### Dataset prep/process
+
+ Segmentation and transcription of the source audio were done with [tts-dataset-generator](https://github.com/gokhaneraslan/tts-dataset-generator) (silence segmentation threshold 400 ms; target sample rate 24 kHz for VibeVoice). More than a few clips are split mid-sentence ("intra-sentence" segmentation). Audio clips were normalized to -3 dB. Cumulative duration: 1h53m.
+
+ ### Training details
+
+ [VibeVoice-finetuning](https://github.com/voicepowered-ai/VibeVoice-finetuning)
+
+ ```
+ python -m src.finetune_vibevoice_lora --model_name_or_path microsoft/VibeVoice-1.5B --train_jsonl "path\to\metadata.jsonl" --text_column_name text --audio_column_name audio --output_dir "path\to\elizabeth_klett\lora" --per_device_train_batch_size 8 --gradient_accumulation_steps 4 --learning_rate 2.5e-5 --num_train_epochs 60 --logging_steps 10 --save_steps 200 --remove_unused_columns False --bf16 True --do_train --gradient_clipping --gradient_checkpointing False --ddpm_batch_mul 4 --diffusion_loss_weight 1.4 --train_diffusion_head True --ce_loss_weight 0.04 --voice_prompt_drop_rate 1 --lora_target_modules q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj --lr_scheduler_type cosine --warmup_ratio 0.03 --max_grad_norm 0.8 --report_to tensorboard
+ ```
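
The dataset prep notes in the README mention normalizing clips to -3 dB without saying whether that means peak or loudness normalization. As a minimal sketch, assuming peak normalization to -3 dBFS on float samples in the range [-1, 1], the gain computation could look like the following. The function name `peak_normalize` and the synthetic test tone are illustrative only and are not taken from tts-dataset-generator.

```python
import math


def peak_normalize(samples, target_dbfs=-3.0):
    """Scale float samples so the absolute peak sits at target_dbfs.

    Assumption: 0 dBFS corresponds to an amplitude of 1.0.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence (or empty clip): nothing to scale
    # Convert the dB target to a linear amplitude, then derive the gain.
    target_amplitude = 10.0 ** (target_dbfs / 20.0)  # about 0.708 for -3 dB
    gain = target_amplitude / peak
    return [s * gain for s in samples]


# Hypothetical one-second 220 Hz tone at the 24 kHz rate mentioned in the README.
sample_rate = 24_000
clip = [0.25 * math.sin(2 * math.pi * 220.0 * n / sample_rate)
        for n in range(sample_rate)]

normalized = peak_normalize(clip)
peak_db = 20.0 * math.log10(max(abs(s) for s in normalized))
print(f"peak after normalization: {peak_db:.2f} dBFS")
```

In practice the samples would come from an audio loader rather than a generated tone; the sketch only illustrates the dB-to-linear conversion behind a "-3 dB" target.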