vibevoice-community/klett


zeropointnine committed on Jan 22

Commit a7a054e (verified) · 1 parent: bed8214

Update README.md

Files changed (1)

README.md +27 -3

@@ -1,3 +1,27 @@
----
-license: mit
----

+---
+license: mit
+language:
+- en
+base_model:
+- microsoft/VibeVoice-1.5B
+tags:
+- lora
+---
+LoRA for VibeVoice 1.5B
+### Source
+[Elizabeth Klett's narration of The House of the Vampire](https://librivox.org/the-house-of-the-vampire-by-george-sylvester-viereck/) (LibriVox, public domain; 128 kbps MP3)
+### Dataset prep/process
+Segmentation and transcription of the source audio were done with [tts-dataset-generator](https://github.com/gokhaneraslan/tts-dataset-generator) (silence segmentation threshold: 400 ms; target sample rate: 24 kHz, as used by VibeVoice). More than a few clips break mid-sentence ("intra-sentence" segmentation). Audio clips were normalized to -3 dB. Total duration: 1 h 53 min.
+### Training details
+Trained with [VibeVoice-finetuning](https://github.com/voicepowered-ai/VibeVoice-finetuning) using:
+```
+python -m src.finetune_vibevoice_lora --model_name_or_path microsoft/VibeVoice-1.5B --train_jsonl "path\to\metadata.jsonl" --text_column_name text --audio_column_name audio --output_dir "path\to\elizabeth_klett\lora" --per_device_train_batch_size 8 --gradient_accumulation_steps 4 --learning_rate 2.5e-5 --num_train_epochs 60 --logging_steps 10 --save_steps 200 --remove_unused_columns False --bf16 True --do_train --gradient_clipping --gradient_checkpointing False --ddpm_batch_mul 4 --diffusion_loss_weight 1.4 --train_diffusion_head True --ce_loss_weight 0.04 --voice_prompt_drop_rate 1 --lora_target_modules q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj --lr_scheduler_type cosine --warmup_ratio 0.03 --max_grad_norm 0.8 --report_to tensorboard
+```
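The dataset-prep step (400 ms silence threshold, 24 kHz audio, -3 dB normalization) can be illustrated with a minimal pure-Python sketch. This is not tts-dataset-generator's actual code: the function names, the amplitude cutoff `silence_amp`, and the reading of "-3 dB" as peak normalization to -3 dBFS are illustrative assumptions. Samples are assumed to be mono floats in [-1, 1], already resampled to 24 kHz.

```python
# Hypothetical sketch of silence-based segmentation and peak normalization.
# Not the tts-dataset-generator implementation; thresholds mirror the README.

SAMPLE_RATE = 24_000     # VibeVoice target sample rate (24 kHz)
MIN_SILENCE_MS = 400     # a quiet run this long ends a segment
TARGET_PEAK_DB = -3.0    # assumption: "-3 dB" means peak level of -3 dBFS


def split_on_silence(samples, sample_rate=SAMPLE_RATE,
                     min_silence_ms=MIN_SILENCE_MS, silence_amp=0.01):
    """Return (start, end) sample indices (end exclusive) of voiced segments.

    A sample counts as quiet when |x| <= silence_amp (hypothetical cutoff).
    Quiet runs shorter than min_silence_ms stay inside the current segment,
    which is why pauses within a sentence usually do not split it, while
    longer mid-sentence pauses can.
    """
    min_run = int(sample_rate * min_silence_ms / 1000)
    segments = []
    seg_start = None   # start index of the segment being built, if any
    silent_run = 0     # length of the current run of quiet samples
    for i, x in enumerate(samples):
        if abs(x) > silence_amp:
            if seg_start is None:
                seg_start = i
            silent_run = 0
        else:
            silent_run += 1
            if seg_start is not None and silent_run >= min_run:
                # End the segment at the first quiet sample of the run.
                segments.append((seg_start, i - silent_run + 1))
                seg_start = None
    if seg_start is not None:
        # Close a trailing segment, trimming any short trailing silence.
        segments.append((seg_start, len(samples) - silent_run))
    return segments


def peak_normalize(samples, target_db=TARGET_PEAK_DB):
    """Scale samples so the largest absolute value equals target_db dBFS."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = 10 ** (target_db / 20) / peak  # -3 dBFS is about 0.708 full scale
    return [x * gain for x in samples]
```

In use, each `(start, end)` pair would slice one clip, which is then peak-normalized and written out as a 24 kHz file; summing `(end - start) / SAMPLE_RATE` over all clips gives the cumulative duration.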
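The training command's batch and learning-rate settings imply some arithmetic worth making explicit. The sketch below assumes VibeVoice-finetuning follows Hugging Face `Trainer` conventions (suggested by flags such as `--warmup_ratio`, `--lr_scheduler_type` and `--gradient_accumulation_steps`), so step counts are approximate. The clip count is not stated anywhere and is a parameter here; the `{"text": ..., "audio": ...}` record layout only mirrors the `--text_column_name`/`--audio_column_name` flags and may omit formatting the repo expects.

```python
import json
import math

# Hyperparameters copied from the training command; everything else is assumed.
PER_DEVICE_BATCH = 8   # --per_device_train_batch_size
GRAD_ACCUM = 4         # --gradient_accumulation_steps
EPOCHS = 60            # --num_train_epochs
PEAK_LR = 2.5e-5       # --learning_rate
WARMUP_RATIO = 0.03    # --warmup_ratio


def metadata_line(text: str, audio_path: str) -> str:
    """Hypothetical metadata.jsonl record keyed by the configured column names."""
    return json.dumps({"text": text, "audio": audio_path}, ensure_ascii=False)


def schedule(num_clips: int, num_devices: int = 1) -> dict:
    """Approximate optimizer-step counts, assuming HF Trainer-style accounting."""
    batches_per_epoch = math.ceil(num_clips / (PER_DEVICE_BATCH * num_devices))
    steps_per_epoch = max(batches_per_epoch // GRAD_ACCUM, 1)
    total_steps = steps_per_epoch * EPOCHS
    return {
        # Clips contributing to one optimizer update across all devices.
        "effective_batch": PER_DEVICE_BATCH * GRAD_ACCUM * num_devices,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": math.ceil(total_steps * WARMUP_RATIO),
    }


def cosine_lr(step: int, total_steps: int, warmup_steps: int,
              peak_lr: float = PEAK_LR) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

For a hypothetical 1,000 clips on one GPU, `schedule(1000)` gives an effective batch of 32, 31 optimizer steps per epoch, 1,860 steps in total and 56 warmup steps. The learning rate peaks at 2.5e-5 after warmup and is back to half that value midway through the decay.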