VoxCPM2 Bedtime Story LoRA
LoRA adapter fine-tuned on VoxCPM2 for bedtime story narration in English.
Training
- Base: openbmb/VoxCPM2
- Dataset: LJSpeech (6,550 clips, ~12h, half-data experiment)
- Method: LoRA (r=32, alpha=64, targeting DiT attention layers)
- Best: exp6 (r=32, lr=5e-5, half data), val loss 0.872
- GPU: A100-80GB via Modal
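To make the r=32, alpha=64 setting concrete: in the standard LoRA formulation the low-rank update is scaled by alpha / r, which works out to 2.0 for this adapter. The snippet below is a minimal pure-Python sketch of that formulation, not the VoxCPM2 training code. The helper names and the toy matrix sizes are illustrative assumptions.

```python
# Illustrative sketch of the standard LoRA update: W_eff = W + (alpha / r) * B @ A.
# Not the VoxCPM2 training code; helper names and toy sizes are hypothetical.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def lora_scale(r, alpha):
    """Scaling factor applied to the low-rank update."""
    return alpha / r

def lora_effective_weight(w, a, b, r, alpha):
    """Return W + (alpha / r) * (B @ A) for W (out x in), B (out x r), A (r x in)."""
    delta = matmul(b, a)
    s = lora_scale(r, alpha)
    return [[w_ij + s * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer."""
    return r * (d_in + d_out)

# Scaling factor for the configuration listed above.
scale = lora_scale(r=32, alpha=64)

# Toy example with rank 1 so the numbers stay readable.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [0.0]]   # out x r
a = [[0.0, 0.5]]     # r x in
w_eff = lora_effective_weight(w, a, b, r=1, alpha=2)
```

With alpha fixed at twice the rank, changing r changes the adapter's capacity without changing the scale of the update.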
Evaluation (6 experiments)
| Exp | Config | Train Loss | Val Loss |
|---|---|---|---|
| exp1 | r32, lr=1e-4 | 0.823 | 0.918 |
| exp2 | r16, lr=5e-5 | 0.765 | 0.899 |
| exp3 | r64, lr=1e-4 | 0.838 | 0.908 |
| exp4 | r32, lr=5e-5, 200 steps | 0.803 | 0.884 |
| exp5 | r32, lr=2e-4 | 0.832 | 0.896 |
| exp6 | r32, lr=5e-5, half data | 0.935 | 0.872 |
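The selection of exp6 can be reproduced from the table by taking the row with the lowest validation loss. The sketch below copies the reported numbers verbatim; the variable names are illustrative, and the gap between validation and training loss is added only as a convenience for comparing runs.

```python
# Values copied from the table above: (experiment, train loss, validation loss).
results = [
    ("exp1", 0.823, 0.918),
    ("exp2", 0.765, 0.899),
    ("exp3", 0.838, 0.908),
    ("exp4", 0.803, 0.884),
    ("exp5", 0.832, 0.896),
    ("exp6", 0.935, 0.872),
]

# Best run = lowest validation loss.
best = min(results, key=lambda row: row[2])

# Validation minus training loss per run (positive means val loss is higher).
gaps = {name: round(val - train, 3) for name, train, val in results}

print(f"best: {best[0]} (val loss {best[2]})")
for name, gap in gaps.items():
    print(f"{name}: val - train = {gap:+.3f}")
```

exp6 is the only run whose validation loss is below its training loss. The table does not say why, so the gap is best read as a prompt for further checking, not as a conclusion.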
Demo Audio
English bedtime stories (VoxCPM2)
| Mood | Audio |
|---|---|
| Magical | english_magical_mid.wav |
| Funny | english_funny_high.wav |
| Calming | english_calming_low.wav |
| Dreamy | english_dreamy_low.wav |
LoRA vs Stock comparison
| Type | Audio |
|---|---|
| Stock VoxCPM2 | comparison_stock_0.wav |
| LoRA exp6 | comparison_lora_exp6_0.wav |
Reference voice
- reference_voice.wav: voice used for cloning
Files
- lora_weights.safetensors: LoRA adapter weights (12.8 MB)
- lora_config.json: LoRA configuration
- training_state.json: Training state
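The adapter weights can be inspected without loading any model code, because a safetensors file starts with an 8-byte little-endian header length followed by a JSON header that lists each tensor's dtype and shape. The sketch below uses only the standard library. It assumes `lora_weights.safetensors` sits in the working directory, and it makes no assumption about the tensor names inside, which are not documented here.

```python
import json
import os
import struct

def read_safetensors_header(path):
    """Return the tensor table (name -> dtype, shape, data_offsets) of a safetensors file."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header

def count_parameters(header):
    """Sum the number of elements over all tensors listed in the header."""
    total = 0
    for info in header.values():
        n = 1
        for dim in info["shape"]:
            n *= dim
        total += n
    return total

# Assumed location of the adapter file; adjust the path as needed.
weights_path = "lora_weights.safetensors"
if os.path.exists(weights_path):
    header = read_safetensors_header(weights_path)
    for name, info in sorted(header.items()):
        print(name, info["dtype"], info["shape"])
    print("total parameters:", count_parameters(header))
```

The reported dtype together with the parameter count gives a quick cross-check against the 12.8 MB file size.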
Part of DreamVoice
DreamVoice: bedtime stories in a parent's cloned voice.