zmb-chapter-audio-dataset-force-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts on a force-aligned chapter audio dataset (zmb). It achieves the following results on the evaluation set:

  • Loss: 0.0800

Model description

More information needed

Intended uses & limitations

More information needed
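
Until the card is filled in, a generic inference sketch using the standard transformers SpeechT5 API may help. The speaker embedding below is a zero placeholder and an assumption; for intelligible output it should be replaced with an x-vector matching the fine-tuning speaker(s).

```python
import torch
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

model_id = "sil-ai/zmb-chapter-audio-dataset-force-aligned-speecht5"
processor = SpeechT5Processor.from_pretrained(model_id)
model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Hello world", return_tensors="pt")
# SpeechT5 conditions on a 512-dim speaker embedding; zeros are a placeholder only.
speaker_embeddings = torch.zeros(1, 512)
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
# `speech` is a 1-D float tensor at 16 kHz, e.g.:
# import soundfile as sf; sf.write("out.wav", speech.numpy(), samplerate=16000)
```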

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 3407
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 4000
  • training_steps: 40000
  • mixed_precision_training: Native AMP

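The learning-rate shape these settings produce (linear warmup for 4,000 steps, then cosine decay over the remaining 36,000 steps) can be sketched in plain Python. This mirrors the shape of transformers' warmup+cosine scheduler, not its exact implementation:

```python
import math

PEAK_LR = 1e-4       # learning_rate
WARMUP_STEPS = 4_000  # lr_scheduler_warmup_steps
TOTAL_STEPS = 40_000  # training_steps

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup + cosine decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS  # linear ramp from 0 to the peak
    # Cosine decay from the peak down to 0 over the post-warmup steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(4_000), lr_at(40_000))  # 0 at start, peak at warmup end, ~0 at the end
```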
Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|---------------|--------|-------|-----------------|
| 0.0625        | 40.0   | 1000  | 0.0545          |
| 0.055         | 80.0   | 2000  | 0.0515          |
| 0.0476        | 120.0  | 3000  | 0.0528          |
| 0.0464        | 160.0  | 4000  | 0.0543          |
| 0.0448        | 200.0  | 5000  | 0.0574          |
| 0.0417        | 240.0  | 6000  | 0.0585          |
| 0.0367        | 280.0  | 7000  | 0.0590          |
| 0.0417        | 320.0  | 8000  | 0.0628          |
| 0.0344        | 360.0  | 9000  | 0.0627          |
| 0.0334        | 400.0  | 10000 | 0.0657          |
| 0.0311        | 440.0  | 11000 | 0.0672          |
| 0.031         | 480.0  | 12000 | 0.0685          |
| 0.0309        | 520.0  | 13000 | 0.0691          |
| 0.0285        | 560.0  | 14000 | 0.0699          |
| 0.0313        | 600.0  | 15000 | 0.0720          |
| 0.0275        | 640.0  | 16000 | 0.0720          |
| 0.0283        | 680.0  | 17000 | 0.0724          |
| 0.0281        | 720.0  | 18000 | 0.0727          |
| 0.0276        | 760.0  | 19000 | 0.0758          |
| 0.0256        | 800.0  | 20000 | 0.0741          |
| 0.0267        | 840.0  | 21000 | 0.0741          |
| 0.0287        | 880.0  | 22000 | 0.0763          |
| 0.0261        | 920.0  | 23000 | 0.0761          |
| 0.0246        | 960.0  | 24000 | 0.0776          |
| 0.0229        | 1000.0 | 25000 | 0.0778          |
| 0.0241        | 1040.0 | 26000 | 0.0784          |
| 0.0233        | 1080.0 | 27000 | 0.0794          |
| 0.0233        | 1120.0 | 28000 | 0.0785          |
| 0.0237        | 1160.0 | 29000 | 0.0787          |
| 0.0235        | 1200.0 | 30000 | 0.0791          |
| 0.0227        | 1240.0 | 31000 | 0.0799          |
| 0.0217        | 1280.0 | 32000 | 0.0805          |
| 0.0228        | 1320.0 | 33000 | 0.0804          |
| 0.0242        | 1360.0 | 34000 | 0.0798          |
| 0.0238        | 1400.0 | 35000 | 0.0803          |
| 0.0229        | 1440.0 | 36000 | 0.0804          |
| 0.0229        | 1480.0 | 37000 | 0.0803          |
| 0.0235        | 1520.0 | 38000 | 0.0802          |
| 0.0232        | 1560.0 | 39000 | 0.0798          |
| 0.0224        | 1600.0 | 40000 | 0.0800          |

Framework versions

  • Transformers 4.57.1
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.2