ikk-chapter-audio-dataset-force-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not specified (the card's dataset field is None). It achieves the following results on the evaluation set:

  • Loss: 0.4194

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 3407
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 4000
  • training_steps: 40000
  • mixed_precision_training: Native AMP
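
The hyperparameters above determine the effective batch size and the shape of the learning-rate curve. The following is a minimal sketch in plain Python, not the training code itself. It assumes a single device (which is what 8 x 4 = 32 implies) and assumes that the linear scheduler ramps up from zero over the warmup steps and then decays linearly to zero at the final step. The function names are illustrative only.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000


def effective_batch_size(per_device=TRAIN_BATCH_SIZE,
                         accum_steps=GRAD_ACCUM_STEPS,
                         num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def linear_lr(step,
              base_lr=LEARNING_RATE,
              warmup=WARMUP_STEPS,
              total=TRAINING_STEPS):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup:
        # Ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup)
    # Decay from base_lr at the end of warmup down to 0 at the last step.
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for s in (0, 2000, 4000, 22000, 40000):
        print(s, linear_lr(s))
```

Under these assumptions the learning rate peaks at 0.0001 at step 4000, is back to half of that at step 22000, and reaches zero at step 40000.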

Training results

Training Loss  Epoch     Step   Validation Loss
0.5015         5.5263    1000   0.4673
0.4625         11.0499   2000   0.4486
0.4610         16.5762   3000   0.4411
0.4464         22.0997   4000   0.4362
0.4339         27.6260   5000   0.4300
0.4169         33.1496   6000   0.4260
0.4406         38.6759   7000   0.4280
0.4204         44.1994   8000   0.4255
0.4100         49.7258   9000   0.4222
0.4069         55.2493   10000  0.4218
0.3948         60.7756   11000  0.4251
0.3915         66.2992   12000  0.4190
0.3923         71.8255   13000  0.4221
0.4038         77.3490   14000  0.4224
0.3932         82.8753   15000  0.4181
0.3805         88.3989   16000  0.4193
0.3862         93.9252   17000  0.4188
0.3864         99.4488   18000  0.4187
0.3748         104.9751  19000  0.4190
0.3735         110.4986  20000  0.4192
0.3736         116.0222  21000  0.4174
0.3736         121.5485  22000  0.4182
0.3725         127.0720  23000  0.4187
0.3669         132.5983  24000  0.4185
0.3670         138.1219  25000  0.4157
0.3694         143.6482  26000  0.4191
0.3632         149.1717  27000  0.4180
0.3607         154.6981  28000  0.4177
0.3610         160.2216  29000  0.4164
0.3612         165.7479  30000  0.4168
0.3618         171.2715  31000  0.4192
0.3565         176.7978  32000  0.4175
0.3620         182.3213  33000  0.4184
0.3567         187.8476  34000  0.4181
0.3545         193.3712  35000  0.4183
0.3592         198.8975  36000  0.4197
0.3524         204.4211  37000  0.4199
0.3521         209.9474  38000  0.4192
0.3625         215.4709  39000  0.4187
0.3546         220.9972  40000  0.4194
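
The training log can be summarized programmatically. The sketch below is illustrative and not part of the original training run: it parses the rows of the results table (copied verbatim into a string), finds the evaluation step with the lowest validation loss, and estimates the number of optimizer steps per epoch from the final row. The names RESULTS and parse_rows are hypothetical helpers introduced for this example.

```python
# Rows copied from the results table: train loss, epoch, step, validation loss.
RESULTS = """\
0.5015 5.5263 1000 0.4673
0.4625 11.0499 2000 0.4486
0.461 16.5762 3000 0.4411
0.4464 22.0997 4000 0.4362
0.4339 27.6260 5000 0.4300
0.4169 33.1496 6000 0.4260
0.4406 38.6759 7000 0.4280
0.4204 44.1994 8000 0.4255
0.41 49.7258 9000 0.4222
0.4069 55.2493 10000 0.4218
0.3948 60.7756 11000 0.4251
0.3915 66.2992 12000 0.4190
0.3923 71.8255 13000 0.4221
0.4038 77.3490 14000 0.4224
0.3932 82.8753 15000 0.4181
0.3805 88.3989 16000 0.4193
0.3862 93.9252 17000 0.4188
0.3864 99.4488 18000 0.4187
0.3748 104.9751 19000 0.4190
0.3735 110.4986 20000 0.4192
0.3736 116.0222 21000 0.4174
0.3736 121.5485 22000 0.4182
0.3725 127.0720 23000 0.4187
0.3669 132.5983 24000 0.4185
0.367 138.1219 25000 0.4157
0.3694 143.6482 26000 0.4191
0.3632 149.1717 27000 0.4180
0.3607 154.6981 28000 0.4177
0.361 160.2216 29000 0.4164
0.3612 165.7479 30000 0.4168
0.3618 171.2715 31000 0.4192
0.3565 176.7978 32000 0.4175
0.362 182.3213 33000 0.4184
0.3567 187.8476 34000 0.4181
0.3545 193.3712 35000 0.4183
0.3592 198.8975 36000 0.4197
0.3524 204.4211 37000 0.4199
0.3521 209.9474 38000 0.4192
0.3625 215.4709 39000 0.4187
0.3546 220.9972 40000 0.4194
"""


def parse_rows(text):
    """Turn the whitespace-separated log rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return rows


rows = parse_rows(RESULTS)
best = min(rows, key=lambda r: r["val_loss"])   # lowest validation loss
final = rows[-1]                                # last logged evaluation
steps_per_epoch = final["step"] / final["epoch"]

print("best step:", best["step"], "val loss:", best["val_loss"])
print("final step:", final["step"], "val loss:", final["val_loss"])
print("approx. optimizer steps per epoch:", round(steps_per_epoch))
```

On this table the lowest validation loss is 0.4157 at step 25000, slightly below the final value of 0.4194 at step 40000, while training loss keeps falling. The card does not say which checkpoint was published or whether intermediate checkpoints were kept.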

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1