bap-chapter-audio-dataset-force-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts on an unspecified dataset (the card metadata lists the dataset as None). It achieves the following results on the evaluation set:

  • Loss: 0.3966

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 3407
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 4000
  • training_steps: 40000
  • mixed_precision_training: Native AMP
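
The hyperparameters above imply an effective batch size and a learning-rate curve that can be checked by hand. The following is a minimal sketch in plain Python, not the training code itself. It assumes a single device (consistent with 8 × 4 = 32) and the usual linear warmup followed by linear decay to zero; the function names are hypothetical and exist only for illustration.

```python
# Values copied from the hyperparameter list above.
BASE_LR = 1e-4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def linear_schedule_lr(step, base_lr=BASE_LR, warmup=WARMUP_STEPS,
                       total=TRAINING_STEPS):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup)
    # Decay: ramp linearly from base_lr down to 0 at the final step.
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 2000, 4000, 22000, 40000):
        print(step, linear_schedule_lr(step))
```

Under these assumptions the learning rate peaks at 0.0001 at step 4000 and reaches zero at step 40000.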

Training results

Training Loss   Epoch      Step    Validation Loss
0.4925          11.7647    1000    0.4308
0.4430          23.5294    2000    0.4077
0.4186          35.2941    3000    0.3994
0.4148          47.0588    4000    0.3940
0.3967          58.8235    5000    0.3910
0.3953          70.5882    6000    0.3907
0.3835          82.3529    7000    0.3913
0.3715          94.1176    8000    0.3893
0.3778          105.8824   9000    0.3890
0.3706          117.6471   10000   0.3880
0.3572          129.4118   11000   0.3868
0.3658          141.1765   12000   0.3885
0.3581          152.9412   13000   0.3893
0.3564          164.7059   14000   0.3904
0.3495          176.4706   15000   0.3879
0.3535          188.2353   16000   0.3884
0.3495          200.0000   17000   0.3890
0.3593          211.7647   18000   0.3907
0.3450          223.5294   19000   0.3910
0.3464          235.2941   20000   0.3909
0.3363          247.0588   21000   0.3920
0.3422          258.8235   22000   0.3917
0.3383          270.5882   23000   0.3930
0.3364          282.3529   24000   0.3927
0.3334          294.1176   25000   0.3937
0.3337          305.8824   26000   0.3943
0.3251          317.6471   27000   0.3932
0.3247          329.4118   28000   0.3944
0.3294          341.1765   29000   0.3951
0.3293          352.9412   30000   0.3954
0.3455          364.7059   31000   0.3978
0.3229          376.4706   32000   0.3962
0.3201          388.2353   33000   0.3966
0.3249          400.0000   34000   0.3969
0.3203          411.7647   35000   0.3965
0.3173          423.5294   36000   0.3964
0.3223          435.2941   37000   0.3968
0.3183          447.0588   38000   0.3972
0.3218          458.8235   39000   0.3967
0.3262          470.5882   40000   0.3966
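
The results table can be summarized with a little arithmetic. The sketch below, in plain Python, finds the evaluation step with the lowest validation loss and estimates the number of optimizer steps per epoch from the row where 17000 steps correspond to epoch 200.0. The validation losses are copied from the table; the helper name `best_checkpoint` is hypothetical, and the examples-per-epoch figure is only an approximation that assumes a single device and full batches of 32.

```python
# Validation losses from the table, one entry per 1000 training steps.
VAL_LOSS = [
    0.4308, 0.4077, 0.3994, 0.3940, 0.3910, 0.3907, 0.3913, 0.3893,
    0.3890, 0.3880, 0.3868, 0.3885, 0.3893, 0.3904, 0.3879, 0.3884,
    0.3890, 0.3907, 0.3910, 0.3909, 0.3920, 0.3917, 0.3930, 0.3927,
    0.3937, 0.3943, 0.3932, 0.3944, 0.3951, 0.3954, 0.3978, 0.3962,
    0.3966, 0.3969, 0.3965, 0.3964, 0.3968, 0.3972, 0.3967, 0.3966,
]
EVAL_EVERY = 1000  # evaluation interval in steps, as seen in the Step column


def best_checkpoint(val_losses, eval_every=EVAL_EVERY):
    """Return (step, loss) for the evaluation with the lowest validation loss."""
    best_idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return (best_idx + 1) * eval_every, val_losses[best_idx]


best_step, best_loss = best_checkpoint(VAL_LOSS)
final_loss = VAL_LOSS[-1]

# The table lists step 17000 at epoch 200.0, which gives steps per epoch.
steps_per_epoch = 17000 / 200.0
# Rough dataset size, assuming every optimizer step sees a full batch of 32.
approx_examples_per_epoch = steps_per_epoch * 32

if __name__ == "__main__":
    print("best step:", best_step, "loss:", best_loss)
    print("final loss:", final_loss)
    print("steps per epoch:", steps_per_epoch)
    print("approx. examples per epoch:", approx_examples_per_epoch)
```

The lowest validation loss in the table is 0.3868 at step 11000, while the reported final value at step 40000 is 0.3966. Training loss keeps falling over the same range, which may indicate overfitting in the later steps, so an earlier checkpoint could be worth evaluating if one was saved.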

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1

Model size

  • Parameters: 0.1B
  • Tensor type: F32
  • Format: Safetensors