SpeechT5_TTS_Hataw

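This model is a fine-tuned version of microsoft/speecht5_tts on the HatawTTS dataset. On the evaluation set it reaches a validation loss of 0.3532 at the final training step (10000). The lowest validation loss logged during training is 0.3522, at step 8500.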

Model description

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: AdamW, fused PyTorch implementation (adamw_torch_fused), betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 10000
mixed_precision_training: Native AMP
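
With lr_scheduler_type linear, 500 warmup steps and 10000 training steps, the learning rate rises linearly from 0 to the peak of 0.0001 over the first 500 optimizer steps. It then falls linearly to 0 at step 10000. The following minimal Python sketch reproduces that schedule. It is modelled on the learning-rate lambda used by `get_linear_schedule_with_warmup` in Hugging Face Transformers, and the function name `linear_warmup_lr` is hypothetical.

```python
def linear_warmup_lr(
    step: int,
    peak_lr: float = 1e-4,      # learning_rate from the card
    warmup_steps: int = 500,    # lr_scheduler_warmup_steps
    total_steps: int = 10_000,  # training_steps
) -> float:
    """Learning rate after `step` optimizer updates (hypothetical helper).

    Modelled on the linear warmup + linear decay schedule of
    transformers' get_linear_schedule_with_warmup.
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Learning rate at a few milestones of the 10000-step run.
    for s in (0, 250, 500, 5250, 10_000):
        print(s, linear_warmup_lr(s))
```

Under this schedule, the rate is half the peak at step 250 (mid-warmup). It reaches the full 0.0001 at step 500, returns to half the peak at step 5250 (midway through the decay), and hits 0 at step 10000.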

Training results

Training Loss	Epoch	Step	Validation Loss
0.5705	0.3814	250	0.4276
0.4671	0.7628	500	0.4159
0.451	1.1434	750	0.4107
0.436	1.5248	1000	0.3948
0.4279	1.9062	1250	0.3917
0.4237	2.2868	1500	0.3835
0.4168	2.6682	1750	0.3818
0.4131	3.0488	2000	0.3802
0.4084	3.4302	2250	0.3752
0.4062	3.8116	2500	0.3726
0.4033	4.1922	2750	0.3720
0.3981	4.5736	3000	0.3682
0.4002	4.9550	3250	0.3686
0.3959	5.3356	3500	0.3704
0.3958	5.7170	3750	0.3658
0.3941	6.0976	4000	0.3677
0.3928	6.4790	4250	0.3621
0.3889	6.8604	4500	0.3604
0.3891	7.2410	4750	0.3656
0.3832	7.6224	5000	0.3602
0.3868	8.0031	5250	0.3611
0.3822	8.3844	5500	0.3584
0.3823	8.7658	5750	0.3574
0.3807	9.1465	6000	0.3584
0.3774	9.5278	6250	0.3545
0.3794	9.9092	6500	0.3589
0.3762	10.2899	6750	0.3544
0.3771	10.6712	7000	0.3558
0.3753	11.0519	7250	0.3550
0.3716	11.4333	7500	0.3537
0.3735	11.8146	7750	0.3532
0.3701	12.1953	8000	0.3548
0.371	12.5767	8250	0.3549
0.3694	12.9580	8500	0.3522
0.3689	13.3387	8750	0.3546
0.3692	13.7201	9000	0.3526
0.3697	14.1007	9250	0.3528
0.3666	14.4821	9500	0.3530
0.3654	14.8635	9750	0.3529
0.3666	15.2441	10000	0.3532
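
The Epoch column can be cross-checked against the batch settings. Because total_train_batch_size (32) equals train_batch_size × gradient_accumulation_steps (16 × 2), training presumably ran on a single device. The first logged row is step 250, which is 500 micro-batches of 16, at epoch 0.3814. That implies about 500 / 0.3814 ≈ 1311 micro-batches per epoch, or roughly 21,000 training examples. This is an inference from the table, not a figure stated in the card.

The sketch below uses the hypothetical names `epoch_at` and `MICRO_BATCHES_PER_EPOCH`. It reproduces the logged epochs under that assumption. It also assumes that the final, incomplete accumulation window of each epoch still triggers one optimizer step, so one epoch spans ceil(1311 / 2) = 656 steps.

```python
import math

# Inferred from the table (500 micro-batches at epoch 0.3814); NOT stated in the card.
MICRO_BATCHES_PER_EPOCH = 1311
GRAD_ACCUM_STEPS = 2  # gradient_accumulation_steps from the card


def epoch_at(step: int,
             micro_batches: int = MICRO_BATCHES_PER_EPOCH,
             accum: int = GRAD_ACCUM_STEPS) -> float:
    """Fractional epoch after `step` optimizer updates (hypothetical helper)."""
    # Assumption: the last, incomplete accumulation window of an epoch still
    # triggers an update, so one epoch spans ceil(micro_batches / accum) steps.
    steps_per_epoch = math.ceil(micro_batches / accum)
    full_epochs, steps_into_epoch = divmod(step, steps_per_epoch)
    # Within an epoch, each optimizer step consumes `accum` micro-batches.
    return full_epochs + steps_into_epoch * accum / micro_batches


# (step, epoch) pairs copied from the training-results table.
LOGGED = [
    (250, 0.3814), (750, 1.1434), (1000, 1.5248), (2000, 3.0488),
    (3250, 4.9550), (4000, 6.0976), (5250, 8.0031), (6500, 9.9092),
    (8750, 13.3387), (10_000, 15.2441),
]

if __name__ == "__main__":
    for step, logged in LOGGED:
        print(step, logged, round(epoch_at(step), 4))
```

Every sampled row agrees to four decimal places. This supports, but does not prove, the inferred epoch size.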

Weights format: Safetensors
Model size: 0.1B params
Tensor type: F32