mui-muiNT-audio-aligned-speecht5

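This model is a fine-tuned version of microsoft/speecht5_tts on an unspecified dataset (the training script recorded the dataset name as None). At the final logged evaluation in the training results table (step 40000), it reaches a validation loss of 0.0432.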

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW, PyTorch fused implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
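As a rough illustration of how these settings interact, the sketch below computes the effective batch size (train_batch_size × gradient_accumulation_steps, assuming a single device, which the card does not state) and the learning rate at a given optimizer step. It assumes the cosine schedule has the shape of transformers' get_cosine_schedule_with_warmup with its default num_cycles=0.5: a linear warmup over 4000 steps up to 1e-4, then a half-cosine decay to zero at step 40000. The helper names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size(per_device: int, accum: int, devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer update."""
    return per_device * accum * devices


def cosine_lr(step: int,
              base_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS,
              total: int = TRAINING_STEPS) -> float:
    """Linear warmup followed by a half-cosine decay to zero.

    Assumed to match transformers' get_cosine_schedule_with_warmup
    with its default num_cycles=0.5.
    """
    if step < warmup:
        # Warmup: ramp linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup)
    # Decay: progress runs from 0.0 at the end of warmup to 1.0 at the last step.
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 2000, 4000, 22000, 40000):
        print(f"step {step:>5}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at 1e-4 at step 4000 and falls to half its peak (5e-5) at step 22000, halfway through the decay phase.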

Training results

Training Loss	Epoch	Step	Validation Loss
0.0647	5.4966	1000	0.0475
0.0527	10.9931	2000	0.0442
0.0516	16.4855	3000	0.0443
0.0541	21.9821	4000	0.0433
0.0478	27.4745	5000	0.0419
0.0479	32.9710	6000	0.0423
0.0456	38.4634	7000	0.0427
0.0445	43.9600	8000	0.0452
0.0428	49.4524	9000	0.0416
0.0440	54.9490	10000	0.0425
0.0434	60.4414	11000	0.0425
0.0411	65.9379	12000	0.0433
0.0417	71.4303	13000	0.0428
0.0409	76.9269	14000	0.0419
0.0414	82.4193	15000	0.0438
0.0416	87.9159	16000	0.0435
0.0379	93.4083	17000	0.0427
0.0395	98.9048	18000	0.0432
0.0394	104.3972	19000	0.0431
0.038	109.8938	20000	0.0427
0.0355	115.3862	21000	0.0427
0.0374	120.8828	22000	0.0426
0.0348	126.3752	23000	0.0427
0.0348	131.8717	24000	0.0429
0.0357	137.3641	25000	0.0424
0.0356	142.8607	26000	0.0429
0.0351	148.3531	27000	0.0435
0.0341	153.8497	28000	0.0431
0.034	159.3421	29000	0.0429
0.0341	164.8386	30000	0.0429
0.0333	170.3310	31000	0.0434
0.0334	175.8276	32000	0.0431
0.0340	181.3200	33000	0.0433
0.0335	186.8166	34000	0.0432
0.0332	192.3090	35000	0.0430
0.0332	197.8055	36000	0.0431
0.0331	203.2979	37000	0.0428
0.0339	208.7945	38000	0.0430
0.0333	214.2869	39000	0.0433
0.0337	219.7834	40000	0.0432
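To read the table programmatically, the following sketch (plain Python, hypothetical helper names) picks the checkpoint with the lowest validation loss and estimates optimizer steps per epoch from the last row. The example-count estimate multiplies steps per epoch by total_train_batch_size = 32, so it assumes a single device and ignores a possible partial final batch; treat it as approximate.

```python
# Rows copied from the training results table above:
# (training loss, epoch, step, validation loss)
ROWS = [
    (0.0647, 5.4966, 1000, 0.0475), (0.0527, 10.9931, 2000, 0.0442),
    (0.0516, 16.4855, 3000, 0.0443), (0.0541, 21.9821, 4000, 0.0433),
    (0.0478, 27.4745, 5000, 0.0419), (0.0479, 32.9710, 6000, 0.0423),
    (0.0456, 38.4634, 7000, 0.0427), (0.0445, 43.9600, 8000, 0.0452),
    (0.0428, 49.4524, 9000, 0.0416), (0.0440, 54.9490, 10000, 0.0425),
    (0.0434, 60.4414, 11000, 0.0425), (0.0411, 65.9379, 12000, 0.0433),
    (0.0417, 71.4303, 13000, 0.0428), (0.0409, 76.9269, 14000, 0.0419),
    (0.0414, 82.4193, 15000, 0.0438), (0.0416, 87.9159, 16000, 0.0435),
    (0.0379, 93.4083, 17000, 0.0427), (0.0395, 98.9048, 18000, 0.0432),
    (0.0394, 104.3972, 19000, 0.0431), (0.0380, 109.8938, 20000, 0.0427),
    (0.0355, 115.3862, 21000, 0.0427), (0.0374, 120.8828, 22000, 0.0426),
    (0.0348, 126.3752, 23000, 0.0427), (0.0348, 131.8717, 24000, 0.0429),
    (0.0357, 137.3641, 25000, 0.0424), (0.0356, 142.8607, 26000, 0.0429),
    (0.0351, 148.3531, 27000, 0.0435), (0.0341, 153.8497, 28000, 0.0431),
    (0.0340, 159.3421, 29000, 0.0429), (0.0341, 164.8386, 30000, 0.0429),
    (0.0333, 170.3310, 31000, 0.0434), (0.0334, 175.8276, 32000, 0.0431),
    (0.0340, 181.3200, 33000, 0.0433), (0.0335, 186.8166, 34000, 0.0432),
    (0.0332, 192.3090, 35000, 0.0430), (0.0332, 197.8055, 36000, 0.0431),
    (0.0331, 203.2979, 37000, 0.0428), (0.0339, 208.7945, 38000, 0.0430),
    (0.0333, 214.2869, 39000, 0.0433), (0.0337, 219.7834, 40000, 0.0432),
]

TOTAL_TRAIN_BATCH_SIZE = 32  # from the hyperparameter list (assumes one device)


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the last logged row."""
    _, epoch, step, _ = rows[-1]
    return step / epoch


if __name__ == "__main__":
    train_loss, epoch, step, val_loss = best_checkpoint(ROWS)
    print(f"lowest validation loss {val_loss} at step {step} "
          f"(epoch {epoch}, training loss {train_loss})")
    spe = steps_per_epoch(ROWS)
    print(f"~{spe:.0f} optimizer steps per epoch, "
          f"~{spe * TOTAL_TRAIN_BATCH_SIZE:.0f} training examples")
```

On these rows the lowest validation loss (0.0416) occurs at step 9000; afterwards validation loss stays between roughly 0.042 and 0.044 while training loss keeps falling to about 0.033. The estimate comes out at roughly 182 optimizer steps per epoch, or about 5.8k training examples.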

Format: Safetensors
Model size: 0.1B params
Tensor type: F32
