ikk-chapter-audio-dataset-force-aligned-speecht5

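This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not recorded in the card metadata (it is listed as None). At the end of training (step 40000) the model reaches a validation loss of 0.4194 on the evaluation set.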

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
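
As a rough illustration of how these settings interact, the sketch below recomputes the effective batch size and the learning rate at a given step under a linear schedule with warmup. It assumes a single training device and the usual shape of such a schedule (linear warmup from zero, then linear decay to zero at the last step). The function names are illustrative and are not taken from the training script.

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer update (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def linear_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for s in (0, 2000, 4000, 22000, 40000):
        print(f"step {s:>5}: lr = {linear_lr(s):.2e}")
```

Under these assumptions the learning rate peaks at 0.0001 when warmup ends at step 4000, which is one tenth of the way through the 40000 training steps.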

Training Loss	Epoch	Step	Validation Loss
0.5015	5.5263	1000	0.4673
0.4625	11.0499	2000	0.4486
0.4610	16.5762	3000	0.4411
0.4464	22.0997	4000	0.4362
0.4339	27.6260	5000	0.4300
0.4169	33.1496	6000	0.4260
0.4406	38.6759	7000	0.4280
0.4204	44.1994	8000	0.4255
0.4100	49.7258	9000	0.4222
0.4069	55.2493	10000	0.4218
0.3948	60.7756	11000	0.4251
0.3915	66.2992	12000	0.4190
0.3923	71.8255	13000	0.4221
0.4038	77.3490	14000	0.4224
0.3932	82.8753	15000	0.4181
0.3805	88.3989	16000	0.4193
0.3862	93.9252	17000	0.4188
0.3864	99.4488	18000	0.4187
0.3748	104.9751	19000	0.4190
0.3735	110.4986	20000	0.4192
0.3736	116.0222	21000	0.4174
0.3736	121.5485	22000	0.4182
0.3725	127.0720	23000	0.4187
0.3669	132.5983	24000	0.4185
0.3670	138.1219	25000	0.4157
0.3694	143.6482	26000	0.4191
0.3632	149.1717	27000	0.4180
0.3607	154.6981	28000	0.4177
0.361	160.2216	29000	0.4164
0.3612	165.7479	30000	0.4168
0.3618	171.2715	31000	0.4192
0.3565	176.7978	32000	0.4175
0.362	182.3213	33000	0.4184
0.3567	187.8476	34000	0.4181
0.3545	193.3712	35000	0.4183
0.3592	198.8975	36000	0.4197
0.3524	204.4211	37000	0.4199
0.3521	209.9474	38000	0.4192
0.3625	215.4709	39000	0.4187
0.3546	220.9972	40000	0.4194
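
To make the table easier to read at a glance, the following sketch locates the evaluation step with the lowest validation loss and estimates the number of optimizer steps per epoch from a logged row. The loss values are copied from the table. The helper names are illustrative, and the steps-per-epoch figure is only an estimate derived from the logged epoch fraction.

```python
# Validation loss at every 1000 steps (steps 1000..40000), copied from the table above.
VAL_LOSS = [
    0.4673, 0.4486, 0.4411, 0.4362, 0.4300, 0.4260, 0.4280, 0.4255,
    0.4222, 0.4218, 0.4251, 0.4190, 0.4221, 0.4224, 0.4181, 0.4193,
    0.4188, 0.4187, 0.4190, 0.4192, 0.4174, 0.4182, 0.4187, 0.4185,
    0.4157, 0.4191, 0.4180, 0.4177, 0.4164, 0.4168, 0.4192, 0.4175,
    0.4184, 0.4181, 0.4183, 0.4197, 0.4199, 0.4192, 0.4187, 0.4194,
]
EVAL_INTERVAL = 1000  # steps between evaluations, as logged in the table


def best_checkpoint(losses, interval):
    """Return (step, loss) for the evaluation with the lowest validation loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return (idx + 1) * interval, losses[idx]


def steps_per_epoch(step, epoch):
    """Estimate optimizer steps per epoch from one logged (step, epoch) pair."""
    return step / epoch


if __name__ == "__main__":
    step, loss = best_checkpoint(VAL_LOSS, EVAL_INTERVAL)
    print(f"lowest validation loss {loss:.4f} at step {step}")
    print(f"about {steps_per_epoch(1000, 5.5263):.1f} optimizer steps per epoch")
```

For this run the minimum falls at step 25000 (0.4157), while the last evaluation at step 40000 gives 0.4194. Training loss keeps falling over the same range, so the later steps appear to add little on the validation set.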

Safetensors: model size 0.1B params, tensor type F32