bap-chapter-audio-dataset-force-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts; the training dataset is not specified. It achieves the following result on the evaluation set:

Loss: 0.3966 (validation loss at step 40000, the final training step)

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
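
The hyperparameters above imply two quantities that are easy to check by hand: the effective batch size and the learning rate at any given step. The sketch below is a minimal, standard-library illustration, assuming a single training device and the usual linear warmup followed by linear decay to zero. The function names are hypothetical and are not taken from the training script.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000


def effective_batch_size(per_device, accumulation, num_devices=1):
    """Examples consumed per optimizer step (assumes num_devices=1 here)."""
    return per_device * accumulation * num_devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    for step in (0, 2000, 4000, 22000, 40000):
        print(step, linear_lr(step))
```

With these settings the learning rate peaks at step 4000, one tenth of the way through training, and the product 8 x 4 matches the reported total_train_batch_size of 32.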

Training Loss	Epoch	Step	Validation Loss
0.4925	11.7647	1000	0.4308
0.443	23.5294	2000	0.4077
0.4186	35.2941	3000	0.3994
0.4148	47.0588	4000	0.3940
0.3967	58.8235	5000	0.3910
0.3953	70.5882	6000	0.3907
0.3835	82.3529	7000	0.3913
0.3715	94.1176	8000	0.3893
0.3778	105.8824	9000	0.3890
0.3706	117.6471	10000	0.3880
0.3572	129.4118	11000	0.3868
0.3658	141.1765	12000	0.3885
0.3581	152.9412	13000	0.3893
0.3564	164.7059	14000	0.3904
0.3495	176.4706	15000	0.3879
0.3535	188.2353	16000	0.3884
0.3495	200.0	17000	0.3890
0.3593	211.7647	18000	0.3907
0.345	223.5294	19000	0.3910
0.3464	235.2941	20000	0.3909
0.3363	247.0588	21000	0.3920
0.3422	258.8235	22000	0.3917
0.3383	270.5882	23000	0.3930
0.3364	282.3529	24000	0.3927
0.3334	294.1176	25000	0.3937
0.3337	305.8824	26000	0.3943
0.3251	317.6471	27000	0.3932
0.3247	329.4118	28000	0.3944
0.3294	341.1765	29000	0.3951
0.3293	352.9412	30000	0.3954
0.3455	364.7059	31000	0.3978
0.3229	376.4706	32000	0.3962
0.3201	388.2353	33000	0.3966
0.3249	400.0	34000	0.3969
0.3203	411.7647	35000	0.3965
0.3173	423.5294	36000	0.3964
0.3223	435.2941	37000	0.3968
0.3183	447.0588	38000	0.3972
0.3218	458.8235	39000	0.3967
0.3262	470.5882	40000	0.3966
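
To read the table above at a glance, the following sketch locates the evaluation step with the lowest validation loss and estimates the number of optimizer steps per epoch from the Epoch and Step columns. The validation losses are copied from the table; the helper name best_checkpoint is hypothetical, and the script is an illustration rather than part of the original training code.

```python
# Validation loss at steps 1000, 2000, ..., 40000, copied from the table above.
VAL_LOSS = [
    0.4308, 0.4077, 0.3994, 0.3940, 0.3910, 0.3907, 0.3913, 0.3893, 0.3890, 0.3880,
    0.3868, 0.3885, 0.3893, 0.3904, 0.3879, 0.3884, 0.3890, 0.3907, 0.3910, 0.3909,
    0.3920, 0.3917, 0.3930, 0.3927, 0.3937, 0.3943, 0.3932, 0.3944, 0.3951, 0.3954,
    0.3978, 0.3962, 0.3966, 0.3969, 0.3965, 0.3964, 0.3968, 0.3972, 0.3967, 0.3966,
]
EVAL_EVERY = 1000  # evaluation interval in optimizer steps


def best_checkpoint(losses, eval_every):
    """Return (step, loss) for the evaluation with the lowest validation loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return (idx + 1) * eval_every, losses[idx]


best_step, best_loss = best_checkpoint(VAL_LOSS, EVAL_EVERY)
final_loss = VAL_LOSS[-1]

# First table row: step 1000 corresponds to epoch 11.7647.
steps_per_epoch = round(1000 / 11.7647)

if __name__ == "__main__":
    print("best step:", best_step, "validation loss:", best_loss)
    print("final validation loss:", final_loss)
    print("optimizer steps per epoch:", steps_per_epoch)
```

On these numbers the lowest validation loss is 0.3868 at step 11000, while training loss keeps falling through step 40000. That pattern suggests the later checkpoints may be overfitting, although validation loss alone does not settle which checkpoint produces the best-sounding speech.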

Safetensors

Model size: 0.1B params
Tensor type: F32