bcc-arbnaskh-audio-aligned-speecht5

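This model is a fine-tuned version of microsoft/speecht5_tts on an unspecified dataset (the card records the dataset as None). It achieves the following results on the evaluation set:

Loss: 0.0666 (validation loss at step 40000, the final row of the training results)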

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: ADAMW_TORCH_FUSED (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
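
As a rough illustration of how these settings interact, the sketch below recomputes the effective batch size and the learning rate at a few steps. It assumes a single training device (the card does not state the device count) and the linear-warmup-then-cosine-decay shape that the Transformers cosine scheduler uses by default. The constant and function names are illustrative and are not taken from the training script.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000
NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size():
    """Samples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (1000, 4000, 22000, 40000):
        print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at step 4000, which is also where the warmup ends, and decays to zero by step 40000.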

Training Loss	Epoch	Step	Validation Loss
0.0763	20.0	1000	0.0587
0.0664	40.0	2000	0.0518
0.0575	60.0	3000	0.0504
0.0559	80.0	4000	0.0505
0.0492	100.0	5000	0.0524
0.0463	120.0	6000	0.0543
0.0445	140.0	7000	0.0536
0.0456	160.0	8000	0.0549
0.0413	180.0	9000	0.0554
0.0418	200.0	10000	0.0557
0.0385	220.0	11000	0.0571
0.0376	240.0	12000	0.0577
0.039	260.0	13000	0.0593
0.0369	280.0	14000	0.0581
0.0343	300.0	15000	0.0592
0.0367	320.0	16000	0.0608
0.0333	340.0	17000	0.0603
0.0324	360.0	18000	0.0613
0.0336	380.0	19000	0.0614
0.0344	400.0	20000	0.0630
0.0324	420.0	21000	0.0625
0.0317	440.0	22000	0.0639
0.0307	460.0	23000	0.0649
0.0306	480.0	24000	0.0647
0.0303	500.0	25000	0.0633
0.0331	520.0	26000	0.0664
0.0299	540.0	27000	0.0645
0.0286	560.0	28000	0.0640
0.0287	580.0	29000	0.0644
0.0281	600.0	30000	0.0658
0.0285	620.0	31000	0.0660
0.0285	640.0	32000	0.0653
0.0365	660.0	33000	0.0663
0.0278	680.0	34000	0.0654
0.0275	700.0	35000	0.0663
0.0301	720.0	36000	0.0658
0.0279	740.0	37000	0.0658
0.0297	760.0	38000	0.0657
0.0285	780.0	39000	0.0661
0.029	800.0	40000	0.0666
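
The table suggests that validation loss bottoms out early while training loss keeps falling. The sketch below copies the validation losses from the table into a hypothetical `VAL_LOSS` list, locates the lowest-loss evaluation point, and derives the number of optimizer steps per epoch from the Epoch and Step columns. The helper names are illustrative only.

```python
# Validation losses from the table, one per 1000 steps (step 1000 .. 40000).
VAL_LOSS = [
    0.0587, 0.0518, 0.0504, 0.0505, 0.0524, 0.0543, 0.0536, 0.0549,
    0.0554, 0.0557, 0.0571, 0.0577, 0.0593, 0.0581, 0.0592, 0.0608,
    0.0603, 0.0613, 0.0614, 0.0630, 0.0625, 0.0639, 0.0649, 0.0647,
    0.0633, 0.0664, 0.0645, 0.0640, 0.0644, 0.0658, 0.0660, 0.0653,
    0.0663, 0.0654, 0.0663, 0.0658, 0.0658, 0.0657, 0.0661, 0.0666,
]
EVAL_INTERVAL = 1000  # steps between evaluations, per the Step column


def best_checkpoint(losses, interval=EVAL_INTERVAL):
    """Return (step, loss) for the evaluation with the lowest validation loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return (idx + 1) * interval, losses[idx]


def steps_per_epoch(step, epoch):
    """Optimizer steps per epoch implied by one table row."""
    return step / epoch


if __name__ == "__main__":
    print(best_checkpoint(VAL_LOSS))
    print(steps_per_epoch(1000, 20.0))
```

If the evaluation set is representative, the checkpoint near step 3000 may generalize better than the final one. At 50 optimizer steps per epoch and a total batch size of 32, one epoch covers roughly 1,600 examples, which hints at a small training set.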

Model size: 0.1B params (Safetensors)
Tensor type: F32
