Whisper Small zh - Song train

This model is a fine-tuned version of openai/whisper-small on the Chinese songs * 58 dataset. At the last logged checkpoint (step 4300), it achieves the following results on the evaluation set:

Loss: 2.3571
WER: 23.1198
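WER is the word-level edit distance between reference and hypothesis (substitutions + deletions + insertions) divided by the number of reference words. The sketch below assumes the card's WER values are percentages, as their magnitudes suggest. Chinese text has no spaces, so whitespace splitting would treat a whole line as one "word"; the card does not say how the evaluation segmented text, so the usage example splits into characters purely for illustration, which effectively makes it a character error rate. The function name `word_error_rate` is hypothetical, not the metric implementation used for this card.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Return WER in percent: (S + D + I) / len(reference) * 100."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one token")
    # dp[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete all i reference tokens
    for j in range(m + 1):
        dp[0][j] = j  # insert all j hypothesis tokens
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match when cost == 0)
            )
    return 100.0 * dp[n][m] / n


# Character-level split, for illustration only (hypothetical segmentation choice).
ref = list("我爱你中国")
hyp = list("我爱你中华")
print(word_error_rate(ref, hyp))  # 20.0 (one substitution out of five tokens)
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.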

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 4
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 16
optimizer: AdamW (PyTorch implementation, OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 200
training_steps: 10000
mixed_precision_training: Native AMP
label_smoothing_factor: 0.1
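Two of these settings interact in ways that are easy to check by hand. The total train batch size is the per-device batch size times the gradient accumulation steps (times the number of devices; 4 × 4 = 16 implies a single device, which is an assumption since the card does not say). The linear scheduler ramps the learning rate up from 0 over the warmup steps and then decays it linearly to 0 at the final training step. The sketch below reproduces that shape under those assumptions; the helper names are hypothetical and this is not the Transformers scheduler code itself.

```python
def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    # Number of examples that contribute to one optimizer update.
    # num_devices=1 is an assumption; the card does not state the device count.
    return per_device * grad_accum_steps * num_devices


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        # Warmup phase: scale base_lr by the fraction of warmup completed.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale base_lr by the fraction of post-warmup steps remaining.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


print(effective_batch_size(4, 4))  # 16
for step in (100, 200, 4300, 10000):
    print(step, linear_warmup_lr(step, base_lr=1e-5, warmup_steps=200, total_steps=10000))
```

Under this schedule the learning rate at step 4300, the last row in the training results, would still be about 5.8e-06 (roughly 58% of the peak), because the log stops well before training_steps: 10000.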

Training results

Training Loss	Epoch	Step	Validation Loss	WER
2.7182	0.9456	100	2.4663	74.5125
2.1609	1.8889	200	2.1933	29.2015
1.8165	2.8322	300	2.1198	25.9517
1.6089	3.7754	400	2.1005	24.7447
1.5458	4.7187	500	2.1080	24.4661
1.5067	5.6619	600	2.0911	24.4197
1.4875	6.6052	700	2.1234	23.8626
1.4783	7.5485	800	2.1006	26.7409
1.4637	8.4917	900	2.1462	23.7233
1.4595	9.4350	1000	2.1491	24.5590
1.4526	10.3783	1100	2.1464	24.2804
1.4499	11.3215	1200	2.1496	23.2591
1.4424	12.2648	1300	2.1723	25.4875
1.4432	13.2080	1400	2.1740	24.4197
1.4411	14.1513	1500	2.1619	23.3519
1.4364	15.0946	1600	2.1972	43.8254
1.4366	16.0378	1700	2.1931	22.9805
1.4353	16.9835	1800	2.2018	23.3983
1.4319	17.9267	1900	2.2067	23.3519
1.431	18.8700	2000	2.2079	22.5162
1.4303	19.8132	2100	2.2221	22.6555
1.4277	20.7565	2200	2.2354	22.5162
1.4266	21.6998	2300	2.2289	22.8877
1.4261	22.6430	2400	2.2336	22.7484
1.4251	23.5863	2500	2.2423	23.3519
1.4238	24.5296	2600	2.2548	23.0269
1.4224	25.4728	2700	2.2598	23.0269
1.4225	26.4161	2800	2.2682	22.6555
1.4214	27.3593	2900	2.2712	22.4234
1.4211	28.3026	3000	2.2869	22.6555
1.4207	29.2459	3100	2.2880	22.4234
1.4197	30.1891	3200	2.2848	22.4234
1.4207	31.1324	3300	2.3099	22.1913
1.4193	32.0757	3400	2.3111	22.2377
1.4192	33.0189	3500	2.3284	22.4698
1.4187	33.9645	3600	2.3349	22.8877
1.4187	34.9078	3700	2.3347	22.8877
1.4181	35.8511	3800	2.3441	22.6091
1.4188	36.7943	3900	2.3338	22.6091
1.4182	37.7376	4000	2.3462	22.2377
1.4183	38.6809	4100	2.3396	22.4698
1.4183	39.6241	4200	2.3441	22.3770
1.4179	40.5674	4300	2.3571	23.1198
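One way to read the Training Loss column is that it levels off near 1.42 rather than approaching zero. With label_smoothing_factor: 0.1, the loss (as computed by the Transformers Trainer's label smoother) is (1 - eps) × NLL + eps × mean over the vocabulary of -log p. That is cross-entropy against a smoothed target, so it can never fall below the entropy of that target. The sketch below computes this floor, assuming the base model's vocabulary size of 51,865 tokens from the openai/whisper-small configuration. The function name is hypothetical, and linking the plateau to this floor is an interpretation, not something the card states.

```python
import math


def label_smoothing_floor(epsilon: float, vocab_size: int) -> float:
    """Smallest achievable label-smoothed cross-entropy per token, in nats.

    The smoothed target puts 1 - eps + eps/V on the correct token and eps/V on
    every other token. Cross-entropy against it is minimised when the model
    predicts exactly that distribution, and the minimum equals its entropy.
    """
    on_target = 1.0 - epsilon + epsilon / vocab_size
    off_target = epsilon / vocab_size
    entropy = -on_target * math.log(on_target)
    if off_target > 0:  # skip the term when epsilon == 0 (log(0) is undefined)
        entropy -= (vocab_size - 1) * off_target * math.log(off_target)
    return entropy


# 51865 is an assumption: the multilingual Whisper vocabulary size.
print(label_smoothing_floor(0.1, 51865))  # roughly 1.41
```

Under this reading, training losses around 1.418 sit only slightly above the minimum, and the loss values in the table are not directly comparable to unsmoothed cross-entropy.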

Framework versions

Transformers 4.56.2
Pytorch 2.7.1+cu118
Datasets 4.1.1
Tokenizers 0.22.1

Safetensors
Model size: 0.2B params
Tensor type: F32
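As a rough sense of scale, F32 stores each parameter in 4 bytes, so the weights alone occupy about (number of parameters) × 4 bytes. The sketch below uses the rounded 0.2B figure from this card, so its result of about 0.8 GB is only an estimate. The helper name and the dtype table are illustrative, not part of any library.

```python
# Bytes per parameter for common tensor types (illustrative table).
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weights_size_gb(num_params: float, dtype: str = "F32") -> float:
    # Raw weight storage only, in decimal gigabytes (1e9 bytes).
    # Excludes activations, optimizer state, and file-format overhead.
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


print(weights_size_gb(0.2e9))         # about 0.8 (F32)
print(weights_size_gb(0.2e9, "F16"))  # about 0.4 (half precision)
```

Training needs considerably more than this, since AdamW keeps two additional state tensors per parameter.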

Model tree for Zzzkay1/whisper-small-zh

Base model: openai/whisper-small
Fine-tuned: this model

Evaluation results

WER on Chinese songs * 58 (self-reported): 23.120