Whisper Large V3 Turbo Burmese Finetune

This model is a fine-tuned version of openai/whisper-large-v3-turbo on the Myanmar Speech Dataset (OpenSLR-80). It achieves the following results on the evaluation set:

Loss: 0.1727
WER: 47.1060
CER: 15.6324
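
For readers unfamiliar with the two metrics, here is a minimal sketch of how word error rate and character error rate are commonly defined: the edit distance between reference and hypothesis, divided by the reference length. The card does not say which library or text normalization produced the numbers above, so the function names, whitespace tokenization, and the decision to count spaces as characters are assumptions made for illustration only. The reported values appear to be percentages, so the sketch scales by 100.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis, unit="word"):
    """Error rate in percent, at word level or character level."""
    if unit == "word":
        # Assumption: words are separated by whitespace.
        ref, hyp = reference.split(), hypothesis.split()
    else:
        # Assumption: every character, including spaces, counts.
        ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


wer = error_rate("the cat sat down", "the cat sit down", unit="word")
cer = error_rate("the cat sat down", "the cat sit down", unit="char")
print(f"WER: {wer:.2f}  CER: {cer:.2f}")
```

A single wrong character makes the whole word count as an error, which is one reason WER is usually much higher than CER on the same transcripts.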

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
mixed_precision_training: Native AMP
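
To make the learning rate settings concrete, the following sketch computes the learning rate at a given optimizer step for a linear schedule with warmup. It assumes the common behavior of such schedules (linear ramp from 0 to the peak during warmup, then linear decay to 0) and takes the total step count of 2860 from the last row of the training results table. The constant and function names are hypothetical, and this is not the training code used for this model.

```python
import math

BASE_LR = 3e-05       # learning_rate from the list above
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 2860    # assumption: final step in the training results table

# Assumption: warmup length is the ratio of total steps, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def linear_schedule_lr(step):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 143, WARMUP_STEPS, 1573, TOTAL_STEPS):
    print(f"step {step:>4}: lr = {linear_schedule_lr(step):.3e}")
```

Under these assumptions the warmup spans the first two epochs, after which the rate decays linearly until it reaches zero at the final step.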

Training results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
0.8922	1.0	143	0.4413	95.9484	48.4730
0.2576	2.0	286	0.1971	83.8379	26.9627
0.1481	3.0	429	0.1505	66.4292	22.9769
0.0996	4.0	572	0.1315	62.0214	20.5786
0.0697	5.0	715	0.1344	60.8638	20.5786
0.0507	6.0	858	0.1249	57.3464	19.3075
0.038	7.0	1001	0.1273	55.2538	18.4391
0.0279	8.0	1144	0.1257	54.4524	18.4908
0.02	9.0	1287	0.1374	53.3838	17.9559
0.0147	10.0	1430	0.1422	53.3393	17.9847
0.0101	11.0	1573	0.1530	53.8736	17.9674
0.0066	12.0	1716	0.1512	50.8905	16.8344
0.0043	13.0	1859	0.1526	49.5993	16.2708
0.0026	14.0	2002	0.1594	49.9110	16.4261
0.0017	15.0	2145	0.1612	49.0205	16.2248
0.0008	16.0	2288	0.1646	48.7088	15.9027
0.0003	17.0	2431	0.1676	47.8629	15.9429
0.0001	18.0	2574	0.1707	47.5512	15.6209
0.0001	19.0	2717	0.1721	47.3731	15.6439
0.0	20.0	2860	0.1727	47.1060	15.6324
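
The table can be summarized by locating the epoch at which each validation metric is lowest. The sketch below copies the epoch, validation loss, WER, and CER columns from the table above; the variable names are illustrative only.

```python
# (epoch, validation_loss, wer, cer) copied from the training results table
RESULTS = [
    (1, 0.4413, 95.9484, 48.4730),
    (2, 0.1971, 83.8379, 26.9627),
    (3, 0.1505, 66.4292, 22.9769),
    (4, 0.1315, 62.0214, 20.5786),
    (5, 0.1344, 60.8638, 20.5786),
    (6, 0.1249, 57.3464, 19.3075),
    (7, 0.1273, 55.2538, 18.4391),
    (8, 0.1257, 54.4524, 18.4908),
    (9, 0.1374, 53.3838, 17.9559),
    (10, 0.1422, 53.3393, 17.9847),
    (11, 0.1530, 53.8736, 17.9674),
    (12, 0.1512, 50.8905, 16.8344),
    (13, 0.1526, 49.5993, 16.2708),
    (14, 0.1594, 49.9110, 16.4261),
    (15, 0.1612, 49.0205, 16.2248),
    (16, 0.1646, 48.7088, 15.9027),
    (17, 0.1676, 47.8629, 15.9429),
    (18, 0.1707, 47.5512, 15.6209),
    (19, 0.1721, 47.3731, 15.6439),
    (20, 0.1727, 47.1060, 15.6324),
]

best_loss = min(RESULTS, key=lambda row: row[1])
best_wer = min(RESULTS, key=lambda row: row[2])
best_cer = min(RESULTS, key=lambda row: row[3])

print(f"lowest validation loss: {best_loss[1]} at epoch {best_loss[0]}")
print(f"lowest WER: {best_wer[2]} at epoch {best_wer[0]}")
print(f"lowest CER: {best_cer[3]} at epoch {best_cer[0]}")
```

Validation loss is lowest at epoch 6 and drifts upward afterwards while training loss approaches zero, yet WER and CER keep improving through the later epochs. Which checkpoint counts as "best" therefore depends on the metric chosen for selection.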

Framework versions

Transformers 4.46.2
Pytorch 2.5.1+cu124
Datasets 3.1.0
Tokenizers 0.20.3

Safetensors

Model size: 0.8B params
Tensor type: F32

Model tree for ToeLay/whisper_large_v3_turbo_mm2

Base model: openai/whisper-large-v3
Finetuned: openai/whisper-large-v3-turbo
Finetuned: this model

Evaluation results

Wer on Myanmar Speech Dataset (OpenSLR-80) (self-reported): 47.106