Whisper medium zh - Song train

This model is a fine-tuned version of openai/whisper-medium on the Chinese songs * 14 dataset. It achieves the following results on the evaluation set:

Loss: 0.7806
Wer: 40.2321
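
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate like the one above can be computed: the edit distance between reference and hypothesis tokens, divided by the number of reference tokens, expressed in percent. The function name `word_error_rate` is hypothetical, and splitting Chinese text into single characters is an assumption made only for this illustration; this card does not state how the lyrics were tokenized or normalized before scoring.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Return WER in percent: 100 * edit_distance / number of reference tokens."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] = edit distance between the reference prefix seen so far and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference tokens matches an empty hypothesis
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra hypothesis token)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(reference)


# Assumption: character-level tokens, chosen only for illustration.
reference = list("我爱你中国")
hypothesis = list("我爱中国人")
score = word_error_rate(reference, hypothesis)
```

Because the denominator is the reference length, a score can exceed 100 when the hypothesis contains many insertions.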

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 1
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 16
total_train_batch_size: 16
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 5000
mixed_precision_training: Native AMP
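
Two of the values above follow from the others, and a short sketch may make the relationship concrete. It shows how the total train batch size arises from the per-device batch size and gradient accumulation, and how a linear schedule with warmup maps a step to a learning rate. The function names are hypothetical, the single-device default is inferred from 1 x 16 = 16, and the schedule formula is the usual linear warmup followed by linear decay to zero, not code taken from the training run.

```python
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_STEPS = 500
TRAINING_STEPS = 5000


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Examples contributing to one optimizer step (num_devices=1 is an assumption)."""
    return per_device * accumulation * num_devices


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


total = effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS)
peak = linear_lr(WARMUP_STEPS)      # the peak rate is reached when warmup ends
final = linear_lr(TRAINING_STEPS)   # the rate has decayed to zero at the last step
```

Under this schedule the learning rate is already at zero by step 5000, so the last checkpoints change the weights very little.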

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.1666	3.4724	1000	0.6639	54.1586
0.0396	6.9449	2000	0.7002	46.2282
0.0008	10.4168	3000	0.7561	40.2321
0.0001	13.8893	4000	0.7746	40.4255
0.0	17.3613	5000	0.7806	40.2321
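
The Epoch and Step columns also give a rough idea of the training set size, which the card does not state. The sketch below divides step by epoch for each row and multiplies by the total train batch size. The helper names are hypothetical and the result is only an estimate: it assumes every optimizer step consumed a full batch of 16 examples.

```python
# (epoch, step) pairs copied from the training results table
CHECKPOINTS = [
    (3.4724, 1000),
    (6.9449, 2000),
    (10.4168, 3000),
    (13.8893, 4000),
    (17.3613, 5000),
]
TOTAL_TRAIN_BATCH_SIZE = 16


def steps_per_epoch(checkpoints: list[tuple[float, int]]) -> float:
    """Average number of optimizer steps per epoch across the logged rows."""
    ratios = [step / epoch for epoch, step in checkpoints]
    return sum(ratios) / len(ratios)


def approx_examples_per_epoch(checkpoints: list[tuple[float, int]], batch_size: int) -> int:
    """Estimated training examples per epoch, assuming full batches throughout."""
    return round(steps_per_epoch(checkpoints)) * batch_size


estimate = approx_examples_per_epoch(CHECKPOINTS, TOTAL_TRAIN_BATCH_SIZE)
```

With the rows above this works out to about 288 optimizer steps per epoch, which would correspond to roughly 4,600 training examples under the stated assumption.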

Framework versions

Transformers 4.56.2
Pytorch 2.7.1+cu118
Datasets 4.1.1
Tokenizers 0.22.1

Safetensors

Model size: 0.8B params
Tensor type: F32

Model tree for Zzzkay1/whisper-medium-zh

Base model: openai/whisper-medium

Evaluation results

Wer on Chinese songs * 14 (self-reported): 40.232