wav2vec2-base-finetuned-gtzan-optimized

This model is a fine-tuned version of facebook/wav2vec2-base on the GTZAN dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2450
  • Accuracy: 0.72
  • Precision: 0.7271
  • Recall: 0.72
  • F1: 0.7156
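
For quick inference, a minimal sketch using the transformers audio-classification pipeline, assuming the checkpoint is published as zikangzheng/wav2vec2-base-gtzan-optimized (the repo id may need adjusting):

```python
from transformers import pipeline

# The repo id is an assumption based on this card's name; adjust it if the
# checkpoint is published under a different id.
classifier = pipeline(
    "audio-classification",
    model="zikangzheng/wav2vec2-base-gtzan-optimized",
)

# Any local audio file works; GTZAN clips are 30-second excerpts.
predictions = classifier("example.wav")
print(predictions)  # list of {"label": genre, "score": probability} dicts
```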

Model description

This is facebook/wav2vec2-base (a self-supervised audio encoder pretrained on 16 kHz speech, roughly 94.6M parameters) with a sequence-classification head, fine-tuned to predict one of the ten GTZAN music genres from raw audio.

Intended uses & limitations

The model is intended for classifying short music recordings into the ten GTZAN genres (blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock). GTZAN is small (1,000 clips) and has documented label noise and duplicate recordings, and the backbone was pretrained on speech rather than music, so the 0.72 evaluation accuracy reported above should not be assumed to carry over to other music collections.

Training and evaluation data

The model was fine-tuned on GTZAN, which consists of 1,000 30-second audio excerpts evenly distributed over ten genres, with a held-out portion of the dataset used for the evaluation metrics reported above. The card does not record the exact train/evaluation split.
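
A minimal loading sketch, assuming the commonly used marsyas/gtzan mirror on the Hugging Face Hub (the card does not name the exact dataset repo) and an illustrative split ratio:

```python
from datasets import load_dataset, Audio

# "marsyas/gtzan" is an assumption: it is the commonly used Hub mirror of
# GTZAN, but the card does not name the exact dataset repo.
gtzan = load_dataset("marsyas/gtzan", "all")["train"]

# wav2vec2-base expects 16 kHz mono audio, so resample on the fly.
gtzan = gtzan.cast_column("audio", Audio(sampling_rate=16_000))

# GTZAN ships as a single split; carve out a held-out evaluation set.
# The 15% ratio is illustrative, not taken from the card.
splits = gtzan.train_test_split(test_size=0.15, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```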

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch_fused (fused AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
  • label_smoothing_factor: 0.1
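
These settings map directly onto transformers.TrainingArguments; a minimal sketch, in which the output directory and per-epoch evaluation are assumptions not recorded in the card:

```python
from transformers import TrainingArguments

# Sketch of TrainingArguments matching the list above.
training_args = TrainingArguments(
    output_dir="wav2vec2-base-gtzan-optimized",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # effective train batch size: 8 * 4 = 32
    seed=42,
    optim="adamw_torch_fused",  # AdamW with betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    num_train_epochs=30,
    label_smoothing_factor=0.1,
    eval_strategy="epoch",  # assumed: the results table reports metrics per epoch
)
```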

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|:-------------:|:------:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
| 2.3003        | 1.0    | 22   | 2.2929          | 0.18     | 0.0544    | 0.18   | 0.0834 |
| 2.2917        | 2.0    | 44   | 2.2837          | 0.1333   | 0.0408    | 0.1333 | 0.0597 |
| 2.2778        | 3.0    | 66   | 2.2567          | 0.26     | 0.3654    | 0.26   | 0.2083 |
| 2.2471        | 4.0    | 88   | 2.2149          | 0.34     | 0.3997    | 0.34   | 0.2758 |
| 2.1629        | 5.0    | 110  | 2.1427          | 0.32     | 0.3069    | 0.32   | 0.2353 |
| 2.08          | 6.0    | 132  | 2.0558          | 0.3733   | 0.2645    | 0.3733 | 0.2776 |
| 2.0188        | 7.0    | 154  | 1.9914          | 0.3867   | 0.3095    | 0.3867 | 0.2997 |
| 1.9483        | 8.0    | 176  | 1.9420          | 0.3867   | 0.3785    | 0.3867 | 0.3167 |
| 1.8804        | 9.0    | 198  | 1.8842          | 0.4467   | 0.4878    | 0.4467 | 0.3905 |
| 1.8063        | 10.0   | 220  | 1.8867          | 0.3867   | 0.3360    | 0.3867 | 0.2975 |
| 1.7808        | 11.0   | 242  | 1.8269          | 0.4133   | 0.4118    | 0.4133 | 0.3619 |
| 1.7031        | 12.0   | 264  | 1.7784          | 0.5133   | 0.5104    | 0.5133 | 0.4759 |
| 1.6752        | 13.0   | 286  | 1.7580          | 0.4933   | 0.5315    | 0.4933 | 0.4502 |
| 1.6843        | 14.0   | 308  | 1.7113          | 0.5      | 0.5002    | 0.5    | 0.4609 |
| 1.6136        | 15.0   | 330  | 1.7132          | 0.4667   | 0.4710    | 0.4667 | 0.4276 |
| 1.6392        | 1.9957 | 349  | 1.6793          | 0.4667   | 0.4630    | 0.4667 | 0.4112 |
| 1.5396        | 3.0    | 524  | 1.5783          | 0.5267   | 0.5407    | 0.5267 | 0.4945 |
| 1.5981        | 4.0    | 699  | 1.6018          | 0.5      | 0.5358    | 0.5    | 0.4795 |
| 1.3127        | 5.0    | 874  | 1.4972          | 0.56     | 0.5732    | 0.56   | 0.5382 |
| 1.5041        | 6.0    | 1049 | 1.5921          | 0.5267   | 0.5740    | 0.5267 | 0.5166 |
| 1.1165        | 7.0    | 1224 | 1.4291          | 0.5667   | 0.5364    | 0.5667 | 0.5296 |
| 1.1177        | 8.0    | 1399 | 1.3336          | 0.6267   | 0.6217    | 0.6267 | 0.5932 |
| 0.8805        | 9.0    | 1574 | 1.3987          | 0.5867   | 0.6336    | 0.5867 | 0.5745 |
| 0.8566        | 10.0   | 1749 | 1.2999          | 0.66     | 0.6753    | 0.66   | 0.6565 |
| 1.0281        | 11.0   | 1924 | 1.3834          | 0.66     | 0.6770    | 0.66   | 0.6539 |
| 0.8522        | 12.0   | 2099 | 1.3038          | 0.6933   | 0.7138    | 0.6933 | 0.6848 |
| 0.8237        | 13.0   | 2274 | 1.4544          | 0.6133   | 0.6358    | 0.6133 | 0.5935 |
| 0.7483        | 14.0   | 2449 | 1.3505          | 0.6867   | 0.7018    | 0.6867 | 0.6835 |
| 0.6935        | 15.0   | 2624 | 1.2758          | 0.68     | 0.6990    | 0.68   | 0.6805 |
| 0.6927        | 16.0   | 2799 | 1.2943          | 0.7      | 0.7034    | 0.7    | 0.6918 |
| 0.5777        | 17.0   | 2974 | 1.3557          | 0.6867   | 0.6959    | 0.6867 | 0.6773 |
| 0.5445        | 18.0   | 3149 | 1.3008          | 0.7133   | 0.7246    | 0.7133 | 0.7078 |
| 0.5349        | 19.0   | 3324 | 1.2980          | 0.6933   | 0.7111    | 0.6933 | 0.6921 |
| 0.5268        | 20.0   | 3499 | 1.2516          | 0.72     | 0.7325    | 0.72   | 0.7201 |
| 0.5458        | 21.0   | 3674 | 1.2454          | 0.7067   | 0.7028    | 0.7067 | 0.7011 |
| 0.5167        | 22.0   | 3849 | 1.2321          | 0.6933   | 0.7007    | 0.6933 | 0.6908 |
| 0.5157        | 23.0   | 4024 | 1.3093          | 0.68     | 0.6978    | 0.68   | 0.6797 |
| 0.51          | 24.0   | 4199 | 1.2763          | 0.7067   | 0.7198    | 0.7067 | 0.7044 |
| 0.5109        | 25.0   | 4374 | 1.2671          | 0.6933   | 0.7038    | 0.6933 | 0.6913 |

Note that the epoch and step counters reset after step 330, and the steps per epoch jump from 22 to roughly 175; the rows from step 349 onward therefore appear to come from a second, re-launched training run rather than a continuation of the first.
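
In every row, recall equals accuracy, which is what weighted averaging produces; below is a sketch of a Trainer-style compute_metrics under that assumption (the card does not state how the metrics were averaged):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Trainer-compatible metrics; weighted averaging is an assumption,
    inferred from recall matching accuracy in every row of the table above."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```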

Framework versions

  • Transformers 4.57.0.dev0
  • Pytorch 2.9.0.dev20250716+cu129
  • Datasets 4.0.0
  • Tokenizers 0.22.0