# wav2vec2-base-finetuned-gtzan-optimized

This model is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the GTZAN dataset. It achieves the following results on the evaluation set:
- Loss: 1.2450
- Accuracy: 0.72
- Precision: 0.7271
- Recall: 0.72
- F1: 0.7156
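As a usage illustration (not part of the original card), the snippet below loads the checkpoint with the standard `transformers` audio-classification pipeline. The Hub repo id `zikangzheng/wav2vec2-base-gtzan-optimized` is taken from this card's page, and `example.wav` is a placeholder file name.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub (repo id as listed on this card's page).
classifier = pipeline(
    "audio-classification",
    model="zikangzheng/wav2vec2-base-gtzan-optimized",
)

# "example.wav" is a placeholder; GTZAN clips are 30-second excerpts, and the
# pipeline resamples file inputs to the model's expected sampling rate.
predictions = classifier("example.wav", top_k=3)
for p in predictions:
    print(f"{p['label']}: {p['score']:.3f}")
```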
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an equivalent `TrainingArguments` sketch is shown after the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 30
- label_smoothing_factor: 0.1
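The list above maps onto `transformers` `TrainingArguments` roughly as sketched below. This is an illustration rather than the author's actual training script; `output_dir` and the evaluation/logging strategies are assumptions not stated in the card.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above as TrainingArguments.
# output_dir and the eval/logging strategies are assumptions not stated in the card.
training_args = TrainingArguments(
    output_dir="wav2vec2-base-finetuned-gtzan-optimized",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # effective total train batch size: 8 * 4 = 32
    seed=42,
    optim="adamw_torch_fused",      # fused AdamW, betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    num_train_epochs=30,
    label_smoothing_factor=0.1,
    eval_strategy="epoch",          # assumed from the per-epoch rows in the results table
    logging_strategy="epoch",       # assumed
)
```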
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| 2.3003 | 1.0 | 22 | 2.2929 | 0.18 | 0.0544 | 0.18 | 0.0834 |
| 2.2917 | 2.0 | 44 | 2.2837 | 0.1333 | 0.0408 | 0.1333 | 0.0597 |
| 2.2778 | 3.0 | 66 | 2.2567 | 0.26 | 0.3654 | 0.26 | 0.2083 |
| 2.2471 | 4.0 | 88 | 2.2149 | 0.34 | 0.3997 | 0.34 | 0.2758 |
| 2.1629 | 5.0 | 110 | 2.1427 | 0.32 | 0.3069 | 0.32 | 0.2353 |
| 2.08 | 6.0 | 132 | 2.0558 | 0.3733 | 0.2645 | 0.3733 | 0.2776 |
| 2.0188 | 7.0 | 154 | 1.9914 | 0.3867 | 0.3095 | 0.3867 | 0.2997 |
| 1.9483 | 8.0 | 176 | 1.9420 | 0.3867 | 0.3785 | 0.3867 | 0.3167 |
| 1.8804 | 9.0 | 198 | 1.8842 | 0.4467 | 0.4878 | 0.4467 | 0.3905 |
| 1.8063 | 10.0 | 220 | 1.8867 | 0.3867 | 0.3360 | 0.3867 | 0.2975 |
| 1.7808 | 11.0 | 242 | 1.8269 | 0.4133 | 0.4118 | 0.4133 | 0.3619 |
| 1.7031 | 12.0 | 264 | 1.7784 | 0.5133 | 0.5104 | 0.5133 | 0.4759 |
| 1.6752 | 13.0 | 286 | 1.7580 | 0.4933 | 0.5315 | 0.4933 | 0.4502 |
| 1.6843 | 14.0 | 308 | 1.7113 | 0.5 | 0.5002 | 0.5 | 0.4609 |
| 1.6136 | 15.0 | 330 | 1.7132 | 0.4667 | 0.4710 | 0.4667 | 0.4276 |
| 1.6392 | 1.9957 | 349 | 1.6793 | 0.4667 | 0.4630 | 0.4667 | 0.4112 |
| 1.5396 | 3.0 | 524 | 1.5783 | 0.5267 | 0.5407 | 0.5267 | 0.4945 |
| 1.5981 | 4.0 | 699 | 1.6018 | 0.5 | 0.5358 | 0.5 | 0.4795 |
| 1.3127 | 5.0 | 874 | 1.4972 | 0.56 | 0.5732 | 0.56 | 0.5382 |
| 1.5041 | 6.0 | 1049 | 1.5921 | 0.5267 | 0.5740 | 0.5267 | 0.5166 |
| 1.1165 | 7.0 | 1224 | 1.4291 | 0.5667 | 0.5364 | 0.5667 | 0.5296 |
| 1.1177 | 8.0 | 1399 | 1.3336 | 0.6267 | 0.6217 | 0.6267 | 0.5932 |
| 0.8805 | 9.0 | 1574 | 1.3987 | 0.5867 | 0.6336 | 0.5867 | 0.5745 |
| 0.8566 | 10.0 | 1749 | 1.2999 | 0.66 | 0.6753 | 0.66 | 0.6565 |
| 1.0281 | 11.0 | 1924 | 1.3834 | 0.66 | 0.6770 | 0.66 | 0.6539 |
| 0.8522 | 12.0 | 2099 | 1.3038 | 0.6933 | 0.7138 | 0.6933 | 0.6848 |
| 0.8237 | 13.0 | 2274 | 1.4544 | 0.6133 | 0.6358 | 0.6133 | 0.5935 |
| 0.7483 | 14.0 | 2449 | 1.3505 | 0.6867 | 0.7018 | 0.6867 | 0.6835 |
| 0.6935 | 15.0 | 2624 | 1.2758 | 0.68 | 0.6990 | 0.68 | 0.6805 |
| 0.6927 | 16.0 | 2799 | 1.2943 | 0.7 | 0.7034 | 0.7 | 0.6918 |
| 0.5777 | 17.0 | 2974 | 1.3557 | 0.6867 | 0.6959 | 0.6867 | 0.6773 |
| 0.5445 | 18.0 | 3149 | 1.3008 | 0.7133 | 0.7246 | 0.7133 | 0.7078 |
| 0.5349 | 19.0 | 3324 | 1.2980 | 0.6933 | 0.7111 | 0.6933 | 0.6921 |
| 0.5268 | 20.0 | 3499 | 1.2516 | 0.72 | 0.7325 | 0.72 | 0.7201 |
| 0.5458 | 21.0 | 3674 | 1.2454 | 0.7067 | 0.7028 | 0.7067 | 0.7011 |
| 0.5167 | 22.0 | 3849 | 1.2321 | 0.6933 | 0.7007 | 0.6933 | 0.6908 |
| 0.5157 | 23.0 | 4024 | 1.3093 | 0.68 | 0.6978 | 0.68 | 0.6797 |
| 0.51 | 24.0 | 4199 | 1.2763 | 0.7067 | 0.7198 | 0.7067 | 0.7044 |
| 0.5109 | 25.0 | 4374 | 1.2671 | 0.6933 | 0.7038 | 0.6933 | 0.6913 |
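In every row of the table, Recall equals Accuracy while Precision and F1 differ, which is consistent with weighted averaging over the 10 GTZAN genres. The sketch below shows a `compute_metrics` callback that could produce these columns; the `weighted` averaging and the use of scikit-learn are assumptions, not details taken from the card.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Assumed metric callback: 'weighted' averaging is a guess that matches
    Recall == Accuracy in every row of the table above."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```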
### Framework versions
- Transformers 4.57.0.dev0
- Pytorch 2.9.0.dev20250716+cu129
- Datasets 4.0.0
- Tokenizers 0.22.0