wav2vec2-base-finetuned-gtzan-optimized

This model is a fine-tuned version of facebook/wav2vec2-base on the GTZAN dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2450
  • Accuracy: 0.72
  • Precision: 0.7271
  • Recall: 0.72
  • F1: 0.7156
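
For quick inference, a minimal sketch using the transformers audio-classification pipeline, assuming the checkpoint is published as zikangzheng/wav2vec2-base-gtzan-optimized (the repo id may need adjusting):

```python
from transformers import pipeline

# The repo id is an assumption based on this card's name; adjust it if the
# checkpoint is published under a different id.
classifier = pipeline(
    "audio-classification",
    model="zikangzheng/wav2vec2-base-gtzan-optimized",
)

# Any local audio file works; GTZAN clips are 30-second excerpts.
predictions = classifier("example.wav")
print(predictions)  # list of {"label": genre, "score": probability} dicts
```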

Model description

This is facebook/wav2vec2-base (a self-supervised audio encoder pretrained on 16 kHz speech, roughly 94.6M parameters) with a sequence-classification head, fine-tuned to predict one of the ten GTZAN music genres from raw audio.

Intended uses & limitations

The model is intended for classifying short music recordings into the ten GTZAN genres (blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock). GTZAN is small (1,000 clips) and has documented label noise and duplicate recordings, and the backbone was pretrained on speech rather than music, so the 0.72 evaluation accuracy reported above should not be assumed to carry over to other music collections.

Training and evaluation data

The model was fine-tuned on GTZAN, which consists of 1,000 30-second audio excerpts evenly distributed over ten genres, with a held-out portion of the dataset used for the evaluation metrics reported above. The card does not record the exact train/evaluation split.
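
A minimal loading sketch, assuming the commonly used marsyas/gtzan mirror on the Hugging Face Hub (the card does not name the exact dataset repo) and an illustrative split ratio:

```python
from datasets import load_dataset, Audio

# "marsyas/gtzan" is an assumption: it is the commonly used Hub mirror of
# GTZAN, but the card does not name the exact dataset repo.
gtzan = load_dataset("marsyas/gtzan", "all")["train"]

# wav2vec2-base expects 16 kHz mono audio, so resample on the fly.
gtzan = gtzan.cast_column("audio", Audio(sampling_rate=16_000))

# GTZAN ships as a single split; carve out a held-out evaluation set.
# The 15% ratio is illustrative, not taken from the card.
splits = gtzan.train_test_split(test_size=0.15, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```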

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch_fused (fused AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
  • label_smoothing_factor: 0.1
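
These settings map directly onto transformers.TrainingArguments; a minimal sketch, in which the output directory and per-epoch evaluation are assumptions not recorded in the card:

```python
from transformers import TrainingArguments

# Sketch of TrainingArguments matching the list above.
training_args = TrainingArguments(
    output_dir="wav2vec2-base-gtzan-optimized",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # effective train batch size: 8 * 4 = 32
    seed=42,
    optim="adamw_torch_fused",  # AdamW with betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    num_train_epochs=30,
    label_smoothing_factor=0.1,
    eval_strategy="epoch",  # assumed: the results table reports metrics per epoch
)
```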

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|:-------------:|:------:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
| 2.3003        | 1.0    | 22   | 2.2929          | 0.18     | 0.0544    | 0.18   | 0.0834 |
| 2.2917        | 2.0    | 44   | 2.2837          | 0.1333   | 0.0408    | 0.1333 | 0.0597 |
| 2.2778        | 3.0    | 66   | 2.2567          | 0.26     | 0.3654    | 0.26   | 0.2083 |
| 2.2471        | 4.0    | 88   | 2.2149          | 0.34     | 0.3997    | 0.34   | 0.2758 |
| 2.1629        | 5.0    | 110  | 2.1427          | 0.32     | 0.3069    | 0.32   | 0.2353 |
| 2.08          | 6.0    | 132  | 2.0558          | 0.3733   | 0.2645    | 0.3733 | 0.2776 |
| 2.0188        | 7.0    | 154  | 1.9914          | 0.3867   | 0.3095    | 0.3867 | 0.2997 |
| 1.9483        | 8.0    | 176  | 1.9420          | 0.3867   | 0.3785    | 0.3867 | 0.3167 |
| 1.8804        | 9.0    | 198  | 1.8842          | 0.4467   | 0.4878    | 0.4467 | 0.3905 |
| 1.8063        | 10.0   | 220  | 1.8867          | 0.3867   | 0.3360    | 0.3867 | 0.2975 |
| 1.7808        | 11.0   | 242  | 1.8269          | 0.4133   | 0.4118    | 0.4133 | 0.3619 |
| 1.7031        | 12.0   | 264  | 1.7784          | 0.5133   | 0.5104    | 0.5133 | 0.4759 |
| 1.6752        | 13.0   | 286  | 1.7580          | 0.4933   | 0.5315    | 0.4933 | 0.4502 |
| 1.6843        | 14.0   | 308  | 1.7113          | 0.5      | 0.5002    | 0.5    | 0.4609 |
| 1.6136        | 15.0   | 330  | 1.7132          | 0.4667   | 0.4710    | 0.4667 | 0.4276 |
| 1.6392        | 1.9957 | 349  | 1.6793          | 0.4667   | 0.4630    | 0.4667 | 0.4112 |
| 1.5396        | 3.0    | 524  | 1.5783          | 0.5267   | 0.5407    | 0.5267 | 0.4945 |
| 1.5981        | 4.0    | 699  | 1.6018          | 0.5      | 0.5358    | 0.5    | 0.4795 |
| 1.3127        | 5.0    | 874  | 1.4972          | 0.56     | 0.5732    | 0.56   | 0.5382 |
| 1.5041        | 6.0    | 1049 | 1.5921          | 0.5267   | 0.5740    | 0.5267 | 0.5166 |
| 1.1165        | 7.0    | 1224 | 1.4291          | 0.5667   | 0.5364    | 0.5667 | 0.5296 |
| 1.1177        | 8.0    | 1399 | 1.3336          | 0.6267   | 0.6217    | 0.6267 | 0.5932 |
| 0.8805        | 9.0    | 1574 | 1.3987          | 0.5867   | 0.6336    | 0.5867 | 0.5745 |
| 0.8566        | 10.0   | 1749 | 1.2999          | 0.66     | 0.6753    | 0.66   | 0.6565 |
| 1.0281        | 11.0   | 1924 | 1.3834          | 0.66     | 0.6770    | 0.66   | 0.6539 |
| 0.8522        | 12.0   | 2099 | 1.3038          | 0.6933   | 0.7138    | 0.6933 | 0.6848 |
| 0.8237        | 13.0   | 2274 | 1.4544          | 0.6133   | 0.6358    | 0.6133 | 0.5935 |
| 0.7483        | 14.0   | 2449 | 1.3505          | 0.6867   | 0.7018    | 0.6867 | 0.6835 |
| 0.6935        | 15.0   | 2624 | 1.2758          | 0.68     | 0.6990    | 0.68   | 0.6805 |
| 0.6927        | 16.0   | 2799 | 1.2943          | 0.7      | 0.7034    | 0.7    | 0.6918 |
| 0.5777        | 17.0   | 2974 | 1.3557          | 0.6867   | 0.6959    | 0.6867 | 0.6773 |
| 0.5445        | 18.0   | 3149 | 1.3008          | 0.7133   | 0.7246    | 0.7133 | 0.7078 |
| 0.5349        | 19.0   | 3324 | 1.2980          | 0.6933   | 0.7111    | 0.6933 | 0.6921 |
| 0.5268        | 20.0   | 3499 | 1.2516          | 0.72     | 0.7325    | 0.72   | 0.7201 |
| 0.5458        | 21.0   | 3674 | 1.2454          | 0.7067   | 0.7028    | 0.7067 | 0.7011 |
| 0.5167        | 22.0   | 3849 | 1.2321          | 0.6933   | 0.7007    | 0.6933 | 0.6908 |
| 0.5157        | 23.0   | 4024 | 1.3093          | 0.68     | 0.6978    | 0.68   | 0.6797 |
| 0.51          | 24.0   | 4199 | 1.2763          | 0.7067   | 0.7198    | 0.7067 | 0.7044 |
| 0.5109        | 25.0   | 4374 | 1.2671          | 0.6933   | 0.7038    | 0.6933 | 0.6913 |

Note that the epoch and step counters reset after step 330, and the steps per epoch jump from 22 to roughly 175; the rows from step 349 onward therefore appear to come from a second, re-launched training run rather than a continuation of the first.
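
In every row, recall equals accuracy, which is what weighted averaging produces; below is a sketch of a Trainer-style compute_metrics under that assumption (the card does not state how the metrics were averaged):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Trainer-compatible metrics; weighted averaging is an assumption,
    inferred from recall matching accuracy in every row of the table above."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```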

Framework versions

  • Transformers 4.57.0.dev0
  • Pytorch 2.9.0.dev20250716+cu129
  • Datasets 4.0.0
  • Tokenizers 0.22.0