w2v2-lmk_augmented

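This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the audiofolder dataset. Note that "audiofolder" is the name of the generic audio-folder loader in the Datasets library, so this card does not identify the underlying corpus. The model achieves the following results on the evaluation set, which match the final checkpoint (step 3700) in the training results: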

Loss: 1.2406
Wer: 0.4878
Cer: 0.1858
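
For readers unfamiliar with the two metrics, the following is a minimal sketch of how word error rate (WER) and character error rate (CER) are commonly defined: total edit distance divided by total reference length, over words or characters respectively. The card does not say which implementation produced the numbers above, so the function names here are illustrative, and counting spaces as characters in CER is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Word error rate: word-level edits divided by the number of reference words."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    total = sum(len(r.split()) for r in references)
    return errors / total


def cer(references, hypotheses):
    """Character error rate: character-level edits divided by reference characters.

    Assumption: spaces are counted as characters.
    """
    errors = sum(edit_distance(list(r), list(h))
                 for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return errors / total


if __name__ == "__main__":
    refs = ["the cat sat"]   # hypothetical reference transcript
    hyps = ["the cat sit"]   # hypothetical model output
    print(f"WER = {wer(refs, hyps):.4f}, CER = {cer(refs, hyps):.4f}")
```

Under this definition, a WER of 0.4878 means the number of word-level edits is a little under half the number of reference words, and a WER of 1.0 (as in the first training rows) is what an empty transcript would score.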

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 300
num_epochs: 100
mixed_precision_training: Native AMP
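
As a rough illustration of how these values fit together, the sketch below derives the total train batch size and evaluates a linear schedule with warmup at a few steps. It assumes a single device (the card does not state the device count) and takes 3700 as the total number of optimizer steps, which is the last step in the training results table. The function names are illustrative and not part of any library.

```python
LEARNING_RATE = 1e-4      # learning_rate
WARMUP_STEPS = 300        # lr_scheduler_warmup_steps
TOTAL_STEPS = 3700        # last step listed in the training results table
TRAIN_BATCH_SIZE = 8      # train_batch_size (per device)
GRAD_ACCUM_STEPS = 2      # gradient_accumulation_steps
NUM_DEVICES = 1           # assumption: not stated in the card


def effective_batch_size(per_device, accum_steps, devices):
    """Number of examples contributing to each optimizer step."""
    return per_device * accum_steps * devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total` steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("total_train_batch_size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 150, 300, 2000, 3700):
        print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
```

With these settings the learning rate peaks at 0.0001 at step 300, so the first three evaluation rows in the training results fall within the warmup phase.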

Training results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
8.5498	2.7123	100	4.0528	1.0	1.0
3.1716	5.4110	200	2.9634	1.0	1.0
2.9756	8.1096	300	2.8924	1.0	1.0
2.8279	10.8219	400	2.5968	1.0	1.0
2.2866	13.5205	500	1.7827	0.9895	0.6283
1.619	16.2192	600	1.3242	0.9443	0.4021
1.2926	18.9315	700	1.1299	0.7875	0.2833
1.0181	21.6301	800	1.1390	0.6585	0.2513
0.8774	24.3288	900	1.0760	0.6132	0.2338
0.7471	27.0274	1000	0.9959	0.5889	0.2155
0.6542	29.7397	1100	1.0575	0.5575	0.2117
0.5632	32.4384	1200	1.0240	0.5784	0.2171
0.4834	35.1370	1300	1.0971	0.5505	0.1912
0.4716	37.8493	1400	1.1336	0.5749	0.2056
0.45	40.5479	1500	1.0703	0.5679	0.2079
0.394	43.2466	1600	1.1579	0.5645	0.2178
0.3588	45.9589	1700	1.0555	0.5296	0.1896
0.3217	48.6575	1800	1.2323	0.5575	0.2102
0.3245	51.3562	1900	1.1639	0.5401	0.2018
0.289	54.0548	2000	1.1304	0.5122	0.1927
0.28	56.7671	2100	1.2295	0.5296	0.2003
0.2521	59.4658	2200	1.1612	0.5226	0.1950
0.2624	62.1644	2300	1.1982	0.5157	0.2003
0.2402	64.8767	2400	1.2075	0.5296	0.1988
0.2258	67.5753	2500	1.2091	0.5366	0.2003
0.2232	70.2740	2600	1.1830	0.5296	0.1957
0.2181	72.9863	2700	1.2001	0.5157	0.1942
0.2214	75.6849	2800	1.1942	0.5052	0.1889
0.1752	78.3836	2900	1.1873	0.5087	0.1896
0.1891	81.0822	3000	1.2159	0.5192	0.1927
0.1733	83.7945	3100	1.2105	0.5017	0.1881
0.1982	86.4932	3200	1.2331	0.5087	0.1874
0.1681	89.1918	3300	1.1848	0.4808	0.1790
0.1631	91.9041	3400	1.2273	0.4878	0.1858
0.1579	94.6027	3500	1.2334	0.4948	0.1843
0.1795	97.3014	3600	1.2399	0.4878	0.1851
0.1592	100.0	3700	1.2406	0.4878	0.1858
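
The table suggests that the final checkpoint is not the best one on every metric: validation loss is lowest at step 1000 (0.9959), while WER and CER are lowest at step 3300 (0.4808 and 0.1790). The sketch below shows one way to pick a checkpoint per metric from a handful of rows copied from the table. The `Row` type and `best_by` helper are hypothetical names introduced for illustration, and the card does not say whether any best-checkpoint selection was used during training.

```python
from collections import namedtuple

# Illustrative container for one evaluation row of the table.
Row = namedtuple("Row", ["step", "val_loss", "wer", "cer"])

# A subset of rows copied from the training results table above.
ROWS = [
    Row(1000, 0.9959, 0.5889, 0.2155),
    Row(1700, 1.0555, 0.5296, 0.1896),
    Row(2000, 1.1304, 0.5122, 0.1927),
    Row(3300, 1.1848, 0.4808, 0.1790),
    Row(3700, 1.2406, 0.4878, 0.1858),
]


def best_by(rows, metric):
    """Return the row with the lowest value of `metric` (lower is better for all three)."""
    return min(rows, key=lambda row: getattr(row, metric))


if __name__ == "__main__":
    for metric in ("val_loss", "wer", "cer"):
        row = best_by(ROWS, metric)
        print(f"best {metric}: step {row.step} ({getattr(row, metric)})")
```

If checkpoints were saved at those steps, the step-3300 checkpoint would be the natural candidate for lowest WER. In Transformers, the `load_best_model_at_end` and `metric_for_best_model` training arguments exist for this purpose, though whether they were set here is unknown.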

Framework versions

Transformers 4.57.1
Pytorch 2.8.0+cu128
Datasets 3.0.0
Tokenizers 0.22.1

Model size: 0.3B params (Safetensors)
Tensor type: F32

Model tree for aconeil/w2v2-lmk_augmented

Base model: facebook/wav2vec2-large-xlsr-53
Fine-tuned: this model

Evaluation results

WER on audiofolder (test set, self-reported): 0.488