wav2vec2-librispeech-demo

This model is a fine-tuned version of facebook/wav2vec2-large-lv60 on the LIBRISPEECH_ASR - CLEAN dataset. It achieves the following results on the evaluation set:

Loss: 0.0030
Wer: 1.0225
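
For readers unfamiliar with the metric, word error rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is a minimal, dependency-free illustration of that definition. It is not the evaluation script used for this model, which the card does not specify, and the function name `word_error_rate` is hypothetical. Because insertions count as errors, the metric is not bounded above by 1.0.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev is the previous row of the edit-distance table: prev[j] is the
    # distance between the reference words seen so far and the first j
    # hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only; these are not taken from the evaluation set.
perfect = word_error_rate("the cat sat", "the cat sat")
with_insertions = word_error_rate("hello", "hello there big world")
```

In the second call, three inserted words against a one-word reference give a value above 1.0, which shows how the metric can exceed 100%.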

Model description

More information needed

Intended uses & limitations

More information needed
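
As a starting point, the checkpoint can presumably be loaded for English speech recognition through the Transformers `pipeline` API. The sketch below assumes that the repository `suhas-hegde5/wav2vec2-librispeech-demo` is publicly accessible and ships the tokenizer and feature-extractor files alongside the weights, and that `transformers`, `torch`, and ffmpeg are installed. The helper `transcribe` and the file name in the comment are hypothetical.

```python
MODEL_ID = "suhas-hegde5/wav2vec2-librispeech-demo"


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with the fine-tuned checkpoint."""
    # Imported lazily so that defining this helper does not require
    # transformers/torch to be installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    # For a file path, the pipeline decodes the audio with ffmpeg and
    # resamples it to the rate the feature extractor expects.
    result = asr(audio_path)
    return result["text"]


# Example call (hypothetical file name):
# print(transcribe("sample.flac"))
```

Building the pipeline inside the function keeps the sketch short; for repeated calls it would be cheaper to construct it once and reuse it.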

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 15.0
mixed_precision_training: Native AMP
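
To make the interaction of these settings concrete, the following sketch reproduces the effective batch size and the shape of a linear schedule with warmup in plain Python. Two values are assumptions rather than reported hyperparameters: a single training device (consistent with 8 x 2 = 16) and a total of 2370 optimizer steps, inferred from the training log, where 100 steps correspond to about 0.6329 epochs (roughly 158 steps per epoch over 15 epochs). The function names are hypothetical.

```python
LEARNING_RATE = 3e-4
WARMUP_STEPS = 500
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1      # assumption: single device, consistent with a total of 16
TOTAL_STEPS = 2370   # assumption: ~158 steps/epoch * 15 epochs, inferred from the log


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device * accum_steps * devices


def linear_lr(step: int,
              base_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS,
              total: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
peak_lr = linear_lr(WARMUP_STEPS)    # learning rate at the end of warmup
final_lr = linear_lr(TOTAL_STEPS)    # learning rate at the last step
```

Under these assumptions the learning rate peaks at step 500 and decays linearly to zero by the end of training.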

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
No log	0.6329	100	3.9409	1.0
No log	1.2658	200	3.0441	1.0
No log	1.8987	300	2.9165	1.0
No log	2.5316	400	1.4925	1.9968
3.7012	3.1646	500	0.3010	1.9446
3.7012	3.7975	600	0.1713	1.8259
3.7012	4.4304	700	0.0990	1.6163
3.7012	5.0633	800	0.0692	1.5439
3.7012	5.6962	900	0.0463	1.4233
0.1686	6.3291	1000	0.0389	1.3469
0.1686	6.9620	1100	0.0290	1.3101
0.1686	7.5949	1200	0.0204	1.1994
0.1686	8.2278	1300	0.0161	1.1839
0.1686	8.8608	1400	0.0143	1.1499
0.0553	9.4937	1500	0.0110	1.1460
0.0553	10.1266	1600	0.0082	1.0953
0.0553	10.7595	1700	0.0088	1.1119
0.0553	11.3924	1800	0.0059	1.0574
0.0553	12.0253	1900	0.0054	1.0510
0.0295	12.6582	2000	0.0042	1.0356
0.0295	13.2911	2100	0.0039	1.0360
0.0295	13.9241	2200	0.0033	1.0269
0.0295	14.5570	2300	0.0031	1.0237
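
The table above can also be used to recover quantities the card does not state directly. The sketch below copies a few rows verbatim and estimates the number of optimizer steps per epoch and the evaluation point with the lowest validation loss. The helper names are hypothetical, and the estimate assumes that the step-to-epoch ratio is constant throughout training.

```python
# (epoch, step, validation_loss, wer), copied from selected rows of the table.
ROWS = [
    (0.6329, 100, 3.9409, 1.0),
    (3.1646, 500, 0.3010, 1.9446),
    (6.3291, 1000, 0.0389, 1.3469),
    (9.4937, 1500, 0.0110, 1.1460),
    (12.6582, 2000, 0.0042, 1.0356),
    (14.5570, 2300, 0.0031, 1.0237),
]
NUM_EPOCHS = 15  # from the hyperparameter list


def steps_per_epoch(rows) -> int:
    """Estimate steps per epoch from the last logged (epoch, step) pair."""
    epoch, step = rows[-1][0], rows[-1][1]
    return round(step / epoch)


def lowest_validation_loss(rows):
    """Return the row with the smallest validation loss."""
    return min(rows, key=lambda row: row[2])


per_epoch = steps_per_epoch(ROWS)
estimated_total_steps = per_epoch * NUM_EPOCHS
best = lowest_validation_loss(ROWS)
```

Using the last logged row for the estimate minimizes the effect of rounding in the epoch column.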

Framework versions

Transformers 4.45.0.dev0
Pytorch 2.5.1
Datasets 2.21.0
Tokenizers 0.19.1

Model size: 0.3B params (Safetensors)
Tensor type: F32

Model tree for suhas-hegde5/wav2vec2-librispeech-demo

Base model: facebook/wav2vec2-large-lv60
Finetuned: this model

Evaluation results

Wer on LIBRISPEECH_ASR - CLEAN (test set, self-reported): 1.023