MSP-Audio

This model is a fine-tuned version of facebook/wav2vec2-base-960h on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.2711
Wer: 0.3066
Cer: 0.2433

Evaluation

Note: we deliberately evaluate the test set with batch_size=1 because of this issue. Padded inputs do not yield exactly the same output as non-padded inputs, so a better WER can be achieved by not padding the input at all.

Test WER: 0.169
Test CER: 0.062
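
For readers unfamiliar with the two metrics above, the following is a minimal sketch of how word error rate (WER) and character error rate (CER) are commonly defined: the Levenshtein edit distance between reference and hypothesis, divided by the reference length. The card does not say which metric implementation was used, so the function names (`edit_distance`, `wer`, `cer`) and the absence of any text normalization are assumptions made for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical transcripts, not taken from the model's evaluation data.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

This sketch scores one utterance at a time. Corpus-level scores are usually obtained by summing edits and reference lengths over all utterances before dividing, which is not the same as averaging per-utterance rates.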

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 64
eval_batch_size: 1
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000.0
num_epochs: 20.0
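
To illustrate how the learning rate, warmup, and scheduler settings above interact, here is a rough sketch of a linear schedule with warmup: the rate rises linearly from 0 to the peak over the warmup steps, then decays linearly to 0 at the end of training. The function name `linear_schedule_lr` is hypothetical, and the total step count is an assumption inferred from the training results table in this card (step 14500 is logged at epoch 19.7817, which suggests about 733 steps per epoch); it is not stated anywhere in the card.

```python
BASE_LR = 1e-4        # learning_rate from the hyperparameter list
WARMUP_STEPS = 1000   # lr_scheduler_warmup_steps
NUM_EPOCHS = 20       # num_epochs

# Assumption: steps per epoch inferred from the last logged row (step 14500, epoch 19.7817).
STEPS_PER_EPOCH = round(14500 / 19.7817)
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH


def linear_schedule_lr(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS,
                       total_steps=TOTAL_STEPS):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 500, 1000, 7500, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {linear_schedule_lr(step):.2e}")
```

Under these assumptions the peak rate of 0.0001 is reached at step 1000, a little over one epoch into training, and the rate decays for the remaining steps.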

Training results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
2.9020	0.6821	500	0.3734	0.3326	0.2717
2.9679	1.3643	1000	0.3505	0.3264	0.2593
2.9390	2.0464	1500	0.3923	0.3659	0.2725
2.8775	2.7285	2000	0.3607	0.3614	0.2675
2.9122	3.4106	2500	0.3953	0.3812	0.2770
2.8879	4.0928	3000	0.3950	0.3800	0.2774
2.8735	4.7749	3500	0.4303	0.3827	0.2849
2.9131	5.4570	4000	0.4071	0.3833	0.2847
2.8792	6.1392	4500	0.3638	0.3640	0.2703
2.8804	6.8213	5000	0.3389	0.3544	0.2669
2.8883	7.5034	5500	0.3495	0.3583	0.2693
2.8861	8.1855	6000	0.3985	0.3827	0.2849
2.8934	8.8677	6500	0.3453	0.3590	0.2694
2.9068	9.5498	7000	0.3327	0.3344	0.2596
2.8741	10.2319	7500	0.3176	0.3321	0.2577
2.8961	10.9141	8000	0.3362	0.3309	0.2591
2.8826	11.5962	8500	0.3344	0.3272	0.2564
2.8922	12.2783	9000	0.3172	0.3359	0.2568
2.8963	12.9604	9500	0.3175	0.3228	0.2525
2.8683	13.6426	10000	0.2987	0.3147	0.2521
2.8781	14.3247	10500	0.2992	0.3222	0.2552
2.8693	15.0068	11000	0.2764	0.3099	0.2482
2.8676	15.6889	11500	0.3020	0.3140	0.2522
2.8953	16.3711	12000	0.2932	0.3080	0.2470
2.9023	17.0532	12500	0.2895	0.3075	0.2478
2.8665	17.7353	13000	0.2889	0.3098	0.2466
2.9208	18.4175	13500	0.2753	0.3114	0.2461
2.8623	19.0996	14000	0.2749	0.3077	0.2447
2.9092	19.7817	14500	0.2711	0.3066	0.2433

Framework versions

Transformers 5.0.0
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.22.2

Model size: 94.4M params (Safetensors)
Tensor type: F32

Model tree for MahmoodAnaam/MSP-Audio-V0

Base model: facebook/wav2vec2-base-960h
Finetuned: MahmoodAnaam/MSP-Audio-V0 (this model)
