MSP-Audio

This model is a fine-tuned version of a base model (not named in this card) on an unknown dataset. It achieves the following results on the evaluation set (the final checkpoint, step 14500):

  • Loss: 0.2711
  • Wer: 0.3066
  • Cer: 0.2433

Evaluation

Note: the test set is deliberately evaluated with batch_size=1 because of this issue. Padded inputs don't yield exactly the same output as non-padded inputs, so a better WER can be achieved by not padding the input at all.
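The issue referred to in the note is not reproduced here, so the following is only a toy illustration of one way padding can change a result: any normalization computed over the whole time axis also sees the padded zeros. The sample values and the `normalize` helper are made up for this sketch, and it does not describe this model's actual internals.

```python
import statistics


def normalize(signal):
    """Zero-mean, unit-variance normalization over the whole sequence."""
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    return [(x - mean) / std for x in signal]


audio = [0.2, -0.1, 0.4, 0.3]   # made-up samples standing in for one clip
padded = audio + [0.0] * 4      # the same clip, zero-padded to a longer batch length

plain = normalize(audio)
with_pad = normalize(padded)[:len(audio)]   # keep only the real positions

# The real samples end up with different values once padding is included
max_diff = max(abs(a - b) for a, b in zip(plain, with_pad))
print(f"largest difference on the real samples: {max_diff:.4f}")
```

With batch_size=1 no clip has to be padded to match a longer one, which is the reason given above for evaluating one example at a time.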

  • Test WER: 0.169
  • Test CER: 0.062
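For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and CER are commonly computed: total edit distance divided by total reference length, over words or characters. This is not the evaluation script behind the numbers above. The function names and the example sentences are hypothetical, and the character-level variant here counts spaces, which is an assumption (metric libraries differ on this).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate: total edits / total reference units."""
    split = (lambda s: s.split()) if unit == "word" else list
    edits = sum(edit_distance(split(r), split(h))
                for r, h in zip(references, hypotheses))
    total = sum(len(split(r)) for r in references)
    return edits / total


refs = ["the cat sat on the mat"]   # made-up reference transcript
hyps = ["the cat sit on mat"]       # made-up model output

wer = error_rate(refs, hyps, unit="word")
cer = error_rate(refs, hyps, unit="char")
print(f"WER={wer:.3f} CER={cer:.3f}")
```

Because a single wrong character makes the whole word count as an error, WER is usually higher than CER on the same output, which matches the pattern in the figures reported here.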

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 1
  • seed: 42
  • optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000.0
  • num_epochs: 20.0
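To make the schedule settings concrete, the sketch below computes the learning rate at a given step for a linear schedule with warmup, using learning_rate=0.0001 and 1000 warmup steps from the list above. The total of 14660 steps is an assumption: it is estimated from the Step and Epoch columns of the training results (about 733 steps per epoch over 20 epochs), not stated in the card. The function name is hypothetical, and this only mirrors the usual shape of such a schedule rather than the training code itself.

```python
def linear_schedule_lr(step, base_lr=1e-4, warmup_steps=1000, total_steps=14660):
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    total_steps=14660 is an estimate (733 steps/epoch * 20 epochs), not a
    value stated in the model card.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 500, 1000, 7830, 14660):
    print(step, f"{linear_schedule_lr(step):.2e}")
```

Under these assumptions the rate peaks at 0.0001 at step 1000, which is a little over one epoch into training, and then falls linearly for the remaining steps.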

Training results

Training Loss   Epoch     Step    Validation Loss   Wer      Cer
2.9020          0.6821    500     0.3734            0.3326   0.2717
2.9679          1.3643    1000    0.3505            0.3264   0.2593
2.9390          2.0464    1500    0.3923            0.3659   0.2725
2.8775          2.7285    2000    0.3607            0.3614   0.2675
2.9122          3.4106    2500    0.3953            0.3812   0.2770
2.8879          4.0928    3000    0.3950            0.3800   0.2774
2.8735          4.7749    3500    0.4303            0.3827   0.2849
2.9131          5.4570    4000    0.4071            0.3833   0.2847
2.8792          6.1392    4500    0.3638            0.3640   0.2703
2.8804          6.8213    5000    0.3389            0.3544   0.2669
2.8883          7.5034    5500    0.3495            0.3583   0.2693
2.8861          8.1855    6000    0.3985            0.3827   0.2849
2.8934          8.8677    6500    0.3453            0.3590   0.2694
2.9068          9.5498    7000    0.3327            0.3344   0.2596
2.8741          10.2319   7500    0.3176            0.3321   0.2577
2.8961          10.9141   8000    0.3362            0.3309   0.2591
2.8826          11.5962   8500    0.3344            0.3272   0.2564
2.8922          12.2783   9000    0.3172            0.3359   0.2568
2.8963          12.9604   9500    0.3175            0.3228   0.2525
2.8683          13.6426   10000   0.2987            0.3147   0.2521
2.8781          14.3247   10500   0.2992            0.3222   0.2552
2.8693          15.0068   11000   0.2764            0.3099   0.2482
2.8676          15.6889   11500   0.3020            0.3140   0.2522
2.8953          16.3711   12000   0.2932            0.3080   0.2470
2.9023          17.0532   12500   0.2895            0.3075   0.2478
2.8665          17.7353   13000   0.2889            0.3098   0.2466
2.9208          18.4175   13500   0.2753            0.3114   0.2461
2.8623          19.0996   14000   0.2749            0.3077   0.2447
2.9092          19.7817   14500   0.2711            0.3066   0.2433
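The table can also be read programmatically. The sketch below takes a small excerpt of the rows above, estimates the number of optimizer steps per epoch from the Step and Epoch columns, and picks the logged checkpoint with the lowest WER. The variable names are hypothetical, and the projected total assumes training ran for the full 20 epochs listed in the hyperparameters.

```python
# (training_loss, epoch, step, validation_loss, wer, cer)
# Excerpt of the results table, not the full log.
rows = [
    (2.9020, 0.6821, 500, 0.3734, 0.3326, 0.2717),
    (2.9679, 1.3643, 1000, 0.3505, 0.3264, 0.2593),
    (2.8693, 15.0068, 11000, 0.2764, 0.3099, 0.2482),
    (2.8623, 19.0996, 14000, 0.2749, 0.3077, 0.2447),
    (2.9092, 19.7817, 14500, 0.2711, 0.3066, 0.2433),
]

# Steps per epoch, estimated from the last logged row
last = rows[-1]
steps_per_epoch = round(last[2] / last[1])

# Projected total if all 20 epochs were completed (an assumption)
total_steps = steps_per_epoch * 20

# Checkpoint with the lowest validation WER among the excerpted rows
best = min(rows, key=lambda r: r[4])

print(f"steps per epoch: {steps_per_epoch}")
print(f"projected total steps: {total_steps}")
print(f"best WER {best[4]} at step {best[2]}")
```

In the full table the last logged checkpoint (step 14500) has the lowest validation loss, WER, and CER, so the headline figures at the top of this card correspond to that row.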

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2

Model details

  • Repository: MahmoodAnaam/MSP-Audio-V0
  • Model size: 94.4M params
  • Tensor type: F32 (Safetensors)