MSP-Multimodal

This model is a fine-tuned version of MahmoodAnaam/MSP-Fusion on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2646
  • Wer: 0.5083
  • Cer: 0.3704
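The card does not say which library produced these numbers. As a minimal sketch of the standard definitions, WER is the word-level edit distance divided by the number of reference words, and CER is the same ratio at character level. The function names and the example sentences below are illustrative assumptions, not part of the training code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example pair, not taken from the evaluation set.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER: {wer(reference, hypothesis):.4f}")
print(f"CER: {cer(reference, hypothesis):.4f}")
```

Both metrics can exceed 1.0 when the hypothesis contains many insertions, so they are error rates rather than bounded accuracies.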

Evaluation

Note: the test set is evaluated with batch_size=1 on purpose. Padded inputs do not yield exactly the same output as non-padded inputs, so a better WER can be achieved by not padding the input at all.

  • Test WER Audio Only: 0.300
  • Test CER Audio Only: 0.132
  • Test WER Visual Only: 0.425
  • Test CER Visual Only: 0.229
  • Test WER Audio Visual: 0.427
  • Test CER Audio Visual: 0.223

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 1
  • seed: 42
  • optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 30
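To make the scheduler settings concrete, here is a rough sketch of a linear schedule with warmup: the learning rate rises linearly from 0 to the base value over the warmup steps, then decays linearly to 0 at the end of training. The function name is hypothetical, and `total_steps=21990` is an assumption estimated from the step/epoch ratio in the training results (about 733 steps per epoch over 30 epochs); the card does not state the exact total.

```python
def linear_warmup_decay_lr(step, base_lr=1e-4, warmup_steps=1000,
                           total_steps=21990):
    """Learning rate at a given optimizer step.

    total_steps is an estimate (assumption), not a value from the card.
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: scale linearly from base_lr down to 0, never below 0.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 500, 1000, 11495, 21990):
    print(step, linear_warmup_decay_lr(step))
```

Under these assumptions the peak learning rate of 0.0001 is reached at step 1000, a little over one epoch into training.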

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
|---|---|---|---|---|---|
| 2.4059 | 0.6821 | 500 | 1.6052 | 0.5069 | 0.3500 |
| 2.4333 | 1.3643 | 1000 | 1.7269 | 0.5535 | 0.3694 |
| 3.0845 | 2.0464 | 1500 | 1.7036 | 0.5348 | 0.3601 |
| 3.3634 | 2.7285 | 2000 | 1.6688 | 0.5338 | 0.3602 |
| 3.0551 | 3.4106 | 2500 | 1.8447 | 0.5489 | 0.3737 |
| 3.3026 | 4.0928 | 3000 | 1.9458 | 0.5500 | 0.3841 |
| 2.9599 | 4.7749 | 3500 | 2.0907 | 0.5434 | 0.3790 |
| 2.6671 | 5.4570 | 4000 | 2.0219 | 0.5239 | 0.3664 |
| 2.6144 | 6.1392 | 4500 | 2.0127 | 0.5601 | 0.3882 |
| 2.6796 | 6.8213 | 5000 | 1.9367 | 0.5347 | 0.3735 |
| 2.6720 | 7.5034 | 5500 | 2.0124 | 0.5363 | 0.3834 |
| 3.2063 | 8.1855 | 6000 | 2.2747 | 0.5479 | 0.3925 |
| 2.9087 | 8.8677 | 6500 | 1.9990 | 0.5345 | 0.3737 |
| 2.9626 | 9.5498 | 7000 | 2.1966 | 0.5222 | 0.3767 |
| 2.6168 | 10.2319 | 7500 | 2.1805 | 0.5272 | 0.3780 |
| 3.0100 | 10.9141 | 8000 | 1.8695 | 0.5225 | 0.3634 |
| 2.8280 | 11.5962 | 8500 | 1.9040 | 0.5224 | 0.3690 |
| 3.5308 | 12.2783 | 9000 | 2.1692 | 0.5225 | 0.3780 |
| 2.9471 | 12.9604 | 9500 | 2.0586 | 0.5252 | 0.3741 |
| 2.7580 | 13.6426 | 10000 | 2.1847 | 0.5332 | 0.3779 |
| 2.7175 | 14.3247 | 10500 | 2.1238 | 0.5267 | 0.3742 |
| 2.1010 | 15.0068 | 11000 | 2.0454 | 0.5203 | 0.3711 |
| 3.1069 | 15.6889 | 11500 | 2.2207 | 0.5344 | 0.3809 |
| 2.9546 | 16.3711 | 12000 | 2.1677 | 0.5255 | 0.3823 |
| 3.1365 | 17.0532 | 12500 | 2.2885 | 0.5210 | 0.3782 |
| 3.4372 | 17.7353 | 13000 | 2.4734 | 0.5215 | 0.3820 |
| 2.3137 | 18.4175 | 13500 | 2.0898 | 0.5194 | 0.3744 |
| 1.7379 | 19.0996 | 14000 | 2.2457 | 0.5300 | 0.3808 |
| 2.5903 | 19.7817 | 14500 | 2.2364 | 0.5225 | 0.3738 |
| 2.7463 | 20.4638 | 15000 | 2.3715 | 0.5174 | 0.3778 |
| 3.1977 | 21.1460 | 15500 | 2.2259 | 0.5177 | 0.3713 |
| 2.6823 | 21.8281 | 16000 | 2.0992 | 0.5135 | 0.3686 |
| 2.8125 | 22.5102 | 16500 | 2.1651 | 0.5144 | 0.3707 |
| 1.7893 | 23.1924 | 17000 | 2.2797 | 0.5138 | 0.3727 |
| 2.9536 | 23.8745 | 17500 | 2.2161 | 0.5161 | 0.3716 |
| 2.3546 | 24.5566 | 18000 | 2.1885 | 0.5122 | 0.3708 |
| 2.1879 | 25.2387 | 18500 | 2.1976 | 0.5116 | 0.3711 |
| 2.4205 | 25.9209 | 19000 | 2.2363 | 0.5138 | 0.3725 |
| 2.4324 | 26.6030 | 19500 | 2.2674 | 0.5143 | 0.3729 |
| 2.5400 | 27.2851 | 20000 | 2.2581 | 0.5173 | 0.3725 |
| 2.1698 | 27.9673 | 20500 | 2.2875 | 0.5125 | 0.3734 |
| 2.6201 | 28.6494 | 21000 | 2.3026 | 0.5093 | 0.3711 |
| 2.6334 | 29.3315 | 21500 | 2.2760 | 0.5116 | 0.3717 |
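A log like this is often used to choose which checkpoint to keep. The sketch below is one possible way to do that, assuming a checkpoint was saved at every logged step (the card does not confirm this). It copies only a handful of rows from the training results for brevity; the `EvalRow` type and variable names are hypothetical.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    wer: float
    cer: float


# Subset of the logged evaluation rows (step, validation loss, WER, CER).
rows = [
    EvalRow(500, 1.6052, 0.5069, 0.3500),
    EvalRow(8000, 1.8695, 0.5225, 0.3634),
    EvalRow(16000, 2.0992, 0.5135, 0.3686),
    EvalRow(21000, 2.3026, 0.5093, 0.3711),
    EvalRow(21500, 2.2760, 0.5116, 0.3717),
]

# Pick the row with the lowest value of each selection criterion.
best_by_wer = min(rows, key=lambda r: r.wer)
best_by_loss = min(rows, key=lambda r: r.val_loss)

print("lowest WER:", best_by_wer)
print("lowest validation loss:", best_by_loss)
```

In the full log the step-500 row also has the lowest validation loss, WER, and CER, while validation loss trends upward afterwards, so selecting by a validation metric rather than taking the final step may be worth considering.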

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2

Model size

  • Parameters: 0.5B
  • Tensor type: F32
  • Format: Safetensors