MSP-Multimodal

This model is a fine-tuned version of MahmoodAnaam/MSP-Fusion; the training dataset is not specified. It achieves the following results on the evaluation set:

Loss: 2.2646
Wer: 0.5083
Cer: 0.3704

Evaluation

Note: the test set is deliberately evaluated with batch_size=1. Padded inputs don't yield exactly the same output as non-padded inputs, so a better WER can be achieved by not padding the input at all.
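The card does not say which part of the model is sensitive to padding. As a toy illustration only, assuming a normalization step that computes its statistics over the whole sequence (padding included), the sketch below shows how zero-padding can change the values at the real positions. The `normalize` helper and the sample values are made up for this example and are not taken from the model.

```python
import statistics

def normalize(values):
    """Zero-mean, unit-variance normalization over the whole sequence."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

signal = [0.2, -0.4, 0.6, -0.1]   # made-up "real" samples
padded = signal + [0.0] * 4       # same samples, zero-padded to length 8

unpadded_out = normalize(signal)
padded_out = normalize(padded)[: len(signal)]  # keep only the real positions

# The padding zeros shift the mean and variance, so the real positions differ.
max_diff = max(abs(a - b) for a, b in zip(unpadded_out, padded_out))
print(f"largest difference on real positions: {max_diff:.3f}")
```

Evaluating one sample at a time avoids padding altogether, at the cost of slower evaluation.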

Test WER Audio Only: 0.300
Test CER Audio Only: 0.132
Test WER Visual Only: 0.425
Test CER Visual Only: 0.229
Test WER Audio Visual: 0.427
Test CER Audio Visual: 0.223
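For readers unfamiliar with the metrics, WER and CER are edit-distance error rates over words and characters respectively. The card does not state which implementation produced the numbers above, so the following is a minimal sketch of the standard definition, assuming whitespace tokenization for words and counting spaces as characters. The function names and example sentences are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def error_rate(references, hypotheses, tokenize):
    """Total edits divided by total reference length over the whole set."""
    edits = sum(edit_distance(tokenize(r), tokenize(h))
                for r, h in zip(references, hypotheses))
    length = sum(len(tokenize(r)) for r in references)
    return edits / length

def wer(references, hypotheses):
    return error_rate(references, hypotheses, str.split)

def cer(references, hypotheses):
    return error_rate(references, hypotheses, list)

# Made-up example: one wrong word in each of two short sentences.
refs = ["the cat sat", "hello world"]
hyps = ["the cat sit", "hello word"]
print(wer(refs, hyps), cer(refs, hyps))
```

Aggregating edits and lengths over the whole set, rather than averaging per-utterance rates, keeps long utterances from being under-weighted.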

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 64
eval_batch_size: 1
seed: 42
optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 30
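To make the schedule concrete, here is a minimal sketch of a linear schedule with warmup using the values listed above (learning_rate 0.0001, 1000 warmup steps). The total of 21,990 steps is an estimate, not a value stated in the card: the results table logs step 500 at epoch 0.6821, which suggests about 733 steps per epoch, times 30 epochs. The function name `linear_lr` is hypothetical.

```python
def linear_lr(step, base_lr=1e-4, warmup_steps=1000, total_steps=21_990):
    """Learning rate at a given optimizer step.

    Ramps linearly from 0 to base_lr during warmup, then decays
    linearly to 0 at total_steps (total_steps is an inferred estimate).
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)

for step in (0, 500, 1000, 11_495, 21_990):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at step 1000 and is back to half its peak roughly midway through training.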

Training results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
2.4059	0.6821	500	1.6052	0.5069	0.3500
2.4333	1.3643	1000	1.7269	0.5535	0.3694
3.0845	2.0464	1500	1.7036	0.5348	0.3601
3.3634	2.7285	2000	1.6688	0.5338	0.3602
3.0551	3.4106	2500	1.8447	0.5489	0.3737
3.3026	4.0928	3000	1.9458	0.5500	0.3841
2.9599	4.7749	3500	2.0907	0.5434	0.3790
2.6671	5.4570	4000	2.0219	0.5239	0.3664
2.6144	6.1392	4500	2.0127	0.5601	0.3882
2.6796	6.8213	5000	1.9367	0.5347	0.3735
2.6720	7.5034	5500	2.0124	0.5363	0.3834
3.2063	8.1855	6000	2.2747	0.5479	0.3925
2.9087	8.8677	6500	1.9990	0.5345	0.3737
2.9626	9.5498	7000	2.1966	0.5222	0.3767
2.6168	10.2319	7500	2.1805	0.5272	0.3780
3.0100	10.9141	8000	1.8695	0.5225	0.3634
2.8280	11.5962	8500	1.9040	0.5224	0.3690
3.5308	12.2783	9000	2.1692	0.5225	0.3780
2.9471	12.9604	9500	2.0586	0.5252	0.3741
2.7580	13.6426	10000	2.1847	0.5332	0.3779
2.7175	14.3247	10500	2.1238	0.5267	0.3742
2.1010	15.0068	11000	2.0454	0.5203	0.3711
3.1069	15.6889	11500	2.2207	0.5344	0.3809
2.9546	16.3711	12000	2.1677	0.5255	0.3823
3.1365	17.0532	12500	2.2885	0.5210	0.3782
3.4372	17.7353	13000	2.4734	0.5215	0.3820
2.3137	18.4175	13500	2.0898	0.5194	0.3744
1.7379	19.0996	14000	2.2457	0.5300	0.3808
2.5903	19.7817	14500	2.2364	0.5225	0.3738
2.7463	20.4638	15000	2.3715	0.5174	0.3778
3.1977	21.1460	15500	2.2259	0.5177	0.3713
2.6823	21.8281	16000	2.0992	0.5135	0.3686
2.8125	22.5102	16500	2.1651	0.5144	0.3707
1.7893	23.1924	17000	2.2797	0.5138	0.3727
2.9536	23.8745	17500	2.2161	0.5161	0.3716
2.3546	24.5566	18000	2.1885	0.5122	0.3708
2.1879	25.2387	18500	2.1976	0.5116	0.3711
2.4205	25.9209	19000	2.2363	0.5138	0.3725
2.4324	26.6030	19500	2.2674	0.5143	0.3729
2.5400	27.2851	20000	2.2581	0.5173	0.3725
2.1698	27.9673	20500	2.2875	0.5125	0.3734
2.6201	28.6494	21000	2.3026	0.5093	0.3711
2.6334	29.3315	21500	2.2760	0.5116	0.3717
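The Epoch and Step columns above also allow a rough estimate of the training set size, which the card does not report. The sketch below is back-of-the-envelope arithmetic, assuming train_batch_size 64 is the effective batch per optimizer step (no gradient accumulation or multi-device scaling) and that the last batch of an epoch may be partial, so the sample count is an upper bound.

```python
# (epoch, step) pairs copied from three rows of the results table.
rows = [(0.6821, 500), (15.0068, 11000), (29.3315, 21500)]

# Each row gives an estimate of optimizer steps per epoch.
estimates = [step / epoch for epoch, step in rows]
steps_per_epoch = round(sum(estimates) / len(estimates))

train_batch_size = 64  # from the hyperparameter list; assumed effective batch size
max_samples = steps_per_epoch * train_batch_size  # upper bound on training samples
total_steps = steps_per_epoch * 30                # num_epochs: 30

print(steps_per_epoch, max_samples, total_steps)
```

Under these assumptions the run has about 733 steps per epoch, at most 46,912 training samples, and about 21,990 optimizer steps in total.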

Framework versions

Transformers 5.0.0
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.22.2

Model details

Model: MahmoodAnaam/MSP-Multimodal-V0
Base model: MahmoodAnaam/MSP-Fusion-V0
Model size: 0.5B params
Tensor type: F32 (Safetensors)
