ssc-bxk-mms-model-mix-adapt-max

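This model was trained from scratch on an unknown dataset. Results on the evaluation set were logged every 200 steps and are listed in the table below. At the last logged checkpoint (step 2000, epoch 4.8672), it reached a validation loss of 0.5747, a CER of 0.1533, and a WER of 0.5485.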

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 2
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: AdamW, PyTorch fused implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 5
mixed_precision_training: Native AMP
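The hyperparameters above do not show how the learning rate moves during training. Below is a minimal Python sketch, assuming `lr_scheduler_type: linear` follows the usual linear-warmup-then-linear-decay rule used by Hugging Face Transformers' `get_linear_schedule_with_warmup`, and assuming a single device, so that total_train_batch_size is just train_batch_size × gradient_accumulation_steps. `linear_lr` is a hypothetical helper name, and the total step count is an estimate derived from the results table, not a value stated in this card.

```python
# Sketch of the learning-rate schedule implied by the hyperparameters above.
# Assumption: linear warmup from 0 to the peak LR, then linear decay to 0,
# matching the rule in transformers' get_linear_schedule_with_warmup.

PEAK_LR = 0.001        # learning_rate
WARMUP_STEPS = 100     # lr_scheduler_warmup_steps
TRAIN_BATCH_SIZE = 2   # train_batch_size (per device)
GRAD_ACCUM_STEPS = 2   # gradient_accumulation_steps

# Assumption: one device, so effective batch = per-device batch x accumulation steps.
EFFECTIVE_BATCH_SIZE = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS  # 4 = total_train_batch_size


def linear_lr(step: int, total_steps: int) -> float:
    """Learning rate at optimizer step `step` (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 to PEAK_LR over WARMUP_STEPS steps.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay: fall linearly from PEAK_LR to 0 at total_steps, never below 0.
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - WARMUP_STEPS)


if __name__ == "__main__":
    # Estimate only: the table logs step 200 at epoch 0.4872, i.e. roughly
    # 410.5 optimizer steps per epoch, times num_epochs = 5.
    total = round(200 / 0.4872 * 5)
    for s in (0, 50, 100, 1000, 2000, total):
        print(s, f"{linear_lr(s, total):.6f}")
```

With these settings the peak rate of 0.001 is reached at step 100 and then shrinks steadily, so by step 2000 the optimizer is taking very small steps, which is consistent with the flattening validation loss in the later rows of the results table.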

Training results

Training Loss	Epoch	Step	Validation Loss	CER	WER
0.492	0.4872	200	0.6465	0.1689	0.6131
0.4916	0.9744	400	0.6220	0.1598	0.5790
0.4791	1.4604	600	0.6018	0.1567	0.5703
0.4661	1.9476	800	0.6045	0.1590	0.5910
0.4444	2.4336	1000	0.5895	0.1581	0.5696
0.4369	2.9208	1200	0.5877	0.1558	0.5683
0.4305	3.4068	1400	0.5883	0.1565	0.5593
0.4214	3.8940	1600	0.5816	0.1558	0.5549
0.4028	4.3800	1800	0.5771	0.1550	0.5607
0.4118	4.8672	2000	0.5747	0.1533	0.5485
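The CER (character error rate) and WER (word error rate) columns are conventionally defined as the Levenshtein edit distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the number of reference characters or words. Below is a minimal stdlib sketch of that definition, computed at corpus level. It is not necessarily the evaluation code used for this model, which is not stated here; `edit_distance`, `error_rate`, `cer`, and `wer` are illustrative names.

```python
from typing import Callable, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (two-row DP)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            )
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str],
               tokenize: Callable[[str], Sequence[str]]) -> float:
    """Corpus-level rate: total edits divided by total reference tokens."""
    edits = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return edits / total


def cer(refs: List[str], hyps: List[str]) -> float:
    return error_rate(refs, hyps, list)       # characters as tokens


def wer(refs: List[str], hyps: List[str]) -> float:
    return error_rate(refs, hyps, str.split)  # whitespace-separated words


if __name__ == "__main__":
    refs, hyps = ["the cat sat"], ["the cat sit"]
    print(f"CER={cer(refs, hyps):.4f} WER={wer(refs, hyps):.4f}")
```

The toy pair shows why WER sits well above CER in the table (0.5485 vs 0.1533 at step 2000): a single wrong character costs 1/11 of the characters but makes a whole word wrong, costing 1/3 of the words.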

Model size: 1.0B params
Tensor type: F32
Weights format: Safetensors
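As a rough check on what the 1.0B F32 parameters imply for storage, here is a back-of-the-envelope Python sketch. It assumes 4 bytes per F32 parameter and treats "1.0B" as exactly 10^9, which is only approximate since the card rounds the count. The Safetensors file also carries a small JSON header on top of the raw tensor bytes. `weight_bytes` is a hypothetical helper.

```python
# Approximate size of the raw weights: parameter count x bytes per element.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(num_params: float, dtype: str = "F32") -> float:
    """Raw tensor bytes, ignoring file headers and metadata (hypothetical helper)."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    params = 1.0e9  # approximate, from "1.0B params"
    size = weight_bytes(params)
    print(f"{size / 1e9:.1f} GB ({size / 2**30:.2f} GiB)")
```

Under these assumptions the weights take about 4 GB on disk and at least as much memory when loaded in F32; storing them as F16 or BF16 would roughly halve that.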