iteboshi

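This model was trained from scratch on an unknown dataset. Results on the evaluation set are listed per checkpoint in the table below; at the final step (20000) it reaches a validation loss of 0.8947, a WER of 82.3479, and a CER of 22.6268.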

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 8
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 20000
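
The learning-rate schedule these settings imply can be sketched in plain Python. This is a minimal sketch, assuming the linear scheduler warms up from zero over the 500 warmup steps and then decays linearly to zero at step 20000; the function name `linear_lr` and the constants are illustrative and not taken from the training code. It also checks that the per-device batch size times the accumulation steps gives the reported total of 32, which holds for a single process; the card does not state the device count.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-05
WARMUP_STEPS = 500
TRAINING_STEPS = 20000
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 8


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step (assumed: linear warmup, then linear decay to zero)."""
    if step < WARMUP_STEPS:
        # Warmup: scale grows linearly from 0 to 1.
        scale = step / max(1, WARMUP_STEPS)
    else:
        # Decay: scale shrinks linearly from 1 to 0 at the last step.
        scale = max(0.0, (TRAINING_STEPS - step) / max(1, TRAINING_STEPS - WARMUP_STEPS))
    return LEARNING_RATE * scale


# Samples consumed per optimizer update on one process.
samples_per_update = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS

if __name__ == "__main__":
    print("samples per optimizer update:", samples_per_update)
    for step in (0, 250, 500, 10250, 20000):
        print(step, linear_lr(step))
```

Under these assumptions the rate peaks at 2e-05 at step 500 and is back to half of that at step 10250, the midpoint of the decay phase.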

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
1.0854	1.1013	1000	1.2534	97.5672	52.7088
0.5859	2.2026	2000	0.8996	90.9477	48.1097
0.3373	3.3040	3000	0.7766	87.7699	29.9950
0.2445	4.4053	4000	0.7662	86.6761	28.1264
0.1548	5.5066	5000	0.7709	86.6007	27.8748
0.1102	6.6079	6000	0.7889	86.3178	26.2934
0.0682	7.7093	7000	0.7991	84.4507	27.3578
0.0647	8.8106	8000	0.8132	84.6488	25.6262
0.0343	9.9119	9000	0.8282	84.8279	24.6948
0.0181	11.0132	10000	0.8396	83.8001	24.3618
0.0117	12.1145	11000	0.8592	84.1584	24.0030
0.0111	13.2159	12000	0.8610	83.8378	24.3537
0.0088	14.3172	13000	0.8743	84.0924	24.6323
0.0112	15.4185	14000	0.8769	84.1867	24.9344
0.0109	16.5198	15000	0.8774	84.6770	24.6214
0.0032	17.6211	16000	0.8810	82.6591	23.3174
0.0017	18.7225	17000	0.8870	82.9986	22.8532
0.0019	19.8238	18000	0.8900	82.5083	22.6634
0.0008	20.9251	19000	0.8924	82.4800	22.5878
0.0006	22.0264	20000	0.8947	82.3479	22.6268
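
The table can be read programmatically to see where each metric bottoms out. The sketch below copies the rows above into a list and reports the step with the lowest value for each metric; the names `ROWS` and `best_step` are illustrative. On these numbers, validation loss is lowest at step 4000, while WER and CER keep improving until the last checkpoints, so which checkpoint counts as best depends on the metric chosen.

```python
# Each row: (training_loss, epoch, step, validation_loss, wer, cer), copied from the table above.
ROWS = [
    (1.0854, 1.1013, 1000, 1.2534, 97.5672, 52.7088),
    (0.5859, 2.2026, 2000, 0.8996, 90.9477, 48.1097),
    (0.3373, 3.3040, 3000, 0.7766, 87.7699, 29.9950),
    (0.2445, 4.4053, 4000, 0.7662, 86.6761, 28.1264),
    (0.1548, 5.5066, 5000, 0.7709, 86.6007, 27.8748),
    (0.1102, 6.6079, 6000, 0.7889, 86.3178, 26.2934),
    (0.0682, 7.7093, 7000, 0.7991, 84.4507, 27.3578),
    (0.0647, 8.8106, 8000, 0.8132, 84.6488, 25.6262),
    (0.0343, 9.9119, 9000, 0.8282, 84.8279, 24.6948),
    (0.0181, 11.0132, 10000, 0.8396, 83.8001, 24.3618),
    (0.0117, 12.1145, 11000, 0.8592, 84.1584, 24.0030),
    (0.0111, 13.2159, 12000, 0.8610, 83.8378, 24.3537),
    (0.0088, 14.3172, 13000, 0.8743, 84.0924, 24.6323),
    (0.0112, 15.4185, 14000, 0.8769, 84.1867, 24.9344),
    (0.0109, 16.5198, 15000, 0.8774, 84.6770, 24.6214),
    (0.0032, 17.6211, 16000, 0.8810, 82.6591, 23.3174),
    (0.0017, 18.7225, 17000, 0.8870, 82.9986, 22.8532),
    (0.0019, 19.8238, 18000, 0.8900, 82.5083, 22.6634),
    (0.0008, 20.9251, 19000, 0.8924, 82.4800, 22.5878),
    (0.0006, 22.0264, 20000, 0.8947, 82.3479, 22.6268),
]

# Column positions within each row.
STEP, VAL_LOSS, WER, CER = 2, 3, 4, 5


def best_step(rows, column):
    """Return the step whose value in `column` is lowest (lower is better for all three metrics)."""
    return min(rows, key=lambda row: row[column])[STEP]


best_by_val_loss = best_step(ROWS, VAL_LOSS)
best_by_wer = best_step(ROWS, WER)
best_by_cer = best_step(ROWS, CER)

if __name__ == "__main__":
    print("lowest validation loss at step", best_by_val_loss)
    print("lowest WER at step", best_by_wer)
    print("lowest CER at step", best_by_cer)
```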

Safetensors

Model size: 0.5B params
Tensor type: BF16
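
As a rough guide to what these figures mean for storage, the weights alone can be sized from the parameter count and the tensor type. This is a back-of-the-envelope sketch, assuming exactly 0.5e9 parameters (the card gives only the rounded figure) and 2 bytes per BF16 value; it ignores activations, optimizer state, and file overhead.

```python
# Assumption: the rounded "0.5B params" figure is treated as exactly 0.5e9.
PARAMS = 0.5e9
# BF16 stores each value in 16 bits, i.e. 2 bytes.
BYTES_PER_PARAM = 2

weight_bytes = PARAMS * BYTES_PER_PARAM
weight_gib = weight_bytes / 2**30

if __name__ == "__main__":
    print(f"approximate weight size: {weight_bytes / 1e9:.1f} GB ({weight_gib:.2f} GiB)")
```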