iteboshi-medium

This model was trained from scratch on an unknown dataset. At the final checkpoint (step 20000), it achieves the following results on the evaluation set:

Loss: 0.8963
Wer: 81.7162
Cer: 22.2135

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 8
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH, the PyTorch implementation) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 20000
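The scheduler settings above describe a linear warmup followed by a linear decay. The following is a minimal sketch of the resulting learning-rate curve, assuming it has the same shape as `get_linear_schedule_with_warmup` in Hugging Face Transformers: a ramp from 0 to the peak rate over the warmup steps, then a linear fall to 0 at the last training step. The card itself only names the schedule type, and the function name `linear_warmup_lr` is hypothetical. The block also spells out the batch-size arithmetic (per-device batch size times gradient accumulation steps).

```python
BASE_LR = 2e-05        # learning_rate
WARMUP_STEPS = 500     # lr_scheduler_warmup_steps
TOTAL_STEPS = 20000    # training_steps

# Per-device batch size (4) x gradient accumulation steps (8); this product
# equals the reported total_train_batch_size of 32.
EFFECTIVE_BATCH = 4 * 8


def linear_warmup_lr(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup, then linear decay.

    Assumption: mirrors the shape of get_linear_schedule_with_warmup in
    Hugging Face Transformers; this is a sketch, not the training code.
    """
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 toward BASE_LR.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Decay: fall linearly from BASE_LR at the end of warmup to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for s in (0, 250, 500, 10250, 20000):
        print(s, linear_warmup_lr(s))
```

Under this assumption the rate is half the peak (1e-05) at step 250, reaches 2e-05 at step 500, is back to 1e-05 at step 10250, and reaches 0 at step 20000.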

Training results

Training Loss	Epoch	Step	Validation Loss	WER (%)	CER (%)
1.0877	1.1013	1000	1.2933	98.3781	54.0409
0.6047	2.2026	2000	0.8662	91.6643	34.1946
0.3394	3.3040	3000	0.7763	88.5997	34.5534
0.243	4.4053	4000	0.7650	86.5912	26.8098
0.1633	5.5066	5000	0.7654	88.1848	27.4116
0.1113	6.6079	6000	0.7906	86.4215	26.7640
0.0673	7.7093	7000	0.7989	84.6110	27.1371
0.0624	8.8106	8000	0.8190	84.6582	24.9533
0.0355	9.9119	9000	0.8439	84.0924	24.8639
0.0262	11.0132	10000	0.8546	84.7525	25.5190
0.0146	12.1145	11000	0.8571	83.5832	23.8711
0.0089	13.2159	12000	0.8546	82.8666	23.3656
0.0098	14.3172	13000	0.8761	83.8850	23.7055
0.0076	15.4185	14000	0.8775	83.1872	23.6654
0.0044	16.5198	15000	0.8781	83.0740	23.5868
0.005	17.6211	16000	0.8774	82.3102	22.7488
0.0026	18.7225	17000	0.8914	82.1499	22.5316
0.0015	19.8238	18000	0.8890	81.9896	22.3132
0.0011	20.9251	19000	0.8928	81.6219	22.2513
0.0006	22.0264	20000	0.8963	81.7162	22.2135
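The WER and CER columns appear to be percentages. As a rough sketch of how such scores are commonly computed, here is a pure-Python, corpus-level WER/CER based on Levenshtein edit distance: substitutions plus deletions plus insertions, divided by the number of reference words (or characters), times 100. Assumptions: WER tokenizes on whitespace, CER counts every character including spaces, and no text normalization is applied. The card does not say how its numbers were produced (for example, through the `evaluate` or `jiwer` libraries), and the helper names below are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    # prev[j] = distance between the first i-1 reference items and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference item
                curr[j - 1] + 1,         # insertion of a hypothesis item
                prev[j - 1] + (r != h),  # substitution (or match at no cost)
            )
        prev = curr
    return prev[-1]


def wer_percent(refs, hyps):
    """Corpus-level word error rate in percent (whitespace tokenization assumed)."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    return 100.0 * errors / total


def cer_percent(refs, hyps):
    """Corpus-level character error rate in percent (spaces count as characters)."""
    errors = sum(edit_distance(list(r), list(h)) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


if __name__ == "__main__":
    refs = ["the cat sat"]
    hyps = ["the cat sit"]
    print(wer_percent(refs, hyps), cer_percent(refs, hyps))
```

In the usage example, one substituted word out of three reference words gives a WER of about 33.3%, while the single wrong character out of eleven gives a much lower CER. Because errors are summed over the whole corpus before dividing, long utterances weigh more than short ones.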

Model format: Safetensors
Model size: 0.5B params
Tensor type: F32