iteboshi-small

This model was trained from scratch on an unknown dataset. At the final training step (20000), it achieves the following results on the evaluation set:

Loss: 0.9635
WER: 94.6252
CER: 50.3003

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 8
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 20000
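The relationship between these values can be checked with a small sketch. It assumes a single device contributes to the total batch size (no device count is listed, and 4 × 8 already equals 32), and it assumes the linear scheduler warms up from zero to the peak learning rate and then decays linearly to zero at the final step. The names `total_train_batch_size` and `lr_at` are illustrative helpers, not part of any library.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_STEPS = 500
TRAINING_STEPS = 20000

# Assumption: one device counts toward the total, since no device count is listed.
NUM_DEVICES = 1


def total_train_batch_size() -> int:
    """Samples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup from 0 to the peak, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size())
    for step in (0, 250, 500, 10250, 20000):
        print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at step 500 and is back to half of its peak by step 10250.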

Training results

Training Loss	Epoch	Step	Validation Loss	WER	CER
1.1451	1.1013	1000	1.3127	98.0009	52.9954
0.6729	2.2026	2000	0.9211	94.9364	36.0076
0.4087	3.3040	3000	0.8170	91.4757	33.0592
0.303	4.4053	4000	0.7996	95.9642	35.4912
0.211	5.5066	5000	0.7910	90.7685	39.8708
0.1389	6.6079	6000	0.8133	91.2588	46.3311
0.0864	7.7093	7000	0.8312	92.6638	39.9178
0.0729	8.8106	8000	0.8530	91.6172	50.7434
0.0381	9.9119	9000	0.8698	91.5700	47.7159
0.028	11.0132	10000	0.8864	92.1452	54.3756
0.0142	12.1145	11000	0.8988	93.2107	53.6414
0.0131	13.2159	12000	0.9192	92.8053	46.2153
0.0088	14.3172	13000	0.9230	93.8142	54.3103
0.0092	15.4185	14000	0.9310	94.0311	53.1192
0.0069	16.5198	15000	0.9370	93.8802	51.9775
0.0023	17.6211	16000	0.9437	94.2386	49.6876
0.0026	18.7225	17000	0.9495	94.0971	51.5929
0.0016	19.8238	18000	0.9531	94.1631	51.0942
0.0015	20.9251	19000	0.9608	94.7006	50.9125
0.0012	22.0264	20000	0.9635	94.6252	50.3003
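To find which logged checkpoint scored best on each metric, the rows above can be scanned with a short script. This is a minimal sketch: the numbers are copied from the table, while `EvalRow` and `best_by` are hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    train_loss: float
    val_loss: float
    wer: float
    cer: float


# Rows copied from the training results table (Epoch column omitted).
ROWS = [
    EvalRow(1000, 1.1451, 1.3127, 98.0009, 52.9954),
    EvalRow(2000, 0.6729, 0.9211, 94.9364, 36.0076),
    EvalRow(3000, 0.4087, 0.8170, 91.4757, 33.0592),
    EvalRow(4000, 0.3030, 0.7996, 95.9642, 35.4912),
    EvalRow(5000, 0.2110, 0.7910, 90.7685, 39.8708),
    EvalRow(6000, 0.1389, 0.8133, 91.2588, 46.3311),
    EvalRow(7000, 0.0864, 0.8312, 92.6638, 39.9178),
    EvalRow(8000, 0.0729, 0.8530, 91.6172, 50.7434),
    EvalRow(9000, 0.0381, 0.8698, 91.5700, 47.7159),
    EvalRow(10000, 0.0280, 0.8864, 92.1452, 54.3756),
    EvalRow(11000, 0.0142, 0.8988, 93.2107, 53.6414),
    EvalRow(12000, 0.0131, 0.9192, 92.8053, 46.2153),
    EvalRow(13000, 0.0088, 0.9230, 93.8142, 54.3103),
    EvalRow(14000, 0.0092, 0.9310, 94.0311, 53.1192),
    EvalRow(15000, 0.0069, 0.9370, 93.8802, 51.9775),
    EvalRow(16000, 0.0023, 0.9437, 94.2386, 49.6876),
    EvalRow(17000, 0.0026, 0.9495, 94.0971, 51.5929),
    EvalRow(18000, 0.0016, 0.9531, 94.1631, 51.0942),
    EvalRow(19000, 0.0015, 0.9608, 94.7006, 50.9125),
    EvalRow(20000, 0.0012, 0.9635, 94.6252, 50.3003),
]


def best_by(metric: str) -> EvalRow:
    """Return the row with the lowest value of the given metric (lower is better)."""
    return min(ROWS, key=lambda row: getattr(row, metric))


if __name__ == "__main__":
    for metric in ("val_loss", "wer", "cer"):
        row = best_by(metric)
        print(f"best {metric}: {getattr(row, metric)} at step {row.step}")
```

On these numbers, validation loss and WER are lowest at step 5000 and CER is lowest at step 3000. Training loss keeps falling through step 20000 while validation loss rises after step 5000, which is consistent with overfitting in the later steps.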

Safetensors

Model size: 0.2B params

Tensor type: F32
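As a rough guide to the storage these figures imply, the sketch below multiplies the parameter count by the bytes per parameter. It assumes the rounded "0.2B" figure can be taken as 200,000,000 parameters, so the result is only an approximation of the weight size; it excludes activations, optimizer state, and file metadata. `weight_bytes` is a hypothetical helper.

```python
# Assumption: "0.2B params" is a rounded figure, taken here as exactly 200 million.
PARAM_COUNT = 200_000_000

# Bytes per parameter for common tensor types; the card lists F32.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(dtype: str = "F32") -> int:
    """Approximate size of the weights alone, in bytes."""
    return PARAM_COUNT * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    size = weight_bytes("F32")
    print(f"{size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```

Under that assumption the F32 weights come to about 0.8 GB, and casting to a 16-bit type would roughly halve that.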