iteboshi-tiny

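This model was trained from scratch on an unknown dataset. At the final training step (20000) it achieves the following results on the evaluation set: a validation loss of 0.9326, a WER of 115.2570, and a CER of 50.1238.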

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 8
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 20000
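
The schedule and batch-size entries above can be made concrete with a short sketch. It assumes the usual conventions for these fields: a "linear" scheduler that ramps from 0 to the peak learning rate over the warmup steps and then decays linearly to 0 at the last training step, and a total batch size equal to the per-device batch size times the gradient accumulation steps times the number of devices. The function names are illustrative, and the card does not state how many GPUs were used, so the device count below is only what the listed numbers imply under that assumed formula.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-05
WARMUP_STEPS = 500
TRAINING_STEPS = 20000
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 8
TOTAL_TRAIN_BATCH_SIZE = 32


def linear_lr(step: int) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay.

    This mirrors the assumed behaviour of a "linear" scheduler with warmup;
    it is not taken from the actual training script.
    """
    if step < WARMUP_STEPS:
        # Warmup phase: scale grows from 0 to 1.
        scale = step / max(1, WARMUP_STEPS)
    else:
        # Decay phase: scale shrinks from 1 to 0 at TRAINING_STEPS.
        remaining = TRAINING_STEPS - step
        scale = max(0.0, remaining / max(1, TRAINING_STEPS - WARMUP_STEPS))
    return LEARNING_RATE * scale


def implied_device_count() -> float:
    """Devices implied by total = per_device * accumulation * devices (assumed formula)."""
    return TOTAL_TRAIN_BATCH_SIZE / (TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS)


if __name__ == "__main__":
    for step in (0, 250, 500, 10250, 20000):
        print(step, linear_lr(step))
    print("implied devices:", implied_device_count())
```

Under these assumptions the listed total batch size of 32 is fully accounted for by 4 x 8, which corresponds to a single device even though distributed_type is reported as multi-GPU.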

Training Loss	Epoch	Step	Validation Loss	WER (%)	CER (%)
0.5915	1.1013	1000	0.7334	158.2838	65.0516
0.4754	2.2026	2000	0.6800	176.7751	66.8021
0.3484	3.3040	3000	0.6674	250.1933	86.5160
0.3012	4.4053	4000	0.6733	390.6648	143.7552
0.2416	5.5066	5000	0.6857	259.8491	89.0706
0.194	6.6079	6000	0.7101	197.0769	75.6325
0.1436	7.7093	7000	0.7327	235.4833	103.3691
0.135	8.8106	8000	0.7635	223.1306	96.6303
0.0854	9.9119	9000	0.7848	235.6624	96.6693
0.062	11.0132	10000	0.8102	199.8114	83.9929
0.0299	12.1145	11000	0.8364	177.0486	102.8057
0.0254	13.2159	12000	0.8552	176.0868	85.5468
0.0196	14.3172	13000	0.8671	126.2801	60.4427
0.0136	15.4185	14000	0.8813	177.9727	73.2561
0.0102	16.5198	15000	0.8930	142.6968	57.3544
0.0079	17.6211	16000	0.9064	132.6167	59.8736
0.0074	18.7225	17000	0.9160	125.6011	55.9026
0.0053	19.8238	18000	0.9245	116.0113	50.1628
0.0052	20.9251	19000	0.9299	115.0872	47.7766
0.0043	22.0264	20000	0.9326	115.2570	50.1238
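
Many of the word and character error rates in the table exceed 100. That is possible because an error rate is normalized by the length of the reference, so a hypothesis containing many inserted tokens can accumulate more errors than the reference has tokens. The sketch below illustrates this, assuming the common definition (substitutions + deletions + insertions) divided by the reference length and expressed as a percentage. The card does not say which metric implementation or text normalization was used, and the function names here are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def error_rate_percent(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by reference length, as a percentage."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate_percent(reference.split(), hypothesis.split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: tokens are individual characters."""
    return error_rate_percent(list(reference), list(hypothesis))


if __name__ == "__main__":
    # Two reference words, five hypothesis words: three insertions.
    print(wer("hello world", "hello hello hello world world"))
```

In the example, a hypothesis that repeats words produces three insertions against a two-word reference, giving a rate above 100 even though every reference word appears in the output.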

Safetensors

Model size: 57.7M params
Tensor type: F32
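
The model size and tensor type listed above give a rough estimate of the weight footprint. The sketch below is a back-of-the-envelope calculation, assuming 4 bytes per F32 parameter; the parameter count is the rounded figure from the card, so the result is approximate and excludes file metadata, activations, and optimizer state. The helper name and the F16 entry are illustrative additions.

```python
# Rounded parameter count as reported on the card.
PARAM_COUNT = 57.7e6

# Bytes per parameter for each tensor type (F16 included only for comparison).
BYTES_PER_PARAM = {"F32": 4, "F16": 2}


def weight_size_mb(param_count: float, tensor_type: str = "F32") -> float:
    """Approximate size of the raw weights in megabytes (1 MB = 10**6 bytes)."""
    return param_count * BYTES_PER_PARAM[tensor_type] / 1e6


if __name__ == "__main__":
    print(f"F32 weights: about {weight_size_mb(PARAM_COUNT):.1f} MB")
```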