loose_balanced_small_seed-42_1e-3

This model was trained from scratch on an unknown dataset. At the final evaluation (step 30040) it reached a validation loss of 3.1893 and an accuracy of 0.4003 on the evaluation set.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
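
The relationship between these values can be checked with a little arithmetic. The sketch below is illustrative only and is not the training script. It assumes a single device (consistent with 32 x 8 = 256) and takes 30040, the last step in the results table, as the total number of optimizer steps. The function names are hypothetical. Under those assumptions the 32000 warmup steps exceed the length of the run, so the learning rate would still be ramping up when training ends and would never quite reach 0.001.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-3
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 32_000

# Assumptions, not stated in the card: one device, and a total of 30040
# optimizer steps (the last step reported in the results table).
NUM_DEVICES = 1
TOTAL_STEPS = 30_040


def effective_batch_size(per_device, accum_steps, num_devices):
    """Samples consumed per optimizer step."""
    return per_device * accum_steps * num_devices


def linear_schedule_lr(step, base_lr=LEARNING_RATE,
                       warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
lr_at_end = linear_schedule_lr(TOTAL_STEPS)

print(f"effective batch size: {batch}")
print(f"learning rate at step {TOTAL_STEPS}: {lr_at_end:.6f}")
print(f"warmup finished before training ended: {WARMUP_STEPS <= TOTAL_STEPS}")
```

If the run really did use these settings, the decay half of the linear schedule never took effect, which may be worth keeping in mind when comparing against runs with a shorter warmup.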

Training Loss	Epoch	Step	Validation Loss	Accuracy
6.0163	0.9995	1502	4.4131	0.2929
3.9334	1.9993	3004	3.8896	0.3345
3.6745	2.9997	4507	3.6164	0.3586
3.368	3.9994	6009	3.4596	0.3730
3.2628	4.9992	7511	3.3636	0.3818
3.1474	5.9996	9014	3.3092	0.3868
3.0816	6.9993	10516	3.2744	0.3898
3.0327	7.9998	12019	3.2502	0.3924
2.9799	8.9995	13521	3.2281	0.3949
2.9612	9.9993	15023	3.2238	0.3955
2.9147	10.9997	16526	3.2051	0.3972
2.9138	11.9994	18028	3.2027	0.3982
2.8706	12.9992	19530	3.1993	0.3984
2.8799	13.9996	21033	3.1994	0.3986
2.8387	14.9993	22535	3.1916	0.3996
2.8571	15.9998	24038	3.1905	0.4001
2.8174	16.9995	25540	3.1923	0.4000
2.84	17.9993	27042	3.1894	0.4004
2.804	18.9997	28545	3.1876	0.4004
2.8326	19.9948	30040	3.1893	0.4003
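
To read the table more easily, the sketch below picks the checkpoint with the lowest validation loss among the last five rows and converts loss to perplexity. It assumes the reported loss is a mean cross-entropy in nats, which the card does not state. The names ROWS and perplexity are hypothetical helpers introduced for illustration.

```python
import math

# (step, validation_loss, accuracy) for the last five rows of the table above.
ROWS = [
    (24038, 3.1905, 0.4001),
    (25540, 3.1923, 0.4000),
    (27042, 3.1894, 0.4004),
    (28545, 3.1876, 0.4004),
    (30040, 3.1893, 0.4003),
]


def perplexity(loss):
    """exp(loss); meaningful only if loss is mean cross-entropy in nats (assumed)."""
    return math.exp(loss)


best = min(ROWS, key=lambda row: row[1])   # lowest validation loss
final = ROWS[-1]                           # last evaluation

print(f"best step: {best[0]}, loss {best[1]}, perplexity {perplexity(best[1]):.2f}")
print(f"final step: {final[0]}, loss {final[1]}, perplexity {perplexity(final[1]):.2f}")
```

Validation loss changes by less than 0.005 across these rows, so the metrics appear to have largely plateaued by the end of training.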

Model size: 0.1B params
Tensor type: F32
Format: Safetensors
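
As a rough guide to the size of the weights, the sketch below multiplies the parameter count by the bytes per element. The figure of 0.1B is rounded in the card, so the result is only an order-of-magnitude estimate. It covers weights alone, not activations or optimizer state, and the helper name is hypothetical.

```python
# Rounded parameter count from the card; the exact number is not given.
PARAMS_APPROX = 0.1e9

# Bytes per element for common tensor types.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weights_size_gb(num_params, dtype="F32"):
    """Approximate size of the weights in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


print(f"F32 weights: about {weights_size_gb(PARAMS_APPROX, 'F32'):.1f} GB")
print(f"F16 weights: about {weights_size_gb(PARAMS_APPROX, 'F16'):.1f} GB")
```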