balanced_small_seed-42_1e-3

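This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set (values taken from the final row of the training results, step 30060):

Loss: 3.1817
Accuracy: 0.4012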

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
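
Two of the values above interact in a way that is easy to miss, so a small sketch may help. It assumes a single training device, the usual linear warmup-then-decay schedule shape, and that step 30060 (the last step in the results table) is the final optimizer step; none of these is stated explicitly in this card. Under those assumptions the effective batch size works out to 256, and because the warmup length (32000) exceeds the total number of steps, the learning rate would still be ramping up when training ends and would never reach 0.001.

```python
# Sketch of a linear warmup/decay schedule. This mirrors the common
# linear-scheduler shape; it is not confirmed to be the exact code used here.
BASE_LR = 1e-3          # learning_rate from the list above
WARMUP_STEPS = 32000    # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 30060     # assumption: last step in the results table is the final step
PER_DEVICE_BATCH = 32   # train_batch_size from the list above
GRAD_ACCUM_STEPS = 8    # gradient_accumulation_steps from the list above
NUM_DEVICES = 1         # assumption: single device (none is listed in the card)


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = TOTAL_STEPS - step
    return BASE_LR * max(0.0, remaining / max(1, TOTAL_STEPS - WARMUP_STEPS))


effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
final_lr = linear_lr(TOTAL_STEPS)

print("effective batch size:", effective_batch)
print(f"learning rate at final step: {final_lr:.6f}")
```

If the warmup length was meant to be shorter than the run, the decay phase of the schedule never took effect in this configuration.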

Training Loss	Epoch	Step	Validation Loss	Accuracy
6.0139	0.9997	1503	4.4236	0.2912
3.941	1.9994	3006	3.8961	0.3335
3.6814	2.9998	4510	3.6198	0.3578
3.3678	3.9996	6013	3.4583	0.3724
3.2601	4.9993	7516	3.3653	0.3812
3.1439	5.9998	9020	3.3011	0.3870
3.0803	6.9995	10523	3.2671	0.3907
3.0295	7.9999	12027	3.2407	0.3935
2.9783	8.9997	13530	3.2248	0.3950
2.9602	9.9994	15033	3.2122	0.3964
2.913	10.9998	16537	3.2051	0.3972
2.9128	11.9996	18040	3.2023	0.3982
2.8701	12.9993	19543	3.1954	0.3986
2.8805	13.9998	21047	3.1940	0.3996
2.8402	14.9995	22550	3.1920	0.3997
2.8567	15.9999	24054	3.1913	0.4001
2.8191	16.9997	25557	3.1891	0.4006
2.8421	17.9994	27060	3.1865	0.4004
2.8056	18.9998	28564	3.1834	0.4006
2.8322	19.9949	30060	3.1817	0.4012
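
The validation loss can be read as a perplexity if one assumes it is a mean token-level cross-entropy in nats. That is typical for language-model training but is not confirmed by this card, since the dataset and task are unknown. A minimal sketch under that assumption, using the first and last rows of the table:

```python
import math

# Validation losses copied from the first and last rows of the table above.
first_epoch_val_loss = 4.4236
final_val_loss = 3.1817

# Assumption: the loss is a mean cross-entropy in nats, so perplexity = exp(loss).
first_epoch_ppl = math.exp(first_epoch_val_loss)
final_ppl = math.exp(final_val_loss)

print(f"perplexity after epoch 1:  {first_epoch_ppl:.1f}")
print(f"perplexity after epoch 20: {final_ppl:.1f}")
```

Under this reading the final perplexity is roughly 24. The validation loss drops by less than 0.03 over the last ten epochs, so most of the improvement happens in the first half of training.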

Safetensors

Model size: 0.1B params
Tensor type: F32
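
For a rough sense of the weight footprint, the two figures above can be combined. This is only an estimate: the parameter count is rounded to 0.1B in this card, and the sketch counts parameter storage only (no activations, optimizer state, or file metadata).

```python
# Rough weight-size estimate from the rounded figures above.
approx_params = 0.1e9    # "0.1B params" (rounded in the card, so this is approximate)
bytes_per_param = 4      # F32 stores each value in 4 bytes

approx_weight_gb = approx_params * bytes_per_param / 1e9

print(f"approximate weight size: {approx_weight_gb:.1f} GB")
```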