long_first_small_seed-42_1e-3

This model was trained from scratch on an unknown dataset. At the final logged evaluation (step 30120, epoch 19.9915) it reaches a validation loss of 3.4061 and an accuracy of 0.3899.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
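
The hyperparameters above imply two derived quantities: the effective batch size and the learning rate actually reached by the end of the run. The sketch below works both out. It assumes a single device (consistent with 32 × 8 = 256), takes the last logged step (30120) as the total step count, and uses a hypothetical helper, `linear_schedule_lr`, that follows the usual linear warmup then linear decay shape rather than any exact library implementation.

```python
# Values copied from the hyperparameter list.
train_batch_size = 32
gradient_accumulation_steps = 8
peak_lr = 1e-3
warmup_steps = 32000

# Assumptions (not stated in the card): one device, and the last logged
# step in the results table is the total number of optimizer steps.
num_devices = 1
total_steps = 30120

effective_batch = train_batch_size * gradient_accumulation_steps * num_devices


def linear_schedule_lr(step: int) -> float:
    """Linear warmup to peak_lr, then linear decay to zero (illustrative)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


final_lr = linear_schedule_lr(total_steps)

print(f"effective batch size: {effective_batch}")
print(f"learning rate at step {total_steps}: {final_lr:.6f}")
print(f"fraction of peak reached: {final_lr / peak_lr:.4f}")
```

Under these assumptions the warmup (32000 steps) is longer than the whole run (30120 steps), so the learning rate would still be rising at the last step, at roughly 94% of the configured 0.001, and the decay phase would never start.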

Training Loss	Epoch	Step	Validation Loss	Accuracy
6.1883	0.9994	1506	4.6503	0.2711
4.227	1.9997	3013	4.1522	0.3157
3.9861	2.9993	4519	3.8888	0.3386
3.6917	3.9995	6026	3.7272	0.3533
3.5808	4.9998	7533	3.6234	0.3642
3.4538	5.9993	9039	3.5609	0.3708
3.3831	6.9996	10546	3.5157	0.3753
3.3248	7.9998	12053	3.4855	0.3788
3.2703	8.9994	13559	3.4655	0.3813
3.2465	9.9997	15066	3.4494	0.3837
3.1997	10.9993	16572	3.4412	0.3847
3.1928	11.9995	18079	3.4307	0.3860
3.1513	12.9998	19586	3.4247	0.3869
3.156	13.9993	21092	3.4191	0.3878
3.1176	14.9996	22599	3.4111	0.3885
3.1316	15.9998	24106	3.4094	0.3891
3.0936	16.9994	25612	3.4082	0.3893
3.1122	17.9997	27119	3.4105	0.3894
3.0781	18.9993	28625	3.4072	0.3897
3.1019	19.9915	30120	3.4061	0.3899
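
To make the loss values easier to interpret, the sketch below converts the validation loss of the first and last rows into perplexity. It assumes the reported loss is a mean cross-entropy in nats, which the card does not state; `parse_row` is a hypothetical helper written for this example.

```python
import math

# First and last rows copied from the results table
# (columns: training loss, epoch, step, validation loss, accuracy).
ROWS = [
    "6.1883 0.9994 1506 4.6503 0.2711",
    "3.1019 19.9915 30120 3.4061 0.3899",
]


def parse_row(line: str) -> dict:
    """Split one whitespace-separated table row into named fields."""
    train_loss, epoch, step, val_loss, accuracy = line.split()
    return {
        "train_loss": float(train_loss),
        "epoch": float(epoch),
        "step": int(step),
        "val_loss": float(val_loss),
        "accuracy": float(accuracy),
    }


parsed = [parse_row(r) for r in ROWS]

# Assumption: val_loss is mean cross-entropy in nats, so perplexity = exp(loss).
perplexities = [math.exp(p["val_loss"]) for p in parsed]

for row, ppl in zip(parsed, perplexities):
    print(f"step {row['step']:>6}: val_loss={row['val_loss']:.4f} "
          f"perplexity={ppl:.2f} accuracy={row['accuracy']:.4f}")
```

If that assumption holds, validation perplexity falls from a little over 100 after the first epoch to about 30 at the end of training.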

Safetensors

Model size: 0.1B params
Tensor type: F32