long_first_noditransitive_seed-42_1e-3

This model was trained from scratch on an unknown dataset. At the final evaluation step (29900) it reaches a validation loss of 3.4021 and an accuracy of 0.3908.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
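
The hyperparameters above interact in two ways that are easy to miss: the total batch size is a product of other settings, and the warmup length can be compared with the length of the run. The following is a minimal sketch, not the original training code. It assumes a single training device and the usual linear warmup-then-decay rule; `linear_lr`, `effective_batch_size`, and `TOTAL_STEPS` (taken from the last step in the training results) are illustrative names.

```python
# Illustrative sketch only; not the original training script.
# Values are copied from the hyperparameter list. TOTAL_STEPS is the last
# step reported in the training results (an assumption about run length).
BASE_LR = 1e-3
WARMUP_STEPS = 32_000
TOTAL_STEPS = 29_900
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8
NUM_DEVICES = 1  # assumption: single device


def effective_batch_size(per_device, accum, devices):
    """Number of sequences contributing to one optimizer step."""
    return per_device * accum * devices


def linear_lr(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to base_lr, then linear decay to zero at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for s in (0, 1_495, 14_952, 29_900):
        print(s, f"{linear_lr(s):.6f}")
```

Under these assumptions, 32 x 8 x 1 reproduces the listed total_train_batch_size of 256. The warmup (32000 steps) is also longer than the run itself (29900 steps), so the learning rate would still be rising at the last step and would peak near 9.3e-4 rather than at the nominal 1e-3.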

Training Loss	Epoch	Step	Validation Loss	Accuracy
6.1891	0.9998	1495	4.6436	0.2736
4.6034	1.9997	2990	4.1494	0.3161
3.9716	2.9995	4485	3.8860	0.3389
3.7944	4.0	5981	3.7209	0.3545
3.5542	4.9998	7476	3.6183	0.3654
3.4776	5.9997	8971	3.5526	0.3718
3.3553	6.9995	10466	3.5108	0.3767
3.3191	8.0	11962	3.4802	0.3802
3.2459	8.9998	13457	3.4601	0.3823
3.2256	9.9997	14952	3.4486	0.3841
3.1776	10.9995	16447	3.4337	0.3858
3.1638	12.0	17943	3.4274	0.3868
3.1319	12.9998	19438	3.4218	0.3876
3.1231	13.9997	20933	3.4145	0.3886
3.0985	14.9995	22428	3.4103	0.3891
3.0942	16.0	23924	3.4085	0.3897
3.0768	16.9998	25419	3.4036	0.3901
3.0733	17.9997	26914	3.4018	0.3906
3.0629	18.9995	28409	3.4010	0.3906
3.0566	19.9967	29900	3.4021	0.3908
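
Validation loss is often easier to compare across runs when expressed as perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which this card does not state. The rows are copied from the table above; `perplexity` and `best_by_loss` are illustrative helper names.

```python
import math

# (epoch, step, validation_loss, accuracy), copied from selected table rows.
ROWS = [
    (0.9998, 1495, 4.6436, 0.2736),
    (18.9995, 28409, 3.4010, 0.3906),
    (19.9967, 29900, 3.4021, 0.3908),
]


def perplexity(loss):
    """exp(loss); meaningful only if loss is mean cross-entropy in nats."""
    return math.exp(loss)


def best_by_loss(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r[2])


if __name__ == "__main__":
    for epoch, step, loss, acc in ROWS:
        print(f"step {step}: loss={loss:.4f} ppl={perplexity(loss):.2f} acc={acc:.4f}")
    print("lowest validation loss at step", best_by_loss(ROWS)[1])
```

In the table, the lowest validation loss (3.4010) occurs at step 28409, while the highest accuracy (0.3908) occurs at the final step 29900, so which checkpoint counts as "best" depends on the metric chosen.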

Model size: 0.1B params (Safetensors)
Tensor type: F32
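
As a rough guide to the memory needed just to hold the weights, the parameter count can be multiplied by the bytes per element of the tensor type. This is a back-of-the-envelope sketch: the 0.1B figure is rounded, and `weight_bytes` and `BYTES_PER_DTYPE` are illustrative names.

```python
# Bytes per element for common tensor types (F32 is the type listed here).
BYTES_PER_DTYPE = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(num_params, dtype="F32"):
    """Approximate size of the raw weights, ignoring file metadata."""
    return int(num_params) * BYTES_PER_DTYPE[dtype]


if __name__ == "__main__":
    approx_params = 0.1e9  # rounded figure from the card
    print(f"{weight_bytes(approx_params) / 1e9:.1f} GB")
```

With the rounded count, F32 weights come to roughly 0.4 GB; activations, optimizer state, and framework overhead are not included.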