checkpoints

This model is a fine-tuned version of an unspecified base model; the training dataset is not specified either. At the last logged evaluation (step 11500), the validation loss is 4.2659.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 48
eval_batch_size: 48
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 96
optimizer: OptimizerNames.ADAMW_BNB (8-bit AdamW from bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 2000
num_epochs: 2
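
The hyperparameters above imply an effective batch size and a learning-rate curve that can be checked by hand. The sketch below is illustrative only: it assumes a single training device (consistent with 48 × 2 = 96, but not stated in the card), an estimated total of 11,838 optimizer steps (inferred from the logged step/epoch pairs), and the common linear-warmup-then-cosine-decay shape. The helper names `effective_batch_size` and `lr_at` are hypothetical and not part of any library.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 48
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 2000

# Assumptions, not stated in the card: one training device, and roughly
# 11,838 optimizer steps in total (estimated from the logged step/epoch pairs).
NUM_DEVICES = 1
TOTAL_STEPS = 11_838


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_device * accum_steps * devices


def lr_at(step: int,
          base_lr: float = LEARNING_RATE,
          warmup: int = WARMUP_STEPS,
          total: int = TOTAL_STEPS) -> float:
    """Linear warmup to base_lr, then a half-cosine decay towards zero."""
    if step < warmup:
        return base_lr * step / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for s in (0, 1000, 2000, 6000, TOTAL_STEPS):
        print(s, lr_at(s))
```

Under these assumptions the learning rate peaks at step 2000, which coincides with the fourth logged evaluation in the results table.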

Training Loss	Epoch	Step	Validation Loss
7.4032	0.0845	500	7.3900
6.6368	0.1689	1000	6.6176
6.0293	0.2534	1500	6.0336
5.4871	0.3379	2000	5.4602
5.1774	0.4224	2500	5.1387
4.9533	0.5068	3000	4.9452
4.8279	0.5913	3500	4.8122
4.7441	0.6758	4000	4.7194
4.6783	0.7603	4500	4.6470
4.6144	0.8447	5000	4.5846
4.5477	0.9292	5500	4.5297
4.4920	1.0137	6000	4.4871
4.4523	1.0982	6500	4.4475
4.3954	1.1826	7000	4.4127
4.4032	1.2671	7500	4.3827
4.4052	1.3516	8000	4.3571
4.3566	1.4361	8500	4.3329
4.3505	1.5205	9000	4.3124
4.3208	1.6050	9500	4.2945
4.3149	1.6895	10000	4.2829
4.3015	1.7739	10500	4.2739
4.2932	1.8584	11000	4.2682
4.2789	1.9429	11500	4.2659
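
Two quantities can be derived from the table above: the approximate number of optimizer steps per epoch and, if the loss is a mean per-token cross-entropy in nats, the corresponding perplexity. The card does not say what kind of loss this is, so the perplexity reading is an assumption. The function names below are hypothetical helpers.

```python
import math

# (step, epoch, validation_loss) for the first and last logged rows.
FIRST_ROW = (500, 0.0845, 7.3900)
LAST_ROW = (11500, 1.9429, 4.2659)


def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate steps per epoch from one logged (step, epoch) pair.

    The epoch column is rounded to four decimals, so this is approximate.
    """
    return step / epoch


def perplexity(loss: float) -> float:
    """exp(loss); meaningful only if loss is mean cross-entropy in nats."""
    return math.exp(loss)


if __name__ == "__main__":
    print(steps_per_epoch(LAST_ROW[0], LAST_ROW[1]))
    print(perplexity(FIRST_ROW[2]), perplexity(LAST_ROW[2]))
```

Using the last row rather than the first for the steps-per-epoch estimate reduces the effect of rounding in the epoch column.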

Safetensors

Model size: 82.1M params
Tensor type: F32
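
As a rough check on what the size and tensor type mean in practice, the sketch below estimates the memory needed for the weights alone. It assumes exactly 82,100,000 parameters (the card gives a rounded figure) and ignores file headers, activations, and optimizer state. `weight_bytes` is a hypothetical helper.

```python
# Parameter count as reported (rounded) and bytes per element for common dtypes.
PARAMS = 82_100_000
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(n_params: int, dtype: str) -> int:
    """Bytes needed to store n_params values of the given tensor type."""
    return n_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    size = weight_bytes(PARAMS, "F32")
    print(size, "bytes =", size / 1e6, "MB")
```

With F32 weights this comes to about 328 MB (roughly 313 MiB); storing the same weights in a 16-bit type would halve that.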