# llama2-1m-pg19
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 4.0628
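
For context, assuming this loss is the standard per-token cross-entropy in nats (as reported by the Transformers `Trainer`), it corresponds to a perplexity of exp(4.0628) ≈ 58.1. A minimal sketch of that conversion:

```python
import math

# Final validation cross-entropy from the results table below,
# assumed to be measured in nats per token.
eval_loss = 4.0628
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 58.1
```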
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 0.001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 1
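
A hedged sketch of how these settings might map onto `transformers.TrainingArguments`. The `output_dir` is a placeholder, and the multi-GPU launch (4 devices, e.g. via `torchrun` or `accelerate`) is assumed rather than stated in the card:

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameter list above. Effective train batch size:
# 8 (per device) x 4 (GPUs) x 4 (gradient accumulation) = 128.
training_args = TrainingArguments(
    output_dir="llama2-1m-pg19",   # placeholder path
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=1,
)
```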
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 5.5529 | 0.0412 | 500 | 5.5682 |
| 4.4825 | 0.0825 | 1000 | 4.5591 |
| 4.2184 | 0.1237 | 1500 | 4.3646 |
| 4.2034 | 0.1650 | 2000 | 4.2814 |
| 4.0955 | 0.2062 | 2500 | 4.2316 |
| 4.1191 | 0.2475 | 3000 | 4.2016 |
| 4.0088 | 0.2887 | 3500 | 4.1736 |
| 4.0641 | 0.3299 | 4000 | 4.1580 |
| 4.1002 | 0.3712 | 4500 | 4.1433 |
| 4.0197 | 0.4124 | 5000 | 4.1292 |
| 3.9741 | 0.4537 | 5500 | 4.1164 |
| 3.9915 | 0.4949 | 6000 | 4.1134 |
| 4.0100 | 0.5361 | 6500 | 4.1027 |
| 3.9424 | 0.5774 | 7000 | 4.0973 |
| 4.0078 | 0.6186 | 7500 | 4.0894 |
| 4.0254 | 0.6599 | 8000 | 4.0856 |
| 3.9711 | 0.7011 | 8500 | 4.0815 |
| 3.9905 | 0.7424 | 9000 | 4.0774 |
| 3.9657 | 0.7836 | 9500 | 4.0743 |
| 3.9494 | 0.8248 | 10000 | 4.0699 |
| 4.0339 | 0.8661 | 10500 | 4.0691 |
| 3.9739 | 0.9073 | 11000 | 4.0659 |
| 3.9678 | 0.9486 | 11500 | 4.0643 |
| 3.9043 | 0.9898 | 12000 | 4.0628 |
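
Validation loss is still decreasing slowly at the end of the single epoch. A small sketch for visualizing the curve, with the step/loss pairs copied from the table above:

```python
import matplotlib.pyplot as plt

# Validation losses at each 500-step evaluation, taken from the table above.
steps = list(range(500, 12001, 500))
val_loss = [5.5682, 4.5591, 4.3646, 4.2814, 4.2316, 4.2016, 4.1736,
            4.1580, 4.1433, 4.1292, 4.1164, 4.1134, 4.1027, 4.0973,
            4.0894, 4.0856, 4.0815, 4.0774, 4.0743, 4.0699, 4.0691,
            4.0659, 4.0643, 4.0628]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("llama2-1m-pg19 validation loss (1 epoch)")
plt.show()
```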
### Framework versions
- Transformers 4.51.3
- Pytorch 2.7.1+cu126
- Datasets 4.0.0
- Tokenizers 0.21.2