train_rte_101112_1760638011

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the rte (Recognizing Textual Entailment) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4810
  • Num Input Tokens Seen: 6208288

Model description

This model is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct, trained on the rte dataset (see Framework versions below for the PEFT release used).

Intended uses & limitations

The adapter is intended for textual entailment on RTE-style premise/hypothesis pairs; other uses and limitations have not been documented. A minimal loading sketch follows.
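The sketch below shows one way to load the adapter with PEFT, assuming it is published under rbelanec/train_rte_101112_1760638011 (the repo id from this card). The prompt format is illustrative, not necessarily the template used during training.

```python
# Minimal sketch: loading the PEFT adapter onto its base model.
# Requires transformers, peft, and accelerate (for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_rte_101112_1760638011"  # repo id from this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# RTE-style premise/hypothesis query (example text and format are made up).
prompt = (
    "Premise: A man is playing a guitar on stage.\n"
    "Hypothesis: A man is performing music.\n"
    "Does the premise entail the hypothesis? Answer yes or no.\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```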

Training and evaluation data

The model was trained and evaluated on the rte dataset; a loading sketch follows.
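Assuming "rte" refers to the GLUE rte configuration (an assumption; the card does not name the hub id), the data can be inspected with the datasets library:

```python
# Sketch: inspecting the RTE data, assuming the GLUE rte configuration.
from datasets import load_dataset

rte = load_dataset("glue", "rte")
print(rte)                                   # train/validation/test splits
print(rte["train"][0])                       # a premise/hypothesis pair with a label
print(rte["train"].features["label"].names)  # ['entailment', 'not_entailment']
```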

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
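
As a non-authoritative illustration, the list above maps onto transformers' TrainingArguments roughly as follows; output_dir is a placeholder, and anything not listed is left at its default.

```python
# Sketch: the hyperparameters above expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_rte_101112_1760638011",  # placeholder, not from the card
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```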

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|--------------:|------:|-----:|----------------:|------------------:|
| 0.1369        | 2.0   | 996  | 0.1828          | 620096            |
| 0.165         | 4.0   | 1992 | 0.1808          | 1243872           |
| 0.141         | 6.0   | 2988 | 0.1576          | 1862496           |
| 0.133         | 8.0   | 3984 | 0.1837          | 2484128           |
| 0.1104        | 10.0  | 4980 | 0.2542          | 3105632           |
| 0.0877        | 12.0  | 5976 | 0.3164          | 3727872           |
| 0.0181        | 14.0  | 6972 | 0.3950          | 4349504           |
| 0.0004        | 16.0  | 7968 | 0.4406          | 4968192           |
| 0.0002        | 18.0  | 8964 | 0.4750          | 5589504           |
| 0.0002        | 20.0  | 9960 | 0.4810          | 6208288           |

Validation loss bottoms out at epoch 6 (0.1576) and climbs steadily thereafter, so the final-epoch loss of 0.4810 reported above is well above the best checkpoint.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4