train_rte_456_1760637787

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the rte dataset. It achieves the following results on the evaluation set:

Loss: 0.1014
Num Input Tokens Seen: 6973272

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
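
The learning-rate settings above can be made concrete with a small sketch. It assumes the usual shape of a cosine schedule with warmup (linear ramp from zero, then a half-cosine decay to zero) and takes the total step count, 11220, from the last row of the training results table. The function name `cosine_lr` is illustrative and does not come from the training code.

```python
import math

BASE_LR = 5e-05        # learning_rate from the card
TOTAL_STEPS = 11220    # final step in the training results table
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the card


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then half-cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 561, 1122, 6171, 11220):
        print(step, cosine_lr(step))
```

Under these assumptions warmup covers the first 1122 steps, which is the first two epochs at 561 steps per epoch, so the peak learning rate is reached at the end of epoch 2.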

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.0603	1.0	561	0.1089	351952
0.0306	2.0	1122	0.1014	702416
0.0523	3.0	1683	0.1210	1052056
0.0947	4.0	2244	0.1555	1400296
0.0001	5.0	2805	0.1957	1748504
0.0002	6.0	3366	0.1794	2097920
0.0	7.0	3927	0.2560	2447856
0.0	8.0	4488	0.3359	2795952
0.0	9.0	5049	0.2325	3144128
0.0	10.0	5610	0.2242	3492600
0.0	11.0	6171	0.2440	3839488
0.0	12.0	6732	0.2605	4187064
0.0	13.0	7293	0.2733	4535000
0.0	14.0	7854	0.2805	4881752
0.0	15.0	8415	0.2906	5227704
0.0	16.0	8976	0.2953	5576848
0.0	17.0	9537	0.2969	5926536
0.0	18.0	10098	0.3010	6276832
0.0	19.0	10659	0.3023	6623720
0.0	20.0	11220	0.3020	6973272
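
The reported evaluation loss of 0.1014 matches the epoch 2 row of the table rather than the final row. Validation loss then rises while training loss falls to 0.0, a pattern consistent with overfitting. The card does not say how the reported checkpoint was chosen, so the following is only a sketch of how the lowest-loss epoch can be read off the table. The helper name `best_epoch` is illustrative.

```python
# (epoch, validation_loss) pairs copied from the training results table.
VALIDATION_LOSS = [
    (1, 0.1089), (2, 0.1014), (3, 0.1210), (4, 0.1555), (5, 0.1957),
    (6, 0.1794), (7, 0.2560), (8, 0.3359), (9, 0.2325), (10, 0.2242),
    (11, 0.2440), (12, 0.2605), (13, 0.2733), (14, 0.2805), (15, 0.2906),
    (16, 0.2953), (17, 0.2969), (18, 0.3010), (19, 0.3023), (20, 0.3020),
]


def best_epoch(history):
    """Return the (epoch, loss) pair with the lowest validation loss."""
    return min(history, key=lambda row: row[1])


best = best_epoch(VALIDATION_LOSS)
final = VALIDATION_LOSS[-1]
print(f"best epoch:  {best[0]} (validation loss {best[1]:.4f})")
print(f"final epoch: {final[0]} (validation loss {final[1]:.4f})")
```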

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
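
Since PEFT appears among the framework versions and the model tree marks this repository as an adapter, loading it presumably means attaching the adapter to the base model. The sketch below assumes the adapter weights are published in PEFT format under the repository id shown in the model tree, that the base model is loaded as a causal language model, and that the caller has access to the base model weights. The helper name `load_adapter` is hypothetical.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # base model named in the card
ADAPTER_ID = "rbelanec/train_rte_456_1760637787"       # repository named in the model tree


def load_adapter(base_model_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Return (tokenizer, model) with the adapter attached to the base model."""
    # Imported lazily so that defining this helper neither requires the heavy
    # dependencies nor triggers any download.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Assumption: the repository holds PEFT-format adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode
    return tokenizer, model
```

Calling `load_adapter()` downloads the full base model, so it is best run on a machine with enough memory for an 8B-parameter model.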

Model tree for rbelanec/train_rte_456_1760637787

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model