train_rte_789_1760637900

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the rte dataset. It achieves the following results on the evaluation set:

Loss: 0.1492
Num Input Tokens Seen: 6947288

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
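The learning-rate settings above can be illustrated with a small sketch of a cosine schedule with linear warmup. This is not the training code used for this model. It assumes the warmup ratio is applied to the total number of optimizer steps (11220, the last step in the training results) and that the schedule decays to zero at the final step. The names `PEAK_LR`, `TOTAL_STEPS`, `WARMUP_STEPS`, and `lr_at` are hypothetical.

```python
import math

PEAK_LR = 0.001       # learning_rate from the hyperparameter list
TOTAL_STEPS = 11220   # last step in the training results (20 epochs x 561 steps)
# Assumption: lr_scheduler_warmup_ratio (0.1) is taken over the total step count.
WARMUP_STEPS = round(0.1 * TOTAL_STEPS)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup + cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak down to 0.
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 561, WARMUP_STEPS, 5610, TOTAL_STEPS):
        print(step, lr_at(step))
```

Under these assumptions the warmup covers the first two epochs (1122 steps), so the peak rate of 0.001 is reached at the end of epoch 2.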

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.1497	1.0	561	0.1816	347936
0.1519	2.0	1122	0.1549	694664
0.1534	3.0	1683	0.1539	1039864
0.0892	4.0	2244	0.0830	1384096
0.011	5.0	2805	0.0736	1732712
0.0616	6.0	3366	0.0865	2080184
0.0032	7.0	3927	0.0697	2425192
0.0041	8.0	4488	0.0758	2772384
0.014	9.0	5049	0.0706	3119968
0.0005	10.0	5610	0.0842	3466384
0.0035	11.0	6171	0.1361	3817120
0.0033	12.0	6732	0.1224	4163160
0.0008	13.0	7293	0.1245	4511312
0.0005	14.0	7854	0.1415	4861864
0.0001	15.0	8415	0.1460	5210208
0.0001	16.0	8976	0.1497	5555776
0.0001	17.0	9537	0.1530	5902048
0.0001	18.0	10098	0.1549	6252128
0.0001	19.0	10659	0.1556	6598768
0.0001	20.0	11220	0.1557	6947288
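To read the table above more easily, the following sketch picks the epoch with the lowest validation loss. The values are copied from the table. The variable names are hypothetical, and the card does not state which checkpoint the published weights correspond to.

```python
# (epoch, step, validation_loss) copied from the training results table.
results = [
    (1, 561, 0.1816), (2, 1122, 0.1549), (3, 1683, 0.1539), (4, 2244, 0.0830),
    (5, 2805, 0.0736), (6, 3366, 0.0865), (7, 3927, 0.0697), (8, 4488, 0.0758),
    (9, 5049, 0.0706), (10, 5610, 0.0842), (11, 6171, 0.1361), (12, 6732, 0.1224),
    (13, 7293, 0.1245), (14, 7854, 0.1415), (15, 8415, 0.1460), (16, 8976, 0.1497),
    (17, 9537, 0.1530), (18, 10098, 0.1549), (19, 10659, 0.1556), (20, 11220, 0.1557),
]

# Row with the smallest validation loss.
best = min(results, key=lambda row: row[2])
# Last row, i.e. the end of training.
final = results[-1]

if __name__ == "__main__":
    print("best:", best)    # epoch 7, step 3927, validation loss 0.0697
    print("final:", final)  # epoch 20, step 11220, validation loss 0.1557
```

Validation loss is lowest at epoch 7 and rises from epoch 11 onward while training loss falls to 0.0001, a pattern consistent with overfitting in the later epochs.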

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Model tree for rbelanec/train_rte_789_1760637900

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model
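Since this model is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, one plausible way to load it is sketched below. It assumes you have access to the gated base model, that the adapter is hosted under the repository id shown in the model tree, and that default loading options are acceptable. The card does not document the prompt format used for RTE, so none is shown. The helper name `load_model` is hypothetical.

```python
BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_rte_789_1760637900"


def load_model():
    """Load the base model and attach the adapter (downloads weights on first call)."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
    base = AutoModelForCausalLM.from_pretrained(BASE_ID)
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_model()
    print(type(model).__name__)
```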