train_rte_456_1760637785

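This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the rte (Recognizing Textual Entailment) dataset. The loss reported below matches the lowest validation loss in the training results (epoch 15), and the token count is the total after all 20 epochs. It achieves the following results on the evaluation set: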

Loss: 0.1400
Num Input Tokens Seen: 6973272
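
The card does not include a usage snippet, so the following is a minimal sketch of how an adapter like this is typically attached to its base model with PEFT. The repo ID is taken from the model tree line on this card. The prompt template in `build_prompt` is hypothetical, since the format used during training is not documented here, and the `load_model` and `generate` helpers are illustrative names rather than part of any library.

```python
ADAPTER_ID = "rbelanec/train_rte_456_1760637785"  # repo ID from this card's model tree
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"


def build_prompt(premise: str, hypothesis: str) -> str:
    """Hypothetical RTE prompt; the template used in training is not documented."""
    return (
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis? Answer:"
    )


def load_model():
    """Load the base model and attach the adapter (downloads the 8B base weights)."""
    # Imported here so build_prompt stays usable without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 5) -> str:
    """Generate a short continuation and return only the newly produced text."""
    device = next(model.parameters()).device
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (not executed here because it needs the model weights):
# tokenizer, model = load_model()
# print(generate(tokenizer, model, build_prompt("A dog is running.", "An animal is moving.")))
```

Because the training prompt format is unknown, outputs from a different template may not reflect the evaluation loss reported above.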

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
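
To make the scheduler settings concrete, here is a small sketch of a cosine schedule with linear warmup, using the learning rate and warmup ratio listed above. It assumes one scheduler step per optimizer step, no gradient accumulation, and a total of 11220 steps (the final step in the training results). The function name `lr_at` is illustrative; the formula follows the common linear-warmup plus half-cosine form and may differ in detail from what the training run used.

```python
import math

BASE_LR = 0.03        # learning_rate from the list above
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 11220   # assumption: final step in the training results table


def lr_at(step: int,
          base_lr: float = BASE_LR,
          total_steps: int = TOTAL_STEPS,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given step: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the rate at the start, end of warmup, mid-decay, and final step.
    for step in (0, 561, 1122, 6171, 11220):
        print(step, lr_at(step))
```

Under these assumptions, warmup covers the first 1122 steps (two epochs of 561 steps each), after which the rate decays from 0.03 toward zero.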

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.1519	1.0	561	0.1558	351952
0.1618	2.0	1122	0.1666	702416
0.1558	3.0	1683	0.1545	1052056
0.1558	4.0	2244	0.1547	1400296
0.1489	5.0	2805	0.1559	1748504
0.161	6.0	3366	0.1582	2097920
0.1561	7.0	3927	0.1544	2447856
0.1581	8.0	4488	0.1551	2795952
0.1529	9.0	5049	0.1522	3144128
0.1642	10.0	5610	0.1531	3492600
0.1443	11.0	6171	0.1499	3839488
0.1504	12.0	6732	0.1459	4187064
0.1433	13.0	7293	0.1506	4535000
0.1635	14.0	7854	0.1673	4881752
0.137	15.0	8415	0.1400	5227704
0.1166	16.0	8976	0.1448	5576848
0.1172	17.0	9537	0.1582	5926536
0.0875	18.0	10098	0.1770	6276832
0.0698	19.0	10659	0.1979	6623720
0.0478	20.0	11220	0.1999	6973272
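
The table can be checked with a few lines of code. The sketch below copies the epoch, training loss, and validation loss columns and picks the epoch with the lowest validation loss; the helper names are illustrative. It is only a reading aid for the numbers above, not a description of how checkpoints were selected during the run.

```python
# (epoch, training loss, validation loss), copied from the table above
ROWS = [
    (1, 0.1519, 0.1558), (2, 0.1618, 0.1666), (3, 0.1558, 0.1545),
    (4, 0.1558, 0.1547), (5, 0.1489, 0.1559), (6, 0.1610, 0.1582),
    (7, 0.1561, 0.1544), (8, 0.1581, 0.1551), (9, 0.1529, 0.1522),
    (10, 0.1642, 0.1531), (11, 0.1443, 0.1499), (12, 0.1504, 0.1459),
    (13, 0.1433, 0.1506), (14, 0.1635, 0.1673), (15, 0.1370, 0.1400),
    (16, 0.1166, 0.1448), (17, 0.1172, 0.1582), (18, 0.0875, 0.1770),
    (19, 0.0698, 0.1979), (20, 0.0478, 0.1999),
]


def best_epoch(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def validation_rises_after(rows, epoch):
    """True if validation loss increases at every epoch after the given one."""
    later = [val for ep, _, val in rows if ep >= epoch]
    return all(a < b for a, b in zip(later, later[1:]))


if __name__ == "__main__":
    epoch, train_loss, val_loss = best_epoch(ROWS)
    print(f"lowest validation loss at epoch {epoch}: {val_loss:.4f}")
    print("validation loss rises every epoch afterwards:",
          validation_rises_after(ROWS, epoch))
```

The lowest validation loss is at epoch 15. From there to epoch 20 the training loss falls to 0.0478 while the validation loss rises to 0.1999, a pattern consistent with overfitting in the later epochs.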

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Model tree for rbelanec/train_rte_456_1760637785

Base model: meta-llama/Meta-Llama-3-8B-Instruct
This model: adapter