train_rte_42_1760637556

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the rte dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0737
  • Num Input Tokens Seen: 6976960

Model description

More information needed

Intended uses & limitations

More information needed
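
While the card leaves this section open, a minimal inference sketch for this PEFT adapter might look as follows. The prompt format and generation settings are assumptions, not documented behavior, and the base model is gated on Hugging Face (access must be granted before download).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_rte_42_1760637556"

# Load the base model, then attach the fine-tuned adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

# RTE is a two-sentence entailment task; this prompt format is an assumption.
prompt = (
    "Premise: A man is playing a guitar.\n"
    "Hypothesis: A man is playing an instrument.\n"
    "Does the premise entail the hypothesis? Answer yes or no."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```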

Training and evaluation data

More information needed
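
The card only names the dataset as rte. If this refers to the GLUE RTE (Recognizing Textual Entailment) task, which is an assumption, it can be loaded with the datasets library:

```python
from datasets import load_dataset

# Assumption: "rte" refers to the GLUE RTE task.
rte = load_dataset("glue", "rte")
print(rte)               # train / validation / test splits
print(rte["train"][0])   # fields: sentence1, sentence2, label, idx
```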

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
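
As a sketch, these values map onto transformers.TrainingArguments roughly as follows; output_dir and any PEFT-specific settings are assumptions not stated in the card:

```python
from transformers import TrainingArguments

# Values transcribed from the hyperparameter list above; output_dir is illustrative.
args = TrainingArguments(
    output_dir="train_rte_42_1760637556",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```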

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|---------------|-------|-------|-----------------|-------------------|
| 0.1256        | 1.0   | 561   | 0.0890          | 352952            |
| 0.0429        | 2.0   | 1122  | 0.0740          | 701160            |
| 0.1301        | 3.0   | 1683  | 0.0737          | 1049376           |
| 0.0           | 4.0   | 2244  | 0.1046          | 1397896           |
| 0.0           | 5.0   | 2805  | 0.1298          | 1746728           |
| 0.0002        | 6.0   | 3366  | 0.1091          | 2097448           |
| 0.0           | 7.0   | 3927  | 0.1474          | 2447040           |
| 0.0           | 8.0   | 4488  | 0.1595          | 2794744           |
| 0.0           | 9.0   | 5049  | 0.1650          | 3143192           |
| 0.0           | 10.0  | 5610  | 0.1712          | 3491160           |
| 0.0           | 11.0  | 6171  | 0.1764          | 3843760           |
| 0.0           | 12.0  | 6732  | 0.1803          | 4194656           |
| 0.0           | 13.0  | 7293  | 0.1840          | 4544752           |
| 0.0           | 14.0  | 7854  | 0.1865          | 4893272           |
| 0.0           | 15.0  | 8415  | 0.1875          | 5242768           |
| 0.0           | 16.0  | 8976  | 0.1919          | 5588240           |
| 0.0           | 17.0  | 9537  | 0.1917          | 5935704           |
| 0.0           | 18.0  | 10098 | 0.1926          | 6279912           |
| 0.0           | 19.0  | 10659 | 0.1935          | 6627720           |
| 0.0           | 20.0  | 11220 | 0.1933          | 6976960           |

The best validation loss (0.0737) occurs at epoch 3 and matches the evaluation loss reported above; from epoch 4 onward the training loss collapses to 0.0 while the validation loss climbs steadily, which suggests the model overfits past epoch 3.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_rte_42_1760637556

  • Adapter of meta-llama/Meta-Llama-3-8B-Instruct