train_rte_42_1760637558

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the rte dataset. It achieves the following results on the evaluation set:

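Loss: 0.0743 (the lowest validation loss in the training results table, reached at epoch 18)
Num Input Tokens Seen: 6976960 (total at the end of epoch 20)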

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
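
To make the scheduler settings above concrete, here is a minimal sketch of a linear-warmup, cosine-decay learning-rate curve built from the listed values. It assumes 11220 total optimizer steps (the final step in the training results table) and that the number of warmup steps is the ceiling of warmup ratio times total steps. The helper name lr_at is illustrative and not part of any library.

```python
import math

# Values taken from the hyperparameter list. TOTAL_STEPS is an assumption
# read off the last row of the training results table.
LEARNING_RATE = 5e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 11220
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (illustrative helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 561, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {step:>5}: lr = {lr_at(step):.3e}")
```

Under these assumptions, warmup spans the first 1122 steps, which corresponds to the first two epochs of 561 steps each.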

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.2076	1.0	561	0.1951	352952
0.1024	2.0	1122	0.1129	701160
0.0949	3.0	1683	0.0999	1049376
0.0790	4.0	2244	0.0906	1397896
0.0772	5.0	2805	0.0840	1746728
0.0678	6.0	3366	0.0817	2097448
0.0377	7.0	3927	0.0781	2447040
0.0786	8.0	4488	0.0763	2794744
0.0457	9.0	5049	0.0771	3143192
0.0445	10.0	5610	0.0752	3491160
0.1504	11.0	6171	0.0746	3843760
0.0596	12.0	6732	0.0745	4194656
0.0679	13.0	7293	0.0757	4544752
0.0218	14.0	7854	0.0745	4893272
0.0116	15.0	8415	0.0747	5242768
0.0448	16.0	8976	0.0749	5588240
0.0534	17.0	9537	0.0749	5935704
0.0258	18.0	10098	0.0743	6279912
0.0245	19.0	10659	0.0759	6627720
0.0770	20.0	11220	0.0745	6976960
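
The table can be summarized with a few lines of arithmetic. The sketch below copies the validation-loss column by hand and finds the best epoch, then derives rough per-epoch figures. The examples-per-epoch number rests on an assumption the card does not state: that each optimizer step consumes exactly one batch of 4 examples (single device, no gradient accumulation).

```python
# Validation loss per epoch, copied from the training results table (epochs 1-20).
VAL_LOSS = [
    0.1951, 0.1129, 0.0999, 0.0906, 0.0840,
    0.0817, 0.0781, 0.0763, 0.0771, 0.0752,
    0.0746, 0.0745, 0.0757, 0.0745, 0.0747,
    0.0749, 0.0749, 0.0743, 0.0759, 0.0745,
]
STEPS_PER_EPOCH = 561    # step count in the first row of the table
TOTAL_TOKENS = 6976960   # input tokens seen after epoch 20
TRAIN_BATCH_SIZE = 4     # from the hyperparameter list

# Epochs are 1-indexed in the table, so shift the list index by one.
best_epoch = min(range(len(VAL_LOSS)), key=VAL_LOSS.__getitem__) + 1
best_loss = VAL_LOSS[best_epoch - 1]

# Assumption: one optimizer step equals one batch of TRAIN_BATCH_SIZE examples.
examples_per_epoch = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE
# Average over all epochs; individual epochs differ slightly in the table.
tokens_per_epoch = TOTAL_TOKENS / len(VAL_LOSS)

print(f"best validation loss {best_loss} at epoch {best_epoch}")
print(f"about {examples_per_epoch} examples and {tokens_per_epoch:.0f} tokens per epoch")
```

From epoch 7 onward the validation loss stays between 0.0743 and 0.0781, so most of the improvement happens in the first third of training.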

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
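
Since PEFT is among the listed frameworks, the weights are presumably published as an adapter on top of the base model. A minimal loading sketch follows, assuming the adapter lives under the Hub repository id rbelanec/train_rte_42_1760637558 and that access to the base model is available. The function name load_adapter_model is illustrative, and the card does not document the prompt format used for RTE, so inference prompts are left out.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_rte_42_1760637558"  # assumed Hub repository id


def load_adapter_model():
    """Load the base model and attach the adapter (downloads weights when called)."""
    # Imports stay inside the function so this module can be imported
    # without transformers or peft installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()
    return tokenizer, model
```

Calling load_adapter_model() returns the tokenizer and the adapted model; nothing is downloaded until the function is called.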

Model tree for rbelanec/train_rte_42_1760637558

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model