train_rte_123_1760637674

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the rte dataset. It achieves the following results on the evaluation set:

Loss: 0.0777
Num Input Tokens Seen: 6958720

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
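
To make the scheduler settings concrete, the sketch below computes the learning rate implied by a cosine schedule with 10% linear warmup. It takes the total of 11220 optimizer steps from the training results table. The exact shape (linear warmup from zero, then a half-cosine decay to zero) is an assumption based on how this kind of schedule is commonly implemented; the card does not state it, and the names `lr_at`, `BASE_LR`, `TOTAL_STEPS`, and `WARMUP_STEPS` are illustrative.

```python
import math

BASE_LR = 5e-05        # learning_rate from the card
TOTAL_STEPS = 11220    # final Step value in the training results table
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the card

# Assumption: warmup length is the ratio times the total step count, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer updates under the assumed schedule."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to BASE_LR.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from BASE_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 561, WARMUP_STEPS, 6171, TOTAL_STEPS):
    print(f"step {step:>5}: lr = {lr_at(step):.3e}")
```

Under these assumptions, warmup ends at the close of epoch 2, so most of the 20 epochs run on the decaying part of the schedule.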

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.2235	1.0	561	0.1719	348144
0.0705	2.0	1122	0.1076	697760
0.0694	3.0	1683	0.0962	1046680
0.0938	4.0	2244	0.0912	1394776
0.159	5.0	2805	0.0850	1743216
0.0466	6.0	3366	0.0818	2088384
0.0849	7.0	3927	0.0818	2437304
0.1257	8.0	4488	0.0806	2785744
0.0364	9.0	5049	0.0790	3132040
0.0331	10.0	5610	0.0778	3481336
0.0415	11.0	6171	0.0795	3829824
0.0987	12.0	6732	0.0777	4180088
0.0507	13.0	7293	0.0788	4527216
0.075	14.0	7854	0.0794	4875496
0.0432	15.0	8415	0.0790	5222072
0.0498	16.0	8976	0.0798	5571288
0.0588	17.0	9537	0.0791	5918280
0.0649	18.0	10098	0.0801	6268760
0.0678	19.0	10659	0.0801	6614344
0.0087	20.0	11220	0.0801	6958720
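
The table can be scanned programmatically for the epoch with the lowest validation loss. The sketch below copies the epoch, step, and validation loss columns from the table and picks the minimum; the variable names are illustrative. The lowest value, 0.0777, matches the evaluation loss reported at the top of the card, which suggests (though the card does not say so) that the headline figure comes from the best checkpoint rather than the final one.

```python
# (epoch, step, validation_loss) rows copied from the training results table.
RESULTS = [
    (1, 561, 0.1719), (2, 1122, 0.1076), (3, 1683, 0.0962), (4, 2244, 0.0912),
    (5, 2805, 0.0850), (6, 3366, 0.0818), (7, 3927, 0.0818), (8, 4488, 0.0806),
    (9, 5049, 0.0790), (10, 5610, 0.0778), (11, 6171, 0.0795), (12, 6732, 0.0777),
    (13, 7293, 0.0788), (14, 7854, 0.0794), (15, 8415, 0.0790), (16, 8976, 0.0798),
    (17, 9537, 0.0791), (18, 10098, 0.0801), (19, 10659, 0.0801), (20, 11220, 0.0801),
]

# min() returns the first row with the smallest validation loss.
best_epoch, best_step, best_loss = min(RESULTS, key=lambda row: row[2])
final_epoch, final_step, final_loss = RESULTS[-1]
gap = final_loss - best_loss

print(f"best:  epoch {best_epoch}, step {best_step}, val loss {best_loss:.4f}")
print(f"final: epoch {final_epoch}, step {final_step}, val loss {final_loss:.4f}")
print(f"final minus best: {gap:.4f}")
```

Validation loss changes little after epoch 10, staying between 0.0777 and 0.0801 through epoch 20.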

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
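
To check whether a local environment matches these versions, a small standard-library script is enough. This is a minimal sketch: the dictionary keys are the assumed PyPI distribution names (for example `torch` for Pytorch), and `installed_version` and `find_mismatches` are hypothetical helper names, not part of any of the listed libraries.

```python
from importlib import metadata

# Versions listed on the card; keys are assumed PyPI distribution names.
EXPECTED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to a (wanted, found) pair."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in find_mismatches(EXPECTED).items():
    print(f"{name}: card lists {wanted}, found {found or 'not installed'}")
```

An exact string comparison is deliberately strict; a different CUDA build of PyTorch, for instance, would be flagged even when the release number is the same.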
