train_rte_123_1760637672

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the RTE dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0942
  • Num Input Tokens Seen: 6958720
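The card ships no usage snippet. Since the framework versions below list PEFT, this repository presumably holds a parameter-efficient adapter for the base model; the following is a minimal loading sketch under that assumption (repo ids are taken from this page, the loading details are not confirmed by the card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_rte_123_1760637672"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```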

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
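The card leaves this section blank; the run name suggests the GLUE RTE (Recognizing Textual Entailment) task. Purely as an illustration, and assuming GLUE RTE is indeed the source, the data could be loaded with the datasets library like this:

```python
from datasets import load_dataset

# Assumption: "rte" refers to the GLUE RTE task (sentence-pair entailment).
rte = load_dataset("glue", "rte")   # splits: train / validation / test
print(rte["train"][0])              # fields: sentence1, sentence2, label, idx
```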

Training procedure

Training hyperparameters

The following hyperparameters were used during training (reproduced in the sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
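For reference, the listed values map onto transformers' TrainingArguments roughly as follows. This is a sketch, not the original training script; the output_dir name is hypothetical and every value not shown is a library default:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_rte_123_1760637672",  # hypothetical; not stated on the card
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",        # AdamW; betas=(0.9, 0.999) and eps=1e-8 are the defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```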

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|--------------:|------:|------:|----------------:|------------------:|
| 0.1189        | 1.0   | 561   | 0.0942          | 348144            |
| 0.1018        | 2.0   | 1122  | 0.1825          | 697760            |
| 0.0165        | 3.0   | 1683  | 0.1198          | 1046680           |
| 0.0374        | 4.0   | 2244  | 0.1285          | 1394776           |
| 0.0404        | 5.0   | 2805  | 0.1305          | 1743216           |
| 0.0           | 6.0   | 3366  | 0.1759          | 2088384           |
| 0.0           | 7.0   | 3927  | 0.2039          | 2437304           |
| 0.0           | 8.0   | 4488  | 0.2241          | 2785744           |
| 0.0           | 9.0   | 5049  | 0.2369          | 3132040           |
| 0.0           | 10.0  | 5610  | 0.2463          | 3481336           |
| 0.0           | 11.0  | 6171  | 0.2559          | 3829824           |
| 0.0           | 12.0  | 6732  | 0.2646          | 4180088           |
| 0.0           | 13.0  | 7293  | 0.2705          | 4527216           |
| 0.0           | 14.0  | 7854  | 0.2753          | 4875496           |
| 0.0           | 15.0  | 8415  | 0.2785          | 5222072           |
| 0.0           | 16.0  | 8976  | 0.2846          | 5571288           |
| 0.0           | 17.0  | 9537  | 0.2847          | 5918280           |
| 0.0           | 18.0  | 10098 | 0.2854          | 6268760           |
| 0.0           | 19.0  | 10659 | 0.2858          | 6614344           |
| 0.0           | 20.0  | 11220 | 0.2883          | 6958720           |

The best validation loss (0.0942, the result reported at the top of this card) is reached after epoch 1; validation loss rises in every subsequent epoch while training loss collapses to 0.0, a clear sign of overfitting.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4