# train_rte_123_1760637671
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the rte dataset. It achieves the following results on the evaluation set:
- Loss: 0.1522
- Num Input Tokens Seen: 6958720
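The adapter can be loaded on top of the base model with PEFT. The snippet below is a minimal sketch: the repository id matches this card, but the prompt template is an assumption, since the card does not document how RTE examples were formatted during training.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_rte_123_1760637671"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter
model.eval()

# RTE is a two-way entailment task: premise + hypothesis -> entailment / not_entailment.
# This prompt format is illustrative only; the training template is not published.
prompt = (
    "Premise: A man is playing a guitar on stage.\n"
    "Hypothesis: A man is performing music.\n"
    "Does the premise entail the hypothesis? Answer entailment or not_entailment."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```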
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):
- learning_rate: 0.001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
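For reference, here is a hedged sketch of how these settings map onto `transformers.TrainingArguments`. The original training script is not included with this card, so treat this as an illustration rather than the exact recipe; the `output_dir` and the evaluation/logging strategies are assumptions (the table below does report one evaluation per epoch).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_rte_123_1760637671",  # assumed; not stated in the card
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",        # betas=(0.9, 0.999) and epsilon=1e-08 are the defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",      # assumed from the per-epoch validation results below
    logging_strategy="epoch",
)
```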
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.1593 | 1.0 | 561 | 0.1562 | 348144 |
| 0.1565 | 2.0 | 1122 | 0.1598 | 697760 |
| 0.1593 | 3.0 | 1683 | 0.1562 | 1046680 |
| 0.1409 | 4.0 | 2244 | 0.1596 | 1394776 |
| 0.1477 | 5.0 | 2805 | 0.1445 | 1743216 |
| 0.1505 | 6.0 | 3366 | 0.1434 | 2088384 |
| 0.1281 | 7.0 | 3927 | 0.1455 | 2437304 |
| 0.1394 | 8.0 | 4488 | 0.1421 | 2785744 |
| 0.1308 | 9.0 | 5049 | 0.1358 | 3132040 |
| 0.1164 | 10.0 | 5610 | 0.1426 | 3481336 |
| 0.1162 | 11.0 | 6171 | 0.1330 | 3829824 |
| 0.1252 | 12.0 | 6732 | 0.1292 | 4180088 |
| 0.1081 | 13.0 | 7293 | 0.1301 | 4527216 |
| 0.1168 | 14.0 | 7854 | 0.1281 | 4875496 |
| 0.1137 | 15.0 | 8415 | 0.1396 | 5222072 |
| 0.1027 | 16.0 | 8976 | 0.1512 | 5571288 |
| 0.0859 | 17.0 | 9537 | 0.1642 | 5918280 |
| 0.0599 | 18.0 | 10098 | 0.1841 | 6268760 |
| 0.0898 | 19.0 | 10659 | 0.1896 | 6614344 |
| 0.0729 | 20.0 | 11220 | 0.1906 | 6958720 |
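Validation loss bottoms out at epoch 14 (0.1281) and rises steadily afterwards, a pattern consistent with overfitting in the later epochs. A quick sanity check over the table values:

```python
# Validation losses copied from the table above, one entry per epoch.
val_loss = [0.1562, 0.1598, 0.1562, 0.1596, 0.1445, 0.1434, 0.1455, 0.1421,
            0.1358, 0.1426, 0.1330, 0.1292, 0.1301, 0.1281, 0.1396, 0.1512,
            0.1642, 0.1841, 0.1896, 0.1906]

best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print(best_epoch, val_loss[best_epoch - 1])  # -> 14 0.1281
```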
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4