train_mrpc_42_1760637561

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the MRPC dataset (Microsoft Research Paraphrase Corpus). It achieves the following results on the evaluation set:

  • Loss: 0.1868
  • Num Input Tokens Seen: 6769320
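This checkpoint is a PEFT adapter rather than a full set of model weights, so it is loaded on top of the base model. A minimal loading sketch, assuming access to the gated meta-llama/Meta-Llama-3-8B-Instruct weights; the paraphrase-detection prompt is a hypothetical illustration, since the exact training template is not documented in this card:

```python
# Sketch: attach the adapter to the base model and run a paraphrase query.
# The prompt format below is an assumption, not the documented training format.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_mrpc_42_1760637561"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # load the fine-tuned adapter

prompt = (
    "Are the following two sentences paraphrases of each other? Answer yes or no.\n"
    "Sentence 1: The company said quarterly profits rose 10%.\n"
    "Sentence 2: Quarterly profits increased by 10%, the company said.\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=3)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```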

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
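For reproduction, the list above maps fairly directly onto transformers.TrainingArguments. A sketch under stated assumptions; the actual training script, data preprocessing, and PEFT/LoRA configuration are not included in this card:

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments.
# output_dir is arbitrary; the 0.03 learning rate is plausible here only
# because training updates a small PEFT adapter, not the full 8B model.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_mrpc_42_1760637561",
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```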

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.2662        | 1.0   | 826   | 0.5562          | 337344            |
| 0.3004        | 2.0   | 1652  | 0.3616          | 675368            |
| 0.3173        | 3.0   | 2478  | 0.2059          | 1014008           |
| 0.1939        | 4.0   | 3304  | 1.4334          | 1353224           |
| 0.1431        | 5.0   | 4130  | 0.2725          | 1692320           |
| 0.1997        | 6.0   | 4956  | 0.1976          | 2030488           |
| 0.2637        | 7.0   | 5782  | 0.2497          | 2368032           |
| 0.2011        | 8.0   | 6608  | 0.1928          | 2706552           |
| 0.2025        | 9.0   | 7434  | 0.1903          | 3044984           |
| 0.1481        | 10.0  | 8260  | 0.2184          | 3384288           |
| 0.1866        | 11.0  | 9086  | 0.1902          | 3722552           |
| 0.1967        | 12.0  | 9912  | 0.1906          | 4061216           |
| 0.1712        | 13.0  | 10738 | 0.1979          | 4399064           |
| 0.1568        | 14.0  | 11564 | 0.1940          | 4737648           |
| 0.162         | 15.0  | 12390 | 0.1868          | 5075144           |
| 0.1984        | 16.0  | 13216 | 0.1933          | 5414032           |
| 0.1703        | 17.0  | 14042 | 0.1906          | 5752528           |
| 0.159         | 18.0  | 14868 | 0.1894          | 6091472           |
| 0.1737        | 19.0  | 15694 | 0.1899          | 6430144           |
| 0.1858        | 20.0  | 16520 | 0.1886          | 6769320           |
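Note that the headline evaluation loss (0.1868) is the minimum validation loss in this log, reached at epoch 15, rather than the final-epoch value (0.1886). A small self-contained check over the logged values:

```python
# Find the epoch with the lowest validation loss from the table above.
val_loss = {
    1: 0.5562, 2: 0.3616, 3: 0.2059, 4: 1.4334, 5: 0.2725,
    6: 0.1976, 7: 0.2497, 8: 0.1928, 9: 0.1903, 10: 0.2184,
    11: 0.1902, 12: 0.1906, 13: 0.1979, 14: 0.1940, 15: 0.1868,
    16: 0.1933, 17: 0.1906, 18: 0.1894, 19: 0.1899, 20: 0.1886,
}
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch, val_loss[best_epoch])  # -> 15 0.1868
```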

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
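Adapter checkpoints can be sensitive to library versions, so it may be worth checking the local environment against the pins above. A convenience sketch, not part of the original card:

```python
# Compare installed package versions against those listed above.
import importlib

expected = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}
for name, want in expected.items():
    got = importlib.import_module(name).__version__
    print(f"{name} {got}" + ("" if got == want else f"  <-- expected {want}"))
```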