# train_mrpc_123_1760637681
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mrpc dataset (MRPC, the Microsoft Research Paraphrase Corpus from GLUE). It achieves the following results on the evaluation set:
- Loss: 0.1302
- Num input tokens seen: 6,774,288
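Given the PEFT entry in the framework versions below, the checkpoint appears to be a parameter-efficient adapter rather than full model weights. A minimal loading sketch, assuming the adapter is published as `rbelanec/train_mrpc_123_1760637681` and that you have access to the gated base model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the (gated) base model; access must be granted on the Hub.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Attach the fine-tuned adapter on top of the frozen base weights.
# The adapter repo id is an assumption based on this card's name.
model = PeftModel.from_pretrained(base, "rbelanec/train_mrpc_123_1760637681")
model.eval()
```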
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
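Although the card leaves this section blank, the model name points to the GLUE MRPC split. A sketch of loading the data, where the prompt template is an illustrative assumption and not the documented preprocessing for this run:

```python
from datasets import load_dataset

mrpc = load_dataset("glue", "mrpc")  # train / validation / test splits

def to_prompt(example):
    # Hypothetical paraphrase-detection prompt; the actual format used
    # for this run is not documented on the card.
    answer = "yes" if example["label"] == 1 else "no"
    return {
        "text": (
            "Are these two sentences paraphrases? Answer yes or no.\n"
            f"Sentence 1: {example['sentence1']}\n"
            f"Sentence 2: {example['sentence2']}\n"
            f"Answer: {answer}"
        )
    }

train = mrpc["train"].map(to_prompt)
```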
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
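These settings map directly onto `transformers.TrainingArguments`; a minimal sketch for reproduction (the output directory is a placeholder, and the LoRA/PEFT adapter configuration is not recorded on this card):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_mrpc_123_1760637681",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```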
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.1283 | 1.0 | 826 | 0.1849 | 339568 |
| 0.1004 | 2.0 | 1652 | 0.1607 | 678688 |
| 0.1005 | 3.0 | 2478 | 0.1560 | 1017368 |
| 0.1635 | 4.0 | 3304 | 0.1467 | 1356744 |
| 0.1165 | 5.0 | 4130 | 0.1459 | 1694912 |
| 0.0766 | 6.0 | 4956 | 0.1382 | 2033992 |
| 0.0682 | 7.0 | 5782 | 0.1348 | 2372464 |
| 0.0961 | 8.0 | 6608 | 0.1357 | 2710624 |
| 0.1529 | 9.0 | 7434 | 0.1327 | 3049392 |
| 0.1025 | 10.0 | 8260 | 0.1302 | 3388032 |
| 0.0995 | 11.0 | 9086 | 0.1333 | 3727312 |
| 0.0818 | 12.0 | 9912 | 0.1305 | 4065504 |
| 0.1748 | 13.0 | 10738 | 0.1305 | 4404624 |
| 0.1183 | 14.0 | 11564 | 0.1306 | 4743080 |
| 0.0360 | 15.0 | 12390 | 0.1314 | 5082240 |
| 0.0840 | 16.0 | 13216 | 0.1311 | 5420840 |
| 0.0698 | 17.0 | 14042 | 0.1314 | 5759384 |
| 0.0890 | 18.0 | 14868 | 0.1310 | 6097784 |
| 0.1345 | 19.0 | 15694 | 0.1310 | 6435544 |
| 0.1448 | 20.0 | 16520 | 0.1305 | 6774288 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4