# train_mrpc_456_1760637793
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the mrpc dataset. It achieves the following results on the evaluation set:
- Loss: 0.1253
- Num Input Tokens Seen: 6773216

The reported loss matches the epoch-3 row of the results table below, after which validation loss rises steadily; the saved weights therefore appear to be the best checkpoint rather than the final one.
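For quick inference, the adapter can be loaded on top of the base model with PEFT. Below is a minimal sketch, assuming the adapter repository id `rbelanec/train_mrpc_456_1760637793` and causal-LM-style generation; the prompt template actually used during training is not documented here, so the example prompt is illustrative only:

```python
# Minimal inference sketch (the prompt format below is an assumption, not the training template).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "rbelanec/train_mrpc_456_1760637793"

# AutoPeftModelForCausalLM resolves the base model (meta-llama/Meta-Llama-3-8B-Instruct)
# from the adapter config and attaches the PEFT weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Illustrative MRPC-style paraphrase prompt.
prompt = (
    "Sentence 1: The company said profits rose 10% last quarter.\n"
    "Sentence 2: Profits at the company increased by 10% in the last quarter.\n"
    "Are these two sentences paraphrases? Answer yes or no:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```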
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 456
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
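As a hedged sketch, these hyperparameters map onto `transformers.TrainingArguments` roughly as follows; settings not documented in this card (output path, logging, the PEFT/LoRA configuration, sequence length) are omitted or marked as assumptions:

```python
# Sketch only: reconstructs the documented hyperparameters as TrainingArguments.
# Undocumented settings (LoRA config, max sequence length, etc.) are left out.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_mrpc_456_1760637793",  # assumed; not stated in the card
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```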
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.2295 | 1.0 | 826 | 0.1388 | 338864 |
| 0.3464 | 2.0 | 1652 | 0.1412 | 676984 |
| 0.0576 | 3.0 | 2478 | 0.1253 | 1016176 |
| 0.0005 | 4.0 | 3304 | 0.2025 | 1354632 |
| 0.0288 | 5.0 | 4130 | 0.2359 | 1692816 |
| 0.0451 | 6.0 | 4956 | 0.2704 | 2031320 |
| 0.0001 | 7.0 | 5782 | 0.3526 | 2369768 |
| 0.0002 | 8.0 | 6608 | 0.2839 | 2708688 |
| 0.0001 | 9.0 | 7434 | 0.2591 | 3047376 |
| 0.0 | 10.0 | 8260 | 0.3255 | 3386408 |
| 0.0 | 11.0 | 9086 | 0.4611 | 3724296 |
| 0.0002 | 12.0 | 9912 | 0.4710 | 4063352 |
| 0.0 | 13.0 | 10738 | 0.3664 | 4402032 |
| 0.0 | 14.0 | 11564 | 0.4681 | 4740464 |
| 0.0 | 15.0 | 12390 | 0.4486 | 5079384 |
| 0.0 | 16.0 | 13216 | 0.4662 | 5418192 |
| 0.0 | 17.0 | 14042 | 0.4810 | 5757208 |
| 0.0 | 18.0 | 14868 | 0.4891 | 6095648 |
| 0.0 | 19.0 | 15694 | 0.4956 | 6434448 |
| 0.0 | 20.0 | 16520 | 0.4956 | 6773216 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4