# train_mrpc_456_1760637791
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the MRPC (Microsoft Research Paraphrase Corpus) dataset. It achieves the following results on the evaluation set:
- Loss: 0.1311
- Num Input Tokens Seen: 6773216
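The snippet below is a minimal inference sketch, not an official usage example: it assumes this repo hosts a PEFT adapter on top of the base model (consistent with the PEFT version listed under Framework versions), and the paraphrase prompt format is an illustrative assumption, since the card does not document how MRPC pairs were serialized. Access to the gated Llama 3 base weights is required.

```python
# Minimal inference sketch (assumptions flagged in comments; untested against this repo).
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "rbelanec/train_mrpc_456_1760637791"
base_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Tokenizer comes from the base model; the adapter repo may not ship one.
tokenizer = AutoTokenizer.from_pretrained(base_id)
# Loads the base model and applies the PEFT adapter on top (assumes a PEFT repo).
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, device_map="auto")

# Hypothetical MRPC-style prompt; the actual training serialization is not documented.
prompt = (
    "Do the following two sentences mean the same thing?\n"
    "Sentence 1: The company reported higher profits this quarter.\n"
    "Sentence 2: Quarterly profits at the company rose.\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```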
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch follows the list):
- learning_rate: 0.03
- train_batch_size: 4
- eval_batch_size: 4
- seed: 456
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
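For reference, the hyperparameters above map onto Hugging Face `TrainingArguments` as sketched below. This is a reconstruction, not the author's script: the dataset preprocessing and the PEFT configuration are not stated in this card (a 0.03 learning rate is more typical of prompt-style PEFT methods than of full fine-tuning).

```python
# Hedged reconstruction of the listed hyperparameters; not the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_mrpc_456_1760637791",
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",  # AdamW as implemented in PyTorch
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```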
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.2465 | 1.0 | 826 | 0.2048 | 338864 |
| 0.1927 | 2.0 | 1652 | 0.1857 | 676984 |
| 0.1895 | 3.0 | 2478 | 0.1846 | 1016176 |
| 0.25 | 4.0 | 3304 | 0.1879 | 1354632 |
| 0.1154 | 5.0 | 4130 | 0.1810 | 1692816 |
| 0.1836 | 6.0 | 4956 | 0.1711 | 2031320 |
| 0.1226 | 7.0 | 5782 | 0.1825 | 2369768 |
| 0.1663 | 8.0 | 6608 | 0.1495 | 2708688 |
| 0.0502 | 9.0 | 7434 | 0.1644 | 3047376 |
| 0.1111 | 10.0 | 8260 | 0.1470 | 3386408 |
| 0.0932 | 11.0 | 9086 | 0.1311 | 3724296 |
| 0.1617 | 12.0 | 9912 | 0.1385 | 4063352 |
| 0.0904 | 13.0 | 10738 | 0.1451 | 4402032 |
| 0.0115 | 14.0 | 11564 | 0.1562 | 4740464 |
| 0.0464 | 15.0 | 12390 | 0.1838 | 5079384 |
| 0.0072 | 16.0 | 13216 | 0.2151 | 5418192 |
| 0.0094 | 17.0 | 14042 | 0.2423 | 5757208 |
| 0.01 | 18.0 | 14868 | 0.2462 | 6095648 |
| 0.0102 | 19.0 | 15694 | 0.2460 | 6434448 |
| 0.008 | 20.0 | 16520 | 0.2470 | 6773216 |
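Validation loss reaches its minimum of 0.1311 at epoch 11, matching the evaluation loss reported at the top of this card, and rises steadily afterward while training loss falls toward zero, a typical overfitting pattern. The reported result therefore appears to correspond to the epoch-11 checkpoint.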
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4