train_mrpc_789_1760637905

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mrpc dataset. It achieves the following results on the evaluation set:

Loss: 0.1123 (the lowest validation loss in the training log, reached at epoch 13)
Num Input Tokens Seen: 6772448
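
The framework list includes PEFT and the model tree labels this repository as an adapter on meta-llama/Meta-Llama-3-8B-Instruct, so loading it presumably means loading the base model first and attaching the adapter on top. The sketch below assumes the adapter targets a causal language model head and that you have access to the base model weights. The card confirms neither point, and the function name `load_adapter` is hypothetical.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_mrpc_789_1760637905"


def load_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model, then attach the PEFT adapter on top of it."""
    # Imported lazily so the heavy libraries load only when the function runs.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption: the adapter was trained against the causal LM head.
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter()` downloads both the base weights and the adapter, so it needs enough memory for an 8B-parameter model.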

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
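
To make the scheduler settings concrete, here is a minimal sketch of a cosine schedule with linear warmup, using the learning rate and warmup ratio listed above and the final step count from the training log. It assumes the usual shape: linear warmup from zero to the peak rate, then a half-cosine decay to zero. The rounding used for the warmup step count is also an assumption, and the exact behavior of the training framework may differ.

```python
import math

PEAK_LR = 0.03        # learning_rate from the list above
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio
TOTAL_STEPS = 16520   # final step in the training log (20 epochs x 826 steps)
# Assumption: the warmup length is the ratio times the total, rounded.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under the assumed schedule."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 826, 1652, 9086, 16520):
        print(step, lr_at(step))
```

Under these assumptions the warmup ends at step 1652, which coincides with the end of epoch 2 in the training log.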

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.1766	1.0	826	0.1895	338792
0.1337	2.0	1652	0.1952	677792
0.1921	3.0	2478	0.1855	1016968
0.2368	4.0	3304	0.1845	1355576
0.1671	5.0	4130	0.1697	1694240
0.2200	6.0	4956	0.1718	2032216
0.1148	7.0	5782	0.1442	2370752
0.2036	8.0	6608	0.1424	2708952
0.1358	9.0	7434	0.1230	3047656
0.0751	10.0	8260	0.1168	3386008
0.1282	11.0	9086	0.1139	3724536
0.1584	12.0	9912	0.1241	4063536
0.0377	13.0	10738	0.1123	4402704
0.0462	14.0	11564	0.1237	4740960
0.0099	15.0	12390	0.1264	5078920
0.0484	16.0	13216	0.1548	5417832
0.0032	17.0	14042	0.1888	5755672
0.0076	18.0	14868	0.2001	6094000
0.0093	19.0	15694	0.2003	6433272
0.0049	20.0	16520	0.2022	6772448
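
The table can be scanned programmatically to find the epoch with the lowest validation loss. The short sketch below copies the epoch, training loss, and validation loss columns from the table; the variable names are illustrative only.

```python
# (epoch, training loss, validation loss) copied from the table above
LOG = [
    (1, 0.1766, 0.1895), (2, 0.1337, 0.1952), (3, 0.1921, 0.1855),
    (4, 0.2368, 0.1845), (5, 0.1671, 0.1697), (6, 0.22, 0.1718),
    (7, 0.1148, 0.1442), (8, 0.2036, 0.1424), (9, 0.1358, 0.1230),
    (10, 0.0751, 0.1168), (11, 0.1282, 0.1139), (12, 0.1584, 0.1241),
    (13, 0.0377, 0.1123), (14, 0.0462, 0.1237), (15, 0.0099, 0.1264),
    (16, 0.0484, 0.1548), (17, 0.0032, 0.1888), (18, 0.0076, 0.2001),
    (19, 0.0093, 0.2003), (20, 0.0049, 0.2022),
]

# Pick the row with the smallest validation loss.
best_epoch, best_train, best_val = min(LOG, key=lambda row: row[2])
final_epoch, final_train, final_val = LOG[-1]

print(f"best:  epoch {best_epoch}, validation loss {best_val}")
print(f"final: epoch {final_epoch}, validation loss {final_val}")
```

The minimum falls at epoch 13 (0.1123), which matches the evaluation loss reported at the top of this card. After that point the validation loss climbs to 0.2022 while the training loss drops below 0.01, a pattern usually associated with overfitting.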

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
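
When reproducing the run, it can help to compare the local environment against the versions listed above. The sketch below uses only the standard library. The distribution names (for example `torch` for Pytorch) are assumptions about how the packages are published, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card; the distribution names are assumptions.
LISTED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def compare_versions(listed=LISTED):
    """Return {package: (listed version, installed version or None)}."""
    report = {}
    for name, wanted in listed.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed locally
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in compare_versions().items():
        status = "ok" if installed == wanted else "differs"
        print(f"{name}: listed {wanted}, installed {installed} ({status})")
```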

Model tree for rbelanec/train_mrpc_789_1760637905

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model