train_mrpc_789_1760637906

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mrpc dataset. It achieves the following results on the evaluation set:

Loss: 0.2521
Num Input Tokens Seen: 6772448
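The model tree below lists this repository as an adapter of meta-llama/Meta-Llama-3-8B-Instruct, and PEFT appears under the framework versions. Assuming that, the adapter would be loaded on top of the base model roughly as sketched below. Several points are assumptions, not documented in this card: that the repo contains a standard PEFT adapter, that the base-model tokenizer is the right one, and that bf16 is appropriate. Access to the gated Llama 3 weights and the `accelerate` package (needed for `device_map="auto"`) are also assumed. The prompt format used during fine-tuning is not documented, so the sketch stops at loading.

```python
# Minimal loading sketch. Assumption: the repository hosts a PEFT adapter
# for the base model, and you have access to the gated Llama 3 weights.
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_mrpc_789_1760637906"


def load_adapter_model(device_map="auto"):
    """Load the base model and attach the fine-tuned adapter on top of it."""
    # Heavy imports are deferred so this file can be imported without them.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the base model's tokenizer matches the one used in training.
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: a bf16-capable GPU
        device_map=device_map,       # "auto" requires the accelerate package
    )
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Usage: `tokenizer, model = load_adapter_model()`. Keeping the adapter separate from the base weights means only the small adapter files need to be downloaded from this repository.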

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
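The step column of the training results shows 826 optimizer steps per epoch, so 20 epochs give 16520 steps. With a warmup ratio of 0.1, that is 1652 warmup steps, exactly the first two epochs. The sketch below reproduces the resulting learning-rate curve. It assumes the scheduler follows the Transformers cosine-with-warmup formula: linear warmup from 0, then a half-cosine decay to 0, with warmup steps computed as ceil(ratio × total steps).

```python
import math

PEAK_LR = 1e-3          # learning_rate
NUM_EPOCHS = 20         # num_epochs
STEPS_PER_EPOCH = 826   # from the step column of the training results
WARMUP_RATIO = 0.1      # lr_scheduler_warmup_ratio


def cosine_with_warmup(step, total_steps, warmup_steps, peak_lr=PEAK_LR):
    """Learning rate at `step`: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total_steps = NUM_EPOCHS * STEPS_PER_EPOCH            # 16520, the last table row
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)  # 1652, i.e. two epochs

for epoch in (0, 1, 2, 5, 10, 15, 20):
    step = epoch * STEPS_PER_EPOCH
    lr = cosine_with_warmup(step, total_steps, warmup_steps)
    print(f"epoch {epoch:>2}  step {step:>5}  lr {lr:.2e}")
```

Under these assumptions the learning rate peaks at 1e-3 at the end of epoch 2, falls to half the peak at the end of epoch 11, and reaches 0 at step 16520.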

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.1736	1.0	826	0.1572	338792
0.1717	2.0	1652	0.1210	677792
0.0353	3.0	2478	0.1017	1016968
0.1653	4.0	3304	0.1688	1355576
0.0729	5.0	4130	0.1101	1694240
0.1002	6.0	4956	0.1083	2032216
0.045	7.0	5782	0.1202	2370752
0.0674	8.0	6608	0.1356	2708952
0.0405	9.0	7434	0.1237	3047656
0.0023	10.0	8260	0.1492	3386008
0.0371	11.0	9086	0.1516	3724536
0.0029	12.0	9912	0.2205	4063536
0.0005	13.0	10738	0.3173	4402704
0.1699	14.0	11564	0.3261	4740960
0.0002	15.0	12390	0.3514	5078920
0.0007	16.0	13216	0.3548	5417832
0.0003	17.0	14042	0.3627	5755672
0.0002	18.0	14868	0.3691	6094000
0.0002	19.0	15694	0.3718	6433272
0.0002	20.0	16520	0.3760	6772448
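In the table, validation loss is lowest at epoch 3 (0.1017, step 2478). From epoch 12 onward it rises steadily while training loss falls to near zero, which suggests overfitting. The sketch below locates the best epoch from the table's values. It also shows where a patience-based early-stopping rule would have halted. The card does not indicate that early stopping was used, since all 20 epochs ran, so the `patience` value here is purely hypothetical.

```python
# Validation loss per epoch (epochs 1..20), copied from the table above.
VAL_LOSS = [
    0.1572, 0.1210, 0.1017, 0.1688, 0.1101, 0.1083, 0.1202, 0.1356, 0.1237, 0.1492,
    0.1516, 0.2205, 0.3173, 0.3261, 0.3514, 0.3548, 0.3627, 0.3691, 0.3718, 0.3760,
]
STEPS_PER_EPOCH = 826


def best_epoch(losses):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(1, len(losses) + 1), key=lambda e: losses[e - 1])


def early_stop_epoch(losses, patience):
    """Return the epoch after which training would stop, or None.

    Hypothetical rule: stop once `patience` consecutive epochs fail to
    improve on the best validation loss seen so far.
    """
    best, bad_epochs = float("inf"), 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


epoch = best_epoch(VAL_LOSS)
print(epoch, epoch * STEPS_PER_EPOCH, VAL_LOSS[epoch - 1])  # 3 2478 0.1017
print(early_stop_epoch(VAL_LOSS, patience=3))               # 6
```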

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
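When reproducing or loading this model, it can help to compare the local environment against the versions listed above. The sketch below does that with the standard library's `importlib.metadata`. It assumes PyTorch is installed under the pip distribution name `torch`. It reports exact string matches only, so a different CUDA build suffix counts as a difference.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the "Framework versions" list, keyed by pip distribution name.
CARD_VERSIONS = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def compare_versions(expected):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None  # package not installed in this environment
        report[name] = (want, have, have == want)
    return report


for name, (want, have, ok) in compare_versions(CARD_VERSIONS).items():
    status = "ok" if ok else "differs"
    print(f"{name:<13} card={want:<12} installed={have or 'missing':<12} {status}")
```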


Model tree for rbelanec/train_mrpc_789_1760637906

This model is an adapter for the base model meta-llama/Meta-Llama-3-8B-Instruct.