train_mrpc_789_1760637909

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mrpc dataset. It achieves the following results on the evaluation set:

Loss: 0.2308
Num Input Tokens Seen: 6772448

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08, with no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
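
To make the schedule concrete, here is a minimal sketch of a cosine learning-rate curve with linear warmup, using the values listed above. It is an illustration, not the trainer's own code: the total step count (16520) is taken from the last row of the training results table, and both the rounding of the warmup length and the decay to exactly zero are assumptions. The name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 5e-05       # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio
TOTAL_STEPS = 16520   # final step in the training results table

# Assumption: the warmup length is the total step count times the ratio, rounded.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 826, WARMUP_STEPS, 8260, TOTAL_STEPS):
        print(step, lr_at(step))
```

Under these assumptions the warmup covers the first two epochs (1652 steps), and the learning rate peaks at 5e-05 before decaying for the remaining 18 epochs.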

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.2701	1.0	826	0.2439	338792
0.2144	2.0	1652	0.2384	677792
0.2118	3.0	2478	0.2411	1016968
0.2605	4.0	3304	0.2353	1355576
0.2778	5.0	4130	0.2308	1694240
0.3079	6.0	4956	0.2337	2032216
0.1137	7.0	5782	0.2357	2370752
0.3252	8.0	6608	0.2345	2708952
0.1946	9.0	7434	0.2353	3047656
0.2587	10.0	8260	0.2336	3386008
0.1578	11.0	9086	0.2337	3724536
0.1253	12.0	9912	0.2335	4063536
0.154	13.0	10738	0.2315	4402704
0.241	14.0	11564	0.2336	4740960
0.2357	15.0	12390	0.2321	5078920
0.2379	16.0	13216	0.2321	5417832
0.2701	17.0	14042	0.2324	5755672
0.2085	18.0	14868	0.2329	6094000
0.183	19.0	15694	0.2320	6433272
0.1896	20.0	16520	0.2320	6772448
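
The reported evaluation loss of 0.2308 matches the lowest validation loss in the table, while the reported token count (6772448) matches the final epoch. The short sketch below finds the best epoch from the validation losses copied out of the table. The card does not say which checkpoint the published weights correspond to, so this only locates the minimum; the variable names are illustrative.

```python
# (epoch, validation loss) pairs copied from the training results table.
VAL_LOSS = [
    (1, 0.2439), (2, 0.2384), (3, 0.2411), (4, 0.2353), (5, 0.2308),
    (6, 0.2337), (7, 0.2357), (8, 0.2345), (9, 0.2353), (10, 0.2336),
    (11, 0.2337), (12, 0.2335), (13, 0.2315), (14, 0.2336), (15, 0.2321),
    (16, 0.2321), (17, 0.2324), (18, 0.2329), (19, 0.2320), (20, 0.2320),
]

# Pick the row with the smallest validation loss.
best_epoch, best_loss = min(VAL_LOSS, key=lambda row: row[1])
final_epoch, final_loss = VAL_LOSS[-1]

if __name__ == "__main__":
    print("best epoch:", best_epoch, "validation loss:", best_loss)
    print("final epoch:", final_epoch, "validation loss:", final_loss)
```

Validation loss changes little after epoch 5, staying between 0.2315 and 0.2357 for the remaining epochs.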

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
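
The card gives no usage instructions. As a hedged sketch, the function below shows one common way to load a PEFT adapter on top of its base model with the `transformers` and `peft` libraries. It assumes that the repository `rbelanec/train_mrpc_789_1760637909` contains a PEFT adapter, that the base model is loaded as a causal language model, and that you have access to the gated base model. The function name `load_adapter` is hypothetical, and the prompt format used during fine-tuning is not documented here.

```python
def load_adapter(
    adapter_id="rbelanec/train_mrpc_789_1760637909",
    base_id="meta-llama/Meta-Llama-3-8B-Instruct",
):
    """Load the base model and attach the adapter (hypothetical helper)."""
    # Imported lazily so that defining the function needs no heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Assumption: the repository stores adapter weights loadable by PEFT.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_adapter()` downloads the 8B base model, so it needs substantial disk space and memory.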

Model tree for rbelanec/train_mrpc_789_1760637909

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model