train_mrpc_789_1760637907

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mrpc dataset. It achieves the following results on the evaluation set:

Loss: 0.1058
Num Input Tokens Seen: 6772448

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.2472	1.0	826	0.1276	338792
0.1582	2.0	1652	0.1494	677792
0.0767	3.0	2478	0.1058	1016968
0.0471	4.0	3304	0.1575	1355576
0.0004	5.0	4130	0.1653	1694240
0.0004	6.0	4956	0.1864	2032216
0.0003	7.0	5782	0.2497	2370752
0.0001	8.0	6608	0.2246	2708952
0.0	9.0	7434	0.3029	3047656
0.0002	10.0	8260	0.2518	3386008
0.0	11.0	9086	0.2672	3724536
0.0981	12.0	9912	0.3259	4063536
0.0	13.0	10738	0.3472	4402704
0.0	14.0	11564	0.4299	4740960
0.0	15.0	12390	0.4163	5078920
0.0	16.0	13216	0.4223	5417832
0.0	17.0	14042	0.4581	5755672
0.0	18.0	14868	0.4689	6094000
0.0	19.0	15694	0.4748	6433272
0.0	20.0	16520	0.4785	6772448

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Downloads last month: 1

Model tree for rbelanec/train_mrpc_789_1760637907

Base model

meta-llama/Meta-Llama-3-8B-Instruct

Adapter

(2391)

this model