# train_mrpc_42_1767887007
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mrpc dataset. It achieves the following results on the evaluation set:
- Loss: 0.1703
- Num Input Tokens Seen: 3176720
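Since the framework versions below list PEFT, the checkpoint is presumably a PEFT adapter rather than full model weights. A minimal usage sketch, assuming the adapter lives at rbelanec/train_mrpc_42_1767887007 (the repo id shown on this card) and that you have gated access to the base model:

```python
# Hedged sketch: load the base model, then attach this card's PEFT adapter.
# Assumes access to the gated meta-llama/Meta-Llama-3-8B-Instruct repo.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype="auto", device_map="auto"
)

# Adapter repo id taken from this card.
model = PeftModel.from_pretrained(base, "rbelanec/train_mrpc_42_1767887007")
model.eval()
```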
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
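The card names the dataset only as "mrpc". Assuming this refers to the GLUE MRPC paraphrase-detection task (an assumption, since the card does not say), the data can be inspected with the datasets library:

```python
# Hedged assumption: "mrpc" is GLUE's Microsoft Research Paraphrase Corpus.
from datasets import load_dataset

mrpc = load_dataset("glue", "mrpc")
print(mrpc)              # train / validation / test splits
print(mrpc["train"][0])  # fields: sentence1, sentence2, label, idx
```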
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
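The original training script is not included with this card; the sketch below is one plausible mapping of the hyperparameters above onto transformers.TrainingArguments (the output_dir is illustrative):

```python
# Hedged reconstruction of the listed hyperparameters; not the original script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_mrpc_42_1767887007",  # illustrative name only
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```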
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:---:|:---:|:---:|:---:|:---:|
| 0.1063 | 0.5003 | 826 | 0.1869 | 159184 |
| 0.2039 | 1.0006 | 1652 | 0.2682 | 317472 |
| 0.2737 | 1.5009 | 2478 | 0.2038 | 476064 |
| 0.31 | 2.0012 | 3304 | 0.1862 | 635320 |
| 0.0298 | 2.5015 | 4130 | 0.1837 | 794168 |
| 1.0274 | 3.0018 | 4956 | 0.2195 | 953048 |
| 0.315 | 3.5021 | 5782 | 0.1703 | 1110936 |
| 0.1443 | 4.0024 | 6608 | 0.2054 | 1270600 |
| 0.1766 | 4.5027 | 7434 | 0.1833 | 1428808 |
| 0.0008 | 5.0030 | 8260 | 0.2010 | 1588656 |
| 0.237 | 5.5033 | 9086 | 0.2213 | 1748592 |
| 0.0343 | 6.0036 | 9912 | 0.2101 | 1906808 |
| 0.0314 | 6.5039 | 10738 | 0.2135 | 2065864 |
| 0.0067 | 7.0042 | 11564 | 0.2120 | 2224048 |
| 0.211 | 7.5045 | 12390 | 0.2255 | 2382896 |
| 0.1316 | 8.0048 | 13216 | 0.2202 | 2542008 |
| 0.1317 | 8.5051 | 14042 | 0.2261 | 2701288 |
| 0.4379 | 9.0055 | 14868 | 0.2267 | 2860216 |
| 0.0027 | 9.5058 | 15694 | 0.2267 | 3020392 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4