train_mrpc_123_1760637676

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mrpc dataset. It achieves the following results on the evaluation set:

Loss: 0.1097
Num Input Tokens Seen: 6774288

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.1839	1.0	826	0.1777	339568
0.2042	2.0	1652	0.1748	678688
0.1734	3.0	2478	0.1754	1017368
0.1989	4.0	3304	0.1661	1356744
0.176	5.0	4130	0.1740	1694912
0.1192	6.0	4956	0.1501	2033992
0.1179	7.0	5782	0.1267	2372464
0.0878	8.0	6608	0.1142	2710624
0.1418	9.0	7434	0.1280	3049392
0.0718	10.0	8260	0.1097	3388032
0.0902	11.0	9086	0.1131	3727312
0.0631	12.0	9912	0.1115	4065504
0.1051	13.0	10738	0.1377	4404624
0.0531	14.0	11564	0.1769	4743080
0.0017	15.0	12390	0.1980	5082240
0.0009	16.0	13216	0.2018	5420840
0.0016	17.0	14042	0.2049	5759384
0.001	18.0	14868	0.2062	6097784
0.0023	19.0	15694	0.2055	6435544
0.0038	20.0	16520	0.2060	6774288

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Downloads last month: 1

Model tree for rbelanec/train_mrpc_123_1760637676

Base model

meta-llama/Meta-Llama-3-8B-Instruct

Adapter

(2391)

this model