# train_mrpc_123_1760637678

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the MRPC (Microsoft Research Paraphrase Corpus) dataset. It achieves the following results on the evaluation set:

- Loss: 0.1104
- Input tokens seen: 6774288
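
A minimal inference sketch, assuming the adapter weights are hosted in this repository (`rbelanec/train_mrpc_123_1760637678`) and were trained as a causal-LM PEFT adapter on top of the base model; the MRPC-style prompt below is illustrative, since the exact prompt template used in training is not documented here:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_mrpc_123_1760637678"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()

# Illustrative MRPC-style paraphrase prompt (not necessarily the training template).
prompt = (
    "Do these two sentences mean the same thing? Answer yes or no.\n"
    "Sentence 1: The company reported higher profits this quarter.\n"
    "Sentence 2: Quarterly profits at the company rose.\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=3)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```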

## Model description
More information needed

## Intended uses & limitations
More information needed

## Training and evaluation data
More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows this list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
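
As a rough sketch, the settings above map onto `transformers.TrainingArguments` as follows. The `output_dir` and the evaluation/logging cadence are assumptions (the per-epoch validation losses in the results table suggest per-epoch evaluation); the PEFT adapter config and dataset preprocessing are not documented in this card.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; output_dir and the
# evaluation/logging cadence are assumptions, not taken from the card.
args = TrainingArguments(
    output_dir="train_mrpc_123_1760637678",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",         # betas=(0.9, 0.999) and epsilon=1e-08 are the adamw_torch defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",       # assumed: the results table reports validation loss once per epoch
    logging_strategy="epoch",
)
```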

### Training results

Validation loss reaches its minimum (0.1104, the evaluation loss reported above) at epoch 2 and trends upward from epoch 3 onward while the training loss collapses toward zero, a typical sign of overfitting in the later epochs.

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.1032 | 1.0 | 826 | 0.1915 | 339568 |
| 0.0404 | 2.0 | 1652 | 0.1104 | 678688 |
| 0.0144 | 3.0 | 2478 | 0.1295 | 1017368 |
| 0.11 | 4.0 | 3304 | 0.1386 | 1356744 |
| 0.0622 | 5.0 | 4130 | 0.1903 | 1694912 |
| 0.0012 | 6.0 | 4956 | 0.2808 | 2033992 |
| 0.0001 | 7.0 | 5782 | 0.3320 | 2372464 |
| 0.0563 | 8.0 | 6608 | 0.3093 | 2710624 |
| 0.0001 | 9.0 | 7434 | 0.3586 | 3049392 |
| 0.0 | 10.0 | 8260 | 0.3543 | 3388032 |
| 0.0004 | 11.0 | 9086 | 0.5125 | 3727312 |
| 0.0 | 12.0 | 9912 | 0.3661 | 4065504 |
| 0.0002 | 13.0 | 10738 | 0.4454 | 4404624 |
| 0.1018 | 14.0 | 11564 | 0.3821 | 4743080 |
| 0.0 | 15.0 | 12390 | 0.4532 | 5082240 |
| 0.0 | 16.0 | 13216 | 0.4530 | 5420840 |
| 0.0 | 17.0 | 14042 | 0.4589 | 5759384 |
| 0.0 | 18.0 | 14868 | 0.4664 | 6097784 |
| 0.0 | 19.0 | 15694 | 0.4699 | 6435544 |
| 0.0 | 20.0 | 16520 | 0.4702 | 6774288 |

### Framework versions

- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
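
A quick way to check that a local environment matches these pins (a convenience sketch, not part of the original card):

```python
# Print installed versions to compare against the pins listed above.
import datasets, peft, tokenizers, torch, transformers

for mod in (peft, transformers, torch, datasets, tokenizers):
    print(f"{mod.__name__}: {mod.__version__}")
```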