train_mrpc_789_1760637904

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mrpc dataset. After the final epoch (epoch 20), it achieves the following results on the evaluation set:

Loss: 0.9068
Num Input Tokens Seen: 6022800

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
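
To make the cosine schedule and warmup ratio listed above concrete, the following is a minimal sketch of how the learning rate would evolve under the common formulation: linear warmup followed by a half-cosine decay to zero. It is a standalone re-derivation, not the training code itself. The total step count of 14680 is taken from the last row of the training results and is assumed to equal the scheduler's total number of steps; the function name `cosine_lr` is hypothetical.

```python
import math

BASE_LR = 1e-05       # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 14680   # assumption: final step reported in the training results


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 734, 1468, 7340, 14680):
        print(f"step {step:>6}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the rate peaks at 1e-05 once warmup ends and falls to zero at the last step, so the later epochs in the results table are trained with a progressively smaller learning rate.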

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.1672	2.0	1468	0.1810	601472
0.1098	4.0	2936	0.1778	1203088
0.1348	6.0	4404	0.1687	1805200
0.0734	8.0	5872	0.2441	2408032
0.0588	10.0	7340	0.4551	3010208
0.1257	12.0	8808	0.6788	3611504
0.0	14.0	10276	0.8971	4214112
0.0	16.0	11744	0.8704	4817488
0.0	18.0	13212	0.8940	5420544
0.0	20.0	14680	0.9068	6022800
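
The table above can also be read programmatically, for example to find the evaluation point with the lowest validation loss. The sketch below copies the rows from the table as given; the helper names `best_checkpoint` and `final_minus_best` are hypothetical, and nothing here implies that a checkpoint was actually saved at every listed step.

```python
# Rows copied from the training results table:
# (epoch, step, training_loss, validation_loss, input_tokens_seen)
ROWS = [
    (2.0, 1468, 0.1672, 0.1810, 601472),
    (4.0, 2936, 0.1098, 0.1778, 1203088),
    (6.0, 4404, 0.1348, 0.1687, 1805200),
    (8.0, 5872, 0.0734, 0.2441, 2408032),
    (10.0, 7340, 0.0588, 0.4551, 3010208),
    (12.0, 8808, 0.1257, 0.6788, 3611504),
    (14.0, 10276, 0.0, 0.8971, 4214112),
    (16.0, 11744, 0.0, 0.8704, 4817488),
    (18.0, 13212, 0.0, 0.8940, 5420544),
    (20.0, 14680, 0.0, 0.9068, 6022800),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def final_minus_best(rows):
    """Gap between the last reported validation loss and the lowest one."""
    return rows[-1][3] - best_checkpoint(rows)[3]


if __name__ == "__main__":
    epoch, step, _, val_loss, _ = best_checkpoint(ROWS)
    print(f"lowest validation loss {val_loss} at epoch {epoch}, step {step}")
    print(f"final minus best validation loss: {final_minus_best(ROWS):.4f}")
```

On this data the lowest validation loss is 0.1687 at epoch 6.0 (step 4404). Validation loss rises in most later epochs while the reported training loss falls to 0.0, a pattern consistent with overfitting.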

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Model tree for rbelanec/train_mrpc_789_1760637904

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model