train_stsb_123_1760637700

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

Loss: 0.5127
Num Input Tokens Seen: 8725024

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
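
The scheduler settings above can be read as a per-step learning rate. The sketch below is a pure-Python approximation of a linear warmup followed by a half-cosine decay. It assumes the schedule spans the 25,880 optimizer steps reported in the training results and that the warmup length is the ceiling of the warmup ratio times the total steps. The name `lr_at_step` is hypothetical, and this is an illustration rather than the trainer's own code.

```python
import math

PEAK_LR = 5e-05        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 25880    # final step in the training results (assumed schedule length)

# Assumption: warmup length is the ceiling of ratio * total steps.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at_step(step: int) -> float:
    """Approximate learning rate at an optimizer step.

    Linear warmup from 0 to PEAK_LR, then half-cosine decay to 0.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the rate at the start, end of warmup, midpoint, and final step.
    for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, f"{lr_at_step(step):.3e}")
```

Under these assumptions the warmup covers roughly the first two epochs, after which the rate decays smoothly toward zero by the final step.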

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
1.1276	1.0	1294	1.1261	435488
0.5037	2.0	2588	0.6961	871200
0.6182	3.0	3882	0.6256	1307968
0.3773	4.0	5176	0.5923	1745568
0.6961	5.0	6470	0.5698	2182352
0.5883	6.0	7764	0.5536	2619888
0.4578	7.0	9058	0.5439	3057216
0.4402	8.0	10352	0.5379	3493600
0.4747	9.0	11646	0.5294	3928704
0.4438	10.0	12940	0.5248	4364240
0.4731	11.0	14234	0.5212	4800144
0.4504	12.0	15528	0.5178	5234320
0.5421	13.0	16822	0.5169	5670720
0.4152	14.0	18116	0.5145	6108240
0.3473	15.0	19410	0.5141	6543632
0.4726	16.0	20704	0.5141	6978816
0.4865	17.0	21998	0.5137	7415824
0.4733	18.0	23292	0.5127	7851088
0.3958	19.0	24586	0.5140	8287536
0.4228	20.0	25880	0.5127	8725024
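
The table above can be summarized programmatically. The following sketch copies the reported rows into a string, finds the epochs with the lowest validation loss, and computes the average number of input tokens per optimizer step. The helper names (`parse_rows`, `best_validation`, `tokens_per_step`) are hypothetical and introduced only for illustration.

```python
# Columns: training loss, epoch, step, validation loss, input tokens seen
TABLE = """\
1.1276 1.0 1294 1.1261 435488
0.5037 2.0 2588 0.6961 871200
0.6182 3.0 3882 0.6256 1307968
0.3773 4.0 5176 0.5923 1745568
0.6961 5.0 6470 0.5698 2182352
0.5883 6.0 7764 0.5536 2619888
0.4578 7.0 9058 0.5439 3057216
0.4402 8.0 10352 0.5379 3493600
0.4747 9.0 11646 0.5294 3928704
0.4438 10.0 12940 0.5248 4364240
0.4731 11.0 14234 0.5212 4800144
0.4504 12.0 15528 0.5178 5234320
0.5421 13.0 16822 0.5169 5670720
0.4152 14.0 18116 0.5145 6108240
0.3473 15.0 19410 0.5141 6543632
0.4726 16.0 20704 0.5141 6978816
0.4865 17.0 21998 0.5137 7415824
0.4733 18.0 23292 0.5127 7851088
0.3958 19.0 24586 0.5140 8287536
0.4228 20.0 25880 0.5127 8725024
"""


def parse_rows(text: str) -> list[dict]:
    """Parse whitespace-separated rows into dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, tokens = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": int(float(epoch)),
            "step": int(step),
            "val_loss": float(val_loss),
            "tokens": int(tokens),
        })
    return rows


def best_validation(rows: list[dict]) -> tuple[float, list[int]]:
    """Return the lowest validation loss and every epoch that reached it."""
    best = min(r["val_loss"] for r in rows)
    return best, [r["epoch"] for r in rows if r["val_loss"] == best]


def tokens_per_step(rows: list[dict]) -> float:
    """Average input tokens consumed per optimizer step over the whole run."""
    last = rows[-1]
    return last["tokens"] / last["step"]


if __name__ == "__main__":
    rows = parse_rows(TABLE)
    print(best_validation(rows))
    print(round(tokens_per_step(rows), 2))
```

With the reported values, the lowest validation loss is 0.5127, reached at epochs 18 and 20, and the validation loss changes by less than 0.002 from epoch 14 onward.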

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
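
Because the card lists PEFT among the frameworks, the checkpoint is presumably an adapter that is applied on top of the base model. The sketch below shows one plausible way to load it with `peft` and `transformers`. It assumes the adapter is hosted under the repository ID `rbelanec/train_stsb_123_1760637700`, that the base model is loaded as a causal language model, and that you have access to the gated base weights. The function name `load_adapter_model` is hypothetical.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_stsb_123_1760637700"  # assumed repository ID


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter.

    Downloads several gigabytes of weights on first call.
    """
    # Imported lazily so the heavy dependencies are only needed when called.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption: the adapter was trained on the causal-LM variant of the base model.
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_adapter_model()` returns a tokenizer and a model with the adapter weights attached. The prompt format used during fine-tuning is not documented in this card, so any inference prompt is a guess.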

Model tree for rbelanec/train_stsb_123_1760637700

Base model: meta-llama/Meta-Llama-3-8B-Instruct
This model: adapter of the base model