train_stsb_123_1760637697

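This model is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned on the stsb dataset. It achieves the following results on the evaluation set:

Loss: 0.4807 (the lowest validation loss in the training log, reached at epoch 3)
Num Input Tokens Seen: 8725024 (the total over all 20 epochs)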

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
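With cosine scheduling and a warmup ratio of 0.1, the learning rate ramps linearly from 0 to 5e-05 over the first 10% of optimizer steps. It then follows a half cosine back down to 0. The sketch below reconstructs that curve in plain Python. It assumes the Step column in the results table counts optimizer steps (1294 per epoch, so 25,880 in total). It also assumes warmup steps are computed as ceil(warmup_ratio × total steps) = 2588. The code follows the usual linear-warmup-plus-cosine shape but is not the Transformers library code, and `lr_at` is a hypothetical helper.

```python
import math

# Values from this card. Assumption: the Step column counts optimizer steps
# (single device, no gradient accumulation).
NUM_EPOCHS = 20
STEPS_PER_EPOCH = 1294
PEAK_LR = 5e-05
WARMUP_RATIO = 0.1

TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # 25880, matches the last table row
# Assumption: warmup steps are derived as ceil(ratio * total steps).
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 2588


def lr_at(step: int) -> float:
    """Hypothetical helper: learning rate after `step` optimizer steps
    under linear warmup followed by a half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to PEAK_LR.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Learning rate at the end of each epoch.
    for epoch in range(1, NUM_EPOCHS + 1):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:2d}  step {step:5d}  lr {lr_at(step):.2e}")
```

Under these assumptions, warmup ends exactly at the end of epoch 2 (step 2588). The learning rate falls to half its peak, about 2.5e-05, at step 14234, the end of epoch 11.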

Training results

Training Loss  Epoch  Step   Validation Loss  Input Tokens Seen
0.3793         1.0    1294   0.5439           435488
0.3894         2.0    2588   0.4949           871200
0.425          3.0    3882   0.4807           1307968
0.2995         4.0    5176   0.5303           1745568
0.4531         5.0    6470   0.5086           2182352
0.2905         6.0    7764   0.5703           2619888
0.2584         7.0    9058   0.6513           3057216
0.1933         8.0    10352  0.6799           3493600
0.1651         9.0    11646  0.7424           3928704
0.2668         10.0   12940  0.8582           4364240
0.1206         11.0   14234  0.9613           4800144
0.1187         12.0   15528  1.1003           5234320
0.1301         13.0   16822  1.3352           5670720
0.2357         14.0   18116  1.6198           6108240
0.0049         15.0   19410  1.8247           6543632
0.0435         16.0   20704  1.9786           6978816
0.0342         17.0   21998  2.0470           7415824
0.0007         18.0   23292  2.1490           7851088
0.0003         19.0   24586  2.1583           8287536
0.0002         20.0   25880  2.1685           8725024
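Validation loss bottoms out at 0.4807 at epoch 3, the value reported at the top of this card. It then climbs to 2.1685 by epoch 20 while training loss falls toward zero, a pattern consistent with overfitting. The sketch below reads the best epoch off the table. For illustration only, it also shows where a simple patience-based early-stopping rule would have halted training. `best_epoch`, `early_stop_epoch` and its `patience` parameter are hypothetical helpers. The card does not say whether early stopping was used or how the reported checkpoint was selected.

```python
from __future__ import annotations

# Validation loss per epoch, copied from the "Training results" table above.
VAL_LOSS = {
    1: 0.5439, 2: 0.4949, 3: 0.4807, 4: 0.5303, 5: 0.5086,
    6: 0.5703, 7: 0.6513, 8: 0.6799, 9: 0.7424, 10: 0.8582,
    11: 0.9613, 12: 1.1003, 13: 1.3352, 14: 1.6198, 15: 1.8247,
    16: 1.9786, 17: 2.0470, 18: 2.1490, 19: 2.1583, 20: 2.1685,
}


def best_epoch(val_loss: dict[int, float]) -> int:
    """Return the epoch with the lowest validation loss."""
    return min(val_loss, key=val_loss.get)


def early_stop_epoch(val_loss: dict[int, float], patience: int) -> int | None:
    """Hypothetical rule: stop once `patience` consecutive epochs pass without
    a new lowest validation loss. Returns None if the rule never triggers."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch in sorted(val_loss):
        if val_loss[epoch] < best:
            best = val_loss[epoch]
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


if __name__ == "__main__":
    epoch = best_epoch(VAL_LOSS)
    print(f"best epoch: {epoch}, validation loss: {VAL_LOSS[epoch]}")
    for patience in (2, 3):
        print(f"patience={patience}: stop at epoch {early_stop_epoch(VAL_LOSS, patience)}")
```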

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
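When reproducing these results, it may help to compare the installed packages against the versions listed above. The minimal sketch below uses the standard library's `importlib.metadata`. It assumes the PyPI distribution names peft, transformers, torch, datasets and tokenizers. `version_mismatches` is a hypothetical helper, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions", keyed by PyPI distribution name.
PINNED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def version_mismatches(pinned=PINNED, lookup=version):
    """Hypothetical helper: return {package: installed version or None}
    for every package whose installed version differs from the pin."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    print(version_mismatches() or "all listed versions match")
```

The comparison is an exact string match. Because "+cu128" is PyTorch's local version label for its CUDA 12.8 build, a CPU-only or other-CUDA build of torch 2.9.0 will also be reported as a mismatch.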


Model tree for rbelanec/train_stsb_123_1760637697

Base model: meta-llama/Meta-Llama-3-8B-Instruct
This model: adapter for the base model