train_stsb_123_1760637695

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

Loss: 0.4669
Num Input Tokens Seen: 8725024

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
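
As a rough illustration of how the cosine scheduler and warmup ratio listed above shape the learning rate, the sketch below recomputes the rate per step in plain Python. It assumes the common linear-warmup-then-cosine-decay form and takes the total of 25880 steps from the last row of the training results table. The names `lr_at`, `PEAK_LR`, `TOTAL_STEPS`, and `WARMUP_STEPS` are made up for illustration and do not come from the training script.

```python
import math

# Values from the hyperparameter list. TOTAL_STEPS comes from the final row
# of the training results table (assumption: one optimizer step per logged step).
PEAK_LR = 0.03
WARMUP_RATIO = 0.1
TOTAL_STEPS = 25880
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Cosine decay from the peak rate down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {step:>6}: lr = {lr_at(step):.6f}")
```

Under these assumptions the rate peaks at 0.03 once warmup ends and decays to zero by the final step, so the later epochs in the results table ran at a much smaller learning rate than the early ones.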

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.4021	1.0	1294	0.5392	435488
0.3671	2.0	2588	0.5130	871200
0.466	3.0	3882	0.4889	1307968
0.3487	4.0	5176	0.4957	1745568
0.5481	5.0	6470	0.4669	2182352
0.4735	6.0	7764	0.4755	2619888
0.3459	7.0	9058	0.4692	3057216
0.3002	8.0	10352	0.4894	3493600
0.3918	9.0	11646	0.4923	3928704
0.4135	10.0	12940	0.4905	4364240
0.4014	11.0	14234	0.4911	4800144
0.3477	12.0	15528	0.5155	5234320
0.2977	13.0	16822	0.5554	5670720
0.2721	14.0	18116	0.6021	6108240
0.1866	15.0	19410	0.6747	6543632
0.1905	16.0	20704	0.7631	6978816
0.2921	17.0	21998	0.8979	7415824
0.1222	18.0	23292	0.9814	7851088
0.1232	19.0	24586	0.9996	8287536
0.1856	20.0	25880	0.9975	8725024
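
The evaluation loss of 0.4669 reported at the top of this card matches the lowest validation loss in the table, reached at epoch 5. From about epoch 12 onward the validation loss climbs steadily while the training loss keeps falling, which is consistent with overfitting. The card does not state which checkpoint the published weights correspond to. The sketch below, with a hypothetical helper name, shows one way to pick the best epoch from the logged values.

```python
# Validation loss per epoch, copied from the training results table.
VAL_LOSS = [
    0.5392, 0.5130, 0.4889, 0.4957, 0.4669,
    0.4755, 0.4692, 0.4894, 0.4923, 0.4905,
    0.4911, 0.5155, 0.5554, 0.6021, 0.6747,
    0.7631, 0.8979, 0.9814, 0.9996, 0.9975,
]


def best_epoch(val_losses):
    """Return (epoch, loss) for the lowest validation loss; epochs are 1-indexed."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return idx + 1, val_losses[idx]


if __name__ == "__main__":
    epoch, loss = best_epoch(VAL_LOSS)
    print(f"best epoch: {epoch}, validation loss: {loss}")
```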

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
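
Since the card lists PEFT among the framework versions, the weights are presumably published as an adapter on top of the base model. A minimal loading sketch follows, assuming the `transformers` and `peft` packages are installed, that access to the gated base model has been granted, and that the adapter lives under the repository ID `rbelanec/train_stsb_123_1760637695`. The helper name `load_model_with_adapter` is hypothetical.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
# Assumed repository ID for the adapter; adjust if the weights live elsewhere.
ADAPTER_ID = "rbelanec/train_stsb_123_1760637695"


def load_model_with_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

The prompt format used during fine-tuning is not documented in this card, so inputs would need to follow whatever template the training script used.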

Model tree for rbelanec/train_stsb_123_1760637695

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model