train_stsb_1753094149

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

Loss: 0.5051
Num Input Tokens Seen: 4364240

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
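
To make the scheduler settings concrete, here is a minimal sketch of a linear-warmup, cosine-decay learning-rate curve built from the values listed above. The total step count (12940) is taken from the last row under "Training results". The warmup length (ratio times total steps) and the exact shape of the curve are assumptions about how the trainer applies these settings, not something this card confirms; `lr_at` is a hypothetical helper name.

```python
import math

# Values copied from this card.
BASE_LR = 5e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 12940  # final step listed under "Training results"

# Assumption: warmup length is the warmup ratio times the total step count.
WARMUP_STEPS = round(WARMUP_RATIO * TOTAL_STEPS)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the base learning rate.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from the base learning rate down to 0.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 647, 1294, 7117, 12940):
    print(f"step {step:>5}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks at step 1294, which coincides with the end of the first epoch, and decays to zero at the final step.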

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.8612	0.5	647	0.9440	217472
0.5541	1.0	1294	0.6531	435488
0.491	1.5	1941	0.6016	652480
0.4362	2.0	2588	0.5734	871200
0.39	2.5	3235	0.5508	1089120
0.5299	3.0	3882	0.5401	1307968
0.4651	3.5	4529	0.5306	1529024
0.3295	4.0	5176	0.5326	1745568
0.4295	4.5	5823	0.5161	1965984
0.6499	5.0	6470	0.5128	2182352
0.3838	5.5	7117	0.5134	2399760
0.5288	6.0	7764	0.5089	2619888
0.4526	6.5	8411	0.5074	2837808
0.3872	7.0	9058	0.5079	3057216
0.3406	7.5	9705	0.5085	3275904
0.408	8.0	10352	0.5076	3493600
0.448	8.5	10999	0.5083	3712320
0.4183	9.0	11646	0.5051	3928704
0.3629	9.5	12293	0.5055	4147200
0.4195	10.0	12940	0.5057	4364240
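
The table can be summarized with a few lines of arithmetic. The sketch below copies the rows above into a list and picks out the evaluation step with the lowest validation loss, along with two derived quantities. The variable names are illustrative only, and the tokens-per-step figure is a simple average over the whole run.

```python
# (epoch, step, validation_loss, input_tokens_seen), copied from the table above.
ROWS = [
    (0.5, 647, 0.9440, 217472),
    (1.0, 1294, 0.6531, 435488),
    (1.5, 1941, 0.6016, 652480),
    (2.0, 2588, 0.5734, 871200),
    (2.5, 3235, 0.5508, 1089120),
    (3.0, 3882, 0.5401, 1307968),
    (3.5, 4529, 0.5306, 1529024),
    (4.0, 5176, 0.5326, 1745568),
    (4.5, 5823, 0.5161, 1965984),
    (5.0, 6470, 0.5128, 2182352),
    (5.5, 7117, 0.5134, 2399760),
    (6.0, 7764, 0.5089, 2619888),
    (6.5, 8411, 0.5074, 2837808),
    (7.0, 9058, 0.5079, 3057216),
    (7.5, 9705, 0.5085, 3275904),
    (8.0, 10352, 0.5076, 3493600),
    (8.5, 10999, 0.5083, 3712320),
    (9.0, 11646, 0.5051, 3928704),
    (9.5, 12293, 0.5055, 4147200),
    (10.0, 12940, 0.5057, 4364240),
]

# Row with the lowest validation loss.
best = min(ROWS, key=lambda row: row[2])
print(f"best validation loss {best[2]} at epoch {best[0]}, step {best[1]}")

# Derived quantities from the final row.
final = ROWS[-1]
steps_per_epoch = final[1] / final[0]
tokens_per_step = final[3] / final[1]
print(f"steps per epoch: {steps_per_epoch:.0f}")
print(f"average input tokens per step: {tokens_per_step:.1f}")
```

The lowest validation loss (0.5051, at epoch 9.0) matches the headline loss reported at the top of this card, while the headline token count matches the final row at epoch 10.0.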

Framework versions

PEFT 0.15.2
Transformers 4.51.3
Pytorch 2.7.1+cu126
Datasets 3.6.0
Tokenizers 0.21.1
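
Since PEFT is listed among the frameworks, the weights are presumably published as an adapter on top of the base model. A minimal loading sketch follows, assuming the adapter lives at `rbelanec/train_stsb_1753094149`, that the base model is loaded as a causal language model, and that bfloat16 is an acceptable dtype. None of these choices is confirmed by this card, and `load_adapter` is a hypothetical helper name.

```python
def load_adapter(
    base_id: str = "meta-llama/Meta-Llama-3-8B-Instruct",
    adapter_id: str = "rbelanec/train_stsb_1753094149",
):
    """Load the base model and attach the adapter (hypothetical helper)."""
    # Imports are deferred so that defining this helper does not require
    # torch, transformers, or peft to be installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption: the base model is used as a causal LM in bfloat16.
    base_model = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.bfloat16
    )
    # Attach the fine-tuned adapter weights on top of the base model.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter()` downloads both repositories. The base model is gated on the Hub, so access must be granted beforehand. The prompt format used during fine-tuning is not documented in this card.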

Model tree for rbelanec/train_stsb_1753094149

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model