train_stsb_1754652141

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2252
  • Num Input Tokens Seen: 4364240
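
The adapter can be loaded on top of the base model with PEFT. The snippet below is a minimal sketch, assuming the adapter is published as rbelanec/train_stsb_1754652141 and that you have access to the gated meta-llama/Meta-Llama-3-8B-Instruct weights; the dtype and device placement are illustrative choices.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_stsb_1754652141"  # repo id of this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,   # illustrative; any supported dtype works
    device_map="auto",            # requires the accelerate package
)

# Attach the fine-tuned PEFT adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```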

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
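
These settings map onto a standard transformers TrainingArguments configuration. The sketch below is illustrative only; the output_dir is a placeholder, and any options not listed above are left at their defaults.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_stsb_1754652141",  # placeholder, not taken from the original run
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```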

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|---------------|-------|-------|-----------------|-------------------|
| 5.6389        | 0.5   | 647   | 5.8282          | 217472            |
| 2.801         | 1.0   | 1294  | 3.0180          | 435488            |
| 2.2038        | 1.5   | 1941  | 2.4164          | 652480            |
| 1.6601        | 2.0   | 2588  | 2.0665          | 871200            |
| 1.5049        | 2.5   | 3235  | 1.8074          | 1089120           |
| 1.7735        | 3.0   | 3882  | 1.6549          | 1307968           |
| 1.5476        | 3.5   | 4529  | 1.5325          | 1529024           |
| 1.1845        | 4.0   | 5176  | 1.4605          | 1745568           |
| 1.3133        | 4.5   | 5823  | 1.3826          | 1965984           |
| 2.0287        | 5.0   | 6470  | 1.3431          | 2182352           |
| 0.9574        | 5.5   | 7117  | 1.3103          | 2399760           |
| 1.3612        | 6.0   | 7764  | 1.2787          | 2619888           |
| 1.2128        | 6.5   | 8411  | 1.2617          | 2837808           |
| 0.989         | 7.0   | 9058  | 1.2460          | 3057216           |
| 1.2771        | 7.5   | 9705  | 1.2436          | 3275904           |
| 1.0137        | 8.0   | 10352 | 1.2304          | 3493600           |
| 1.3647        | 8.5   | 10999 | 1.2295          | 3712320           |
| 1.2201        | 9.0   | 11646 | 1.2252          | 3928704           |
| 0.8309        | 9.5   | 12293 | 1.2260          | 4147200           |
| 1.0145        | 10.0  | 12940 | 1.2264          | 4364240           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
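
To reproduce the environment, the versions above can be pinned in a requirements file. The listing below is a sketch; the +cu128 suffix on the PyTorch version indicates a CUDA 12.8 build, which typically comes from PyTorch's own wheel index rather than PyPI.

```
peft==0.15.2
transformers==4.51.3
torch==2.8.0        # original run used the 2.8.0+cu128 (CUDA 12.8) build
datasets==3.6.0
tokenizers==0.21.1
```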