train_stsb_456_1760637811

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

Loss: 1.3619
Num Input Tokens Seen: 8714656
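
The framework versions list PEFT, so the checkpoint is presumably an adapter that sits on top of the base model rather than a full set of weights. A minimal loading sketch under that assumption follows; the helper name `load_adapter_model` is hypothetical, the adapter type and the prompt format used during fine-tuning are not documented in this card, and the base model repository is gated, so access must be granted first.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_stsb_456_1760637811"


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the fine-tuned PEFT adapter (hypothetical helper)."""
    # Imported lazily so the heavy dependencies are only needed when the model is loaded.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the tokenizer is taken from the base model repository.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")

    # PeftModel.from_pretrained reads the adapter config and wraps the base model.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model


# Usage (downloads the 8B base model, so it is left commented out here):
# tokenizer, model = load_adapter_model()
```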

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
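
To make the cosine schedule with a 0.1 warmup ratio concrete, the sketch below computes the learning rate at a given optimizer step. It assumes linear warmup to the peak rate followed by a half-cosine decay to zero, a warmup length of the ratio times the total step count rounded up, and a total of 25880 steps taken from the last row of the training results. The function `lr_at` is illustrative and not part of the training code.

```python
import math

BASE_LR = 1e-3        # learning_rate from the card
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the card
TOTAL_STEPS = 25880   # final step in the training results (20 epochs x 1294 steps)

# Assumption: warmup length is the ratio applied to the total step count, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 1294, WARMUP_STEPS, 12940, TOTAL_STEPS):
        print(f"step {step:>5}: lr = {lr_at(step):.6f}")
```

Under these assumptions the warmup spans 2588 steps, which matches the first two epochs in the results table, and the rate decays from 0.001 toward zero over the remaining eighteen.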

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.4481	1.0	1294	0.5405	435104
0.4266	2.0	2588	0.5010	870112
0.4134	3.0	3882	0.4435	1305024
0.4529	4.0	5176	0.4423	1742048
0.4069	5.0	6470	0.4360	2176672
0.4466	6.0	7764	0.4350	2613648
0.4280	7.0	9058	0.4368	3049776
0.3933	8.0	10352	0.4303	3486928
0.2913	9.0	11646	0.4325	3924192
0.2852	10.0	12940	0.4348	4360736
0.3983	11.0	14234	0.4353	4793520
0.2878	12.0	15528	0.4442	5230528
0.3392	13.0	16822	0.4607	5664848
0.3391	14.0	18116	0.4706	6100288
0.2599	15.0	19410	0.4924	6534240
0.3084	16.0	20704	0.5064	6969936
0.2677	17.0	21998	0.5281	7405056
0.2314	18.0	23292	0.5406	7842624
0.2938	19.0	24586	0.5536	8279952
0.2855	20.0	25880	0.5557	8714656
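
In the table above, validation loss reaches its minimum at epoch 8 and climbs in later epochs while training loss trends lower, a pattern usually read as overfitting. The sketch below picks the epoch with the lowest validation loss from the logged values; the `RESULTS` list and the `best_epoch` helper are illustrative names, and whether a checkpoint from that epoch was saved is not stated in this card.

```python
# (epoch, training loss, validation loss) copied from the training results table.
RESULTS = [
    (1, 0.4481, 0.5405), (2, 0.4266, 0.5010), (3, 0.4134, 0.4435),
    (4, 0.4529, 0.4423), (5, 0.4069, 0.4360), (6, 0.4466, 0.4350),
    (7, 0.4280, 0.4368), (8, 0.3933, 0.4303), (9, 0.2913, 0.4325),
    (10, 0.2852, 0.4348), (11, 0.3983, 0.4353), (12, 0.2878, 0.4442),
    (13, 0.3392, 0.4607), (14, 0.3391, 0.4706), (15, 0.2599, 0.4924),
    (16, 0.3084, 0.5064), (17, 0.2677, 0.5281), (18, 0.2314, 0.5406),
    (19, 0.2938, 0.5536), (20, 0.2855, 0.5557),
]


def best_epoch(rows):
    """Return the (epoch, train_loss, val_loss) row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    epoch, train_loss, val_loss = best_epoch(RESULTS)
    final_val = RESULTS[-1][2]
    print(f"best epoch: {epoch} (validation loss {val_loss:.4f})")
    print(f"validation loss increase by epoch 20: {final_val - val_loss:.4f}")
```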

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Model tree for rbelanec/train_stsb_456_1760637811

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model