train_stsb_42_1760637584

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

Loss: 0.4528
Num Input Tokens Seen: 8733312

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
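
The scheduler settings above can be made concrete with a small sketch of a linear-warmup, cosine-decay schedule. It is an illustration and not the training code itself. It assumes the total step count is the final step in the training results table (25880), that warmup covers 10% of those steps, and that the decay is a single half cosine down to zero. The names BASE_LR, TOTAL_STEPS, WARMUP_STEPS and lr_at are hypothetical.

```python
import math

BASE_LR = 5e-05                   # learning_rate from the list above
TOTAL_STEPS = 25880               # final step in the training results table
WARMUP_STEPS = TOTAL_STEPS // 10  # assumption: warmup_ratio 0.1 applied to total steps


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps (linear warmup, then cosine decay)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR during warmup.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half cosine from BASE_LR down to 0.
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 1294, 2588, 12940, 25880):
        print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the warmup ends at step 2588, which coincides with the end of epoch 2 in the results table, and the rate decays over the remaining 18 epochs.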

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
1.0946	1.0	1294	1.0247	435696
0.4803	2.0	2588	0.6088	872288
0.5931	3.0	3882	0.5389	1309872
0.514	4.0	5176	0.5083	1747344
0.4347	5.0	6470	0.4943	2184032
0.4692	6.0	7764	0.4835	2622912
0.9412	7.0	9058	0.4763	3059648
0.4814	8.0	10352	0.4688	3496896
0.6544	9.0	11646	0.4671	3934000
0.3418	10.0	12940	0.4633	4369680
0.6307	11.0	14234	0.4590	4807440
0.3754	12.0	15528	0.4564	5243776
0.3837	13.0	16822	0.4555	5681232
0.44	14.0	18116	0.4559	6118000
0.3011	15.0	19410	0.4540	6554032
0.3321	16.0	20704	0.4546	6989408
0.4417	17.0	21998	0.4528	7425264
0.6429	18.0	23292	0.4534	7861712
0.3465	19.0	24586	0.4541	8297664
0.5448	20.0	25880	0.4539	8733312
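
As a reading aid, the table can be summarized with a few lines of Python. The sketch below copies the epoch, validation loss and input-token columns from the table and derives the epoch with the lowest validation loss, the steps per epoch and the average tokens per epoch. The names ROWS, best_epoch and so on are hypothetical.

```python
# (epoch, validation_loss, input_tokens_seen) copied from the table above
ROWS = [
    (1, 1.0247, 435696), (2, 0.6088, 872288), (3, 0.5389, 1309872),
    (4, 0.5083, 1747344), (5, 0.4943, 2184032), (6, 0.4835, 2622912),
    (7, 0.4763, 3059648), (8, 0.4688, 3496896), (9, 0.4671, 3934000),
    (10, 0.4633, 4369680), (11, 0.4590, 4807440), (12, 0.4564, 5243776),
    (13, 0.4555, 5681232), (14, 0.4559, 6118000), (15, 0.4540, 6554032),
    (16, 0.4546, 6989408), (17, 0.4528, 7425264), (18, 0.4534, 7861712),
    (19, 0.4541, 8297664), (20, 0.4539, 8733312),
]

# Epoch with the lowest validation loss.
best_epoch, best_loss, _ = min(ROWS, key=lambda row: row[1])

# Values at the end of training.
final_epoch, final_loss, final_tokens = ROWS[-1]

# The Step column ends at 25880 after 20 epochs.
steps_per_epoch = 25880 // final_epoch
tokens_per_epoch = final_tokens / final_epoch

if __name__ == "__main__":
    print("best epoch:", best_epoch, "validation loss:", best_loss)
    print("final epoch:", final_epoch, "validation loss:", final_loss)
    print("steps per epoch:", steps_per_epoch)
    print("average tokens per epoch:", round(tokens_per_epoch, 1))
```

The lowest validation loss, 0.4528 at epoch 17, matches the loss reported at the top of the card, while the token count reported there matches the final epoch.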

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
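
Given the PEFT and Transformers versions listed above, one plausible way to load the adapter on top of the base model is sketched here. The card does not document usage, so this is an assumption: it treats the adapter as a causal-LM adapter and takes the repository id from the model page. It also assumes access to the base model weights, a working PyTorch install and enough memory for an 8B model. The function name load_model is hypothetical.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_stsb_42_1760637584"  # repository id taken from the model page


def load_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Return (tokenizer, model) with the adapter attached to the base model.

    Assumption: the adapter was trained for causal language modeling.
    """
    # Imported lazily so this file can be read or imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling load_model() downloads both the base weights and the adapter, so it is best run once and the result reused.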

Model tree for rbelanec/train_stsb_42_1760637584

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model