train_stsb_789_1760637924

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

Loss: 0.4742
Num Input Tokens Seen: 8752512

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
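
The cosine schedule with warmup listed above can be sketched in plain Python. This is a minimal sketch under stated assumptions: training ran on a single device without gradient accumulation, so the 1294 steps per epoch in the results table are optimizer steps (20 x 1294 = 25880 in total), and warmup covers ceil(0.1 x total steps) steps, followed by a half-cosine decay from the peak rate of 0.03 down to zero. That is the shape of the cosine-with-warmup schedule in Transformers. The function name lr_at is hypothetical and not part of any library.

```python
import math

# Values from this card; TOTAL_STEPS comes from the results table
# (20 epochs x 1294 steps per epoch = 25880 steps).
PEAK_LR = 0.03
TOTAL_STEPS = 20 * 1294
WARMUP_RATIO = 0.1
# Assumption: warmup length is ceil(total_steps * warmup_ratio) steps.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Hypothetical helper: learning rate after `step` optimizer steps.

    Linear warmup from 0 to PEAK_LR, then a half-cosine decay to 0.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Epoch 1 end, end of warmup, midpoint of the decay, and the final step.
    for step in (0, 1294, WARMUP_STEPS, 14234, TOTAL_STEPS):
        print(f"step {step:>5}: lr = {lr_at(step):.6f}")
```

Under these assumptions the rate peaks at the end of epoch 2 (step 2588), has halved to 0.015 by the end of epoch 11 (step 14234), and reaches zero at step 25880.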

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.4693	1.0	1294	0.5494	437440
0.5117	2.0	2588	0.5542	875776
0.5168	3.0	3882	0.4961	1313392
0.3449	4.0	5176	0.4742	1753440
0.4587	5.0	6470	0.4819	2191488
0.3599	6.0	7764	0.4926	2629728
0.3841	7.0	9058	0.4917	3066368
0.5953	8.0	10352	0.4814	3502736
0.2795	9.0	11646	0.5007	3940080
0.3829	10.0	12940	0.5032	4376768
0.3479	11.0	14234	0.5258	4813216
0.3428	12.0	15528	0.5694	5251344
0.3635	13.0	16822	0.5847	5689568
0.2201	14.0	18116	0.6641	6127200
0.1532	15.0	19410	0.7580	6564352
0.2014	16.0	20704	0.8481	7002400
0.1675	17.0	21998	0.9621	7438560
0.1135	18.0	23292	1.0664	7876400
0.1349	19.0	24586	1.0785	8314032
0.1562	20.0	25880	1.0809	8752512
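
The evaluation loss reported at the top of this card (0.4742) equals the lowest validation loss in the table, reached at epoch 4 (step 5176). From epoch 8 onward validation loss rises at every epoch, to 1.0809 at epoch 20, while training loss trends downward, a pattern consistent with overfitting. The card does not state which checkpoint the published adapter corresponds to. The short sketch below reads those facts off the table; the helper names best_row and rising_since are hypothetical.

```python
# (epoch, training loss, validation loss), copied from the training results table.
RESULTS = [
    (1, 0.4693, 0.5494), (2, 0.5117, 0.5542), (3, 0.5168, 0.4961),
    (4, 0.3449, 0.4742), (5, 0.4587, 0.4819), (6, 0.3599, 0.4926),
    (7, 0.3841, 0.4917), (8, 0.5953, 0.4814), (9, 0.2795, 0.5007),
    (10, 0.3829, 0.5032), (11, 0.3479, 0.5258), (12, 0.3428, 0.5694),
    (13, 0.3635, 0.5847), (14, 0.2201, 0.6641), (15, 0.1532, 0.7580),
    (16, 0.2014, 0.8481), (17, 0.1675, 0.9621), (18, 0.1135, 1.0664),
    (19, 0.1349, 1.0785), (20, 0.1562, 1.0809),
]
STEPS_PER_EPOCH = 1294  # from the Step column: 1294 steps per epoch


def best_row(rows):
    """Row with the lowest validation loss (min keeps the earliest on ties)."""
    return min(rows, key=lambda row: row[2])


def rising_since(rows):
    """Earliest epoch from which validation loss increases at every later epoch."""
    val = [row[2] for row in rows]
    i = len(val) - 1
    # Walk backwards while each earlier value is strictly smaller.
    while i > 0 and val[i - 1] < val[i]:
        i -= 1
    return rows[i][0]


if __name__ == "__main__":
    epoch, train_loss, val_loss = best_row(RESULTS)
    print(f"best epoch {epoch} (step {epoch * STEPS_PER_EPOCH}): val loss {val_loss}")
    print(f"val loss rises every epoch from epoch {rising_since(RESULTS)}; "
          f"final val loss {RESULTS[-1][2]}")
```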

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4


Model tree for rbelanec/train_stsb_789_1760637924

Base model: meta-llama/Meta-Llama-3-8B-Instruct
This model: adapter of the base model