train_stsb_42_1760637582

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

Loss: 0.4286 (equal to the validation loss at epoch 4, the lowest in the training results)
Num Input Tokens Seen: 8733312

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
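
To make the scheduler settings concrete, here is a minimal sketch of a linear-warmup plus cosine-decay schedule evaluated at a few steps. It assumes the common shape (linear ramp over the first 10% of steps, then a half-cosine decay to zero) and takes the total step count, 25880, from the last row of the training results. The function name `cosine_lr_with_warmup` is hypothetical, and the exact schedule used in the run is not confirmed by this card.

```python
import math


def cosine_lr_with_warmup(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by cosine decay.

    Assumption: warmup length is ceil(total_steps * warmup_ratio) and the
    cosine decays to zero at `total_steps`.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 25880  # final step listed in the training results

for step in (0, 1294, 2588, 14234, 25880):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS))
```

Under these assumptions, warmup ends at step 2588 (the end of epoch 2), where the learning rate reaches its peak of 5e-05, and it decays to roughly zero by the final step.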

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.4185	1.0	1294	0.5002	435696
0.4029	2.0	2588	0.4535	872288
0.3791	3.0	3882	0.4301	1309872
0.4453	4.0	5176	0.4286	1747344
0.3487	5.0	6470	0.4533	2184032
0.2914	6.0	7764	0.5002	2622912
0.3652	7.0	9058	0.5530	3059648
0.2576	8.0	10352	0.6371	3496896
0.188	9.0	11646	0.7288	3934000
0.1067	10.0	12940	0.7938	4369680
0.2319	11.0	14234	0.8879	4807440
0.0631	12.0	15528	1.0584	5243776
0.071	13.0	16822	1.1869	5681232
0.0382	14.0	18116	1.4022	6118000
0.0258	15.0	19410	1.6405	6554032
0.046	16.0	20704	1.7531	6989408
0.0015	17.0	21998	1.8782	7425264
0.0568	18.0	23292	1.9005	7861712
0.0003	19.0	24586	1.9256	8297664
0.0015	20.0	25880	1.9278	8733312
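
The table can be checked mechanically for the epoch with the lowest validation loss. The sketch below copies the (epoch, validation loss) pairs from the rows above and finds the minimum; the variable names are illustrative only.

```python
# (epoch, validation_loss) pairs copied from the training results table.
VAL_LOSS = [
    (1, 0.5002), (2, 0.4535), (3, 0.4301), (4, 0.4286), (5, 0.4533),
    (6, 0.5002), (7, 0.5530), (8, 0.6371), (9, 0.7288), (10, 0.7938),
    (11, 0.8879), (12, 1.0584), (13, 1.1869), (14, 1.4022), (15, 1.6405),
    (16, 1.7531), (17, 1.8782), (18, 1.9005), (19, 1.9256), (20, 1.9278),
]

# Epoch with the lowest validation loss.
best_epoch, best_loss = min(VAL_LOSS, key=lambda row: row[1])
final_epoch, final_loss = VAL_LOSS[-1]

# True if validation loss increases at every epoch after the best one.
rises_after_best = all(
    VAL_LOSS[i][1] > VAL_LOSS[i - 1][1] for i in range(best_epoch, len(VAL_LOSS))
)

print(f"best: epoch {best_epoch}, loss {best_loss}")
print(f"final: epoch {final_epoch}, loss {final_loss}")
print(f"validation loss rises every epoch after the best: {rises_after_best}")
```

Validation loss bottoms out at epoch 4 and climbs at every later epoch while training loss keeps falling, which is the usual signature of overfitting.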

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
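
Since the card lists PEFT among the framework versions, the adapter can presumably be loaded on top of the base model in the usual PEFT way. The following is a sketch under assumptions, not a documented usage recipe: it assumes the adapter targets a causal language model head, that the repository id is `rbelanec/train_stsb_42_1760637582`, and that you have access to the base model weights. The helper name `load_adapter_model` is hypothetical, and the prompt format used for stsb during training is not stated in this card.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_stsb_42_1760637582"  # assumed repository id


def load_adapter_model(base_model_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (hypothetical helper)."""
    # Local imports keep this module importable without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    # Assumption: bfloat16 weights fit the available hardware.
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id, torch_dtype=torch.bfloat16
    )
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_adapter_model()` downloads both the base model and the adapter, so it needs network access and enough memory for an 8B-parameter model.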

Model tree for rbelanec/train_stsb_42_1760637582

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model