train_stsb_123_1760637699

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

Loss: 4.7338 (matches the validation loss at epoch 13, the lowest value in the training results)
Num Input Tokens Seen: 8725024

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
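
To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup. It assumes the usual form (linear ramp from zero, then a half-cosine decay to zero) and takes the total step count from the last row of the training results (25880). The names `lr_at`, `TOTAL_STEPS`, and `WARMUP_STEPS` are illustrative and do not come from the training code.

```python
import math

BASE_LR = 5e-05       # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 25880   # assumption: final step in the training results (20 epochs x 1294 steps)

# Assumption: warmup length is the warmup ratio applied to the total step count.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then half-cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 1294, WARMUP_STEPS, 12940, TOTAL_STEPS):
        print(f"step {step:>5}: lr = {lr_at(step):.3e}")
```

Under these assumptions the warmup covers the first 2588 steps, which is the first two epochs in the training results, and the learning rate decays toward zero by the final step.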

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
5.0545	1.0	1294	4.9284	435488
4.4727	2.0	2588	4.7829	871200
4.6158	3.0	3882	4.7574	1307968
4.5469	4.0	5176	4.7466	1745568
5.1639	5.0	6470	4.7506	2182352
4.6261	6.0	7764	4.7473	2619888
4.4961	7.0	9058	4.7588	3057216
4.4829	8.0	10352	4.7431	3493600
4.5859	9.0	11646	4.7455	3928704
4.7048	10.0	12940	4.7499	4364240
4.6222	11.0	14234	4.7493	4800144
4.6776	12.0	15528	4.7439	5234320
4.6117	13.0	16822	4.7338	5670720
4.7566	14.0	18116	4.7545	6108240
4.6339	15.0	19410	4.7451	6543632
4.4988	16.0	20704	4.7451	6978816
5.0551	17.0	21998	4.7451	7415824
4.5552	18.0	23292	4.7451	7851088
4.6251	19.0	24586	4.7451	8287536
4.689	20.0	25880	4.7451	8725024
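
The table above can be checked with a few lines of Python. This sketch copies the epoch, training loss, and validation loss columns verbatim and looks up the epoch with the lowest validation loss and the epoch from which the validation loss stops changing. The helper names `best_epoch` and `plateau_start` are illustrative only.

```python
# (epoch, training_loss, validation_loss), copied from the training results table.
RESULTS = [
    (1, 5.0545, 4.9284), (2, 4.4727, 4.7829), (3, 4.6158, 4.7574),
    (4, 4.5469, 4.7466), (5, 5.1639, 4.7506), (6, 4.6261, 4.7473),
    (7, 4.4961, 4.7588), (8, 4.4829, 4.7431), (9, 4.5859, 4.7455),
    (10, 4.7048, 4.7499), (11, 4.6222, 4.7493), (12, 4.6776, 4.7439),
    (13, 4.6117, 4.7338), (14, 4.7566, 4.7545), (15, 4.6339, 4.7451),
    (16, 4.4988, 4.7451), (17, 5.0551, 4.7451), (18, 4.5552, 4.7451),
    (19, 4.6251, 4.7451), (20, 4.689, 4.7451),
]


def best_epoch(results):
    """Return the row with the lowest validation loss."""
    return min(results, key=lambda row: row[2])


def plateau_start(results):
    """Return the first epoch of the trailing run whose validation loss equals the final value."""
    final_loss = results[-1][2]
    start = results[-1][0]
    for epoch, _train_loss, val_loss in reversed(results):
        if val_loss != final_loss:
            break
        start = epoch
    return start


if __name__ == "__main__":
    epoch, _train_loss, val_loss = best_epoch(RESULTS)
    print(f"lowest validation loss {val_loss} at epoch {epoch}")
    print(f"validation loss is unchanged from epoch {plateau_start(RESULTS)} onward")
```

The lowest validation loss in the table is the 4.7338 reported at the top of this card, and the logged validation loss is identical for every epoch from 15 through 20.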

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
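
Since PEFT appears in the framework versions, the weights are presumably published as an adapter on top of the base model. The following is a hedged sketch of how such an adapter is typically loaded. It assumes the adapter was trained against the causal language modeling head and that the base model, which may require approved access on the Hub, is available locally or downloadable. The function name `load_model_with_adapter` is hypothetical.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_stsb_123_1760637699"


def load_model_with_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (assumption: causal LM head)."""
    # Imported lazily so this file can be imported without the heavy dependencies installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights stored under adapter_id.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_model_with_adapter()
    print(type(model).__name__)
```

Loading an 8B-parameter base model needs substantial memory, so options such as a reduced-precision dtype or device placement may be required in practice. They are omitted here because the card does not state how the model was loaded during training.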

Model tree for rbelanec/train_stsb_123_1760637699

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model