train_stsb_789_1760637923

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8524
  • Num Input Tokens Seen: 7789448

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
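As a minimal sketch of how the learning-rate schedule above behaves, the following stand-alone function combines linear warmup over the first 10% of steps (lr_scheduler_warmup_ratio: 0.1) with cosine decay to zero. This is an approximation for illustration, not the exact Transformers scheduler; the total step count of 23000 is taken from the training results table below.

```python
import math

def lr_at_step(step, base_lr=1e-05, total_steps=23000, warmup_ratio=0.1):
    """Approximate learning rate at a given optimizer step:
    linear warmup for the first warmup_ratio fraction of steps,
    then cosine decay from base_lr down to zero."""
    warmup_steps = int(total_steps * warmup_ratio)  # 2300 steps here
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The peak rate of 1e-05 is reached at step 2300 (end of warmup) and decays back to zero by the final step.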

Training results

Training Loss   Epoch   Step    Validation Loss   Input Tokens Seen
0.5844           2.0     2300   0.7524              778080
0.8271           4.0     4600   0.6845             1554056
0.5566           6.0     6900   0.6377             2334192
0.4163           8.0     9200   0.5724             3113448
0.3304          10.0    11500   0.5794             3892416
0.3531          12.0    13800   0.6039             4672776
0.3172          14.0    16100   0.6575             5451448
0.2619          16.0    18400   0.7460             6231920
0.2966          18.0    20700   0.8306             7011920
0.2114          20.0    23000   0.8524             7789448
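Reading the table above, validation loss bottoms out at epoch 8 (0.5724) and climbs thereafter while training loss keeps falling, a common sign of overfitting; the reported final loss of 0.8524 corresponds to the last epoch, not the best one. A small snippet, using values copied directly from the table, locates the best checkpoint:

```python
# (epoch, validation loss) pairs copied from the training results table.
history = [(2.0, 0.7524), (4.0, 0.6845), (6.0, 0.6377), (8.0, 0.5724),
           (10.0, 0.5794), (12.0, 0.6039), (14.0, 0.6575), (16.0, 0.7460),
           (18.0, 0.8306), (20.0, 0.8524)]

# Pick the epoch with the lowest validation loss.
best_epoch, best_loss = min(history, key=lambda pair: pair[1])
```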

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_stsb_789_1760637923

This model is a PEFT adapter of meta-llama/Meta-Llama-3-8B-Instruct.