train_stsb_123_1760637700

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5127
  • Num Input Tokens Seen: 8725024

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
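
The hyperparameters above imply a cosine learning-rate schedule with linear warmup over the first 10% of training. A minimal sketch of that schedule, assuming 25880 total optimizer steps (20 epochs × 1294 steps per epoch, per the training results table); the function and constant names are illustrative, not from the training code:

```python
import math

LEARNING_RATE = 5e-5
TOTAL_STEPS = 25880                     # 20 epochs x 1294 optimizer steps
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)   # warmup_ratio 0.1 -> 2588 steps

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under cosine-with-warmup."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Cosine decay from the peak down toward 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))             # 0.0 at the start of warmup
print(lr_at(WARMUP_STEPS))  # peak 5e-05 at the end of warmup
print(lr_at(TOTAL_STEPS))   # ~0.0 at the end of training
```

This mirrors the shape of transformers' cosine scheduler with warmup; the actual run used the library's built-in implementation.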

Training results

Training Loss   Epoch   Step    Validation Loss   Input Tokens Seen
1.1276          1.0     1294    1.1261            435488
0.5037          2.0     2588    0.6961            871200
0.6182          3.0     3882    0.6256            1307968
0.3773          4.0     5176    0.5923            1745568
0.6961          5.0     6470    0.5698            2182352
0.5883          6.0     7764    0.5536            2619888
0.4578          7.0     9058    0.5439            3057216
0.4402          8.0     10352   0.5379            3493600
0.4747          9.0     11646   0.5294            3928704
0.4438          10.0    12940   0.5248            4364240
0.4731          11.0    14234   0.5212            4800144
0.4504          12.0    15528   0.5178            5234320
0.5421          13.0    16822   0.5169            5670720
0.4152          14.0    18116   0.5145            6108240
0.3473          15.0    19410   0.5141            6543632
0.4726          16.0    20704   0.5141            6978816
0.4865          17.0    21998   0.5137            7415824
0.4733          18.0    23292   0.5127            7851088
0.3958          19.0    24586   0.5140            8287536
0.4228          20.0    25880   0.5127            8725024
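
The "Input Tokens Seen" column grows by roughly 435k tokens per epoch, which is consistent with the batch settings above. A quick back-of-the-envelope check (illustrative Python, not part of the training code):

```python
# Sanity-check the tokens-seen column against the batch configuration.
steps_per_epoch = 1294       # step count at epoch 1.0 in the table
train_batch_size = 4
tokens_epoch_1 = 435488      # tokens seen after the first epoch
total_tokens = 8725024       # tokens seen after 20 epochs (25880 steps)

tokens_per_step = tokens_epoch_1 / steps_per_epoch
tokens_per_example = tokens_per_step / train_batch_size

print(round(tokens_per_step, 1))     # ~336.5 tokens per optimizer step
print(round(tokens_per_example, 1))  # ~84.1 tokens per training example
# The per-step rate stays nearly constant across the whole run:
print(round(total_tokens / (20 * steps_per_epoch), 1))  # ~337.1
```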

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4