train_stsb_1754652141

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2252
  • Num Input Tokens Seen: 4364240
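
The adapter can be loaded on top of the base model with PEFT. The snippet below is a minimal sketch, assuming the adapter is published as rbelanec/train_stsb_1754652141 and that you have access to the gated meta-llama/Meta-Llama-3-8B-Instruct weights; the dtype and device placement are illustrative choices.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_stsb_1754652141"  # repo id of this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,   # illustrative; any supported dtype works
    device_map="auto",            # requires the accelerate package
)

# Attach the fine-tuned PEFT adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```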

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
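
These settings map onto a standard transformers TrainingArguments configuration. The sketch below is illustrative only; the output_dir is a placeholder, and any options not listed above are left at their defaults.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_stsb_1754652141",  # placeholder, not taken from the original run
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```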

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|---------------|-------|-------|-----------------|-------------------|
| 5.6389        | 0.5   | 647   | 5.8282          | 217472            |
| 2.801         | 1.0   | 1294  | 3.0180          | 435488            |
| 2.2038        | 1.5   | 1941  | 2.4164          | 652480            |
| 1.6601        | 2.0   | 2588  | 2.0665          | 871200            |
| 1.5049        | 2.5   | 3235  | 1.8074          | 1089120           |
| 1.7735        | 3.0   | 3882  | 1.6549          | 1307968           |
| 1.5476        | 3.5   | 4529  | 1.5325          | 1529024           |
| 1.1845        | 4.0   | 5176  | 1.4605          | 1745568           |
| 1.3133        | 4.5   | 5823  | 1.3826          | 1965984           |
| 2.0287        | 5.0   | 6470  | 1.3431          | 2182352           |
| 0.9574        | 5.5   | 7117  | 1.3103          | 2399760           |
| 1.3612        | 6.0   | 7764  | 1.2787          | 2619888           |
| 1.2128        | 6.5   | 8411  | 1.2617          | 2837808           |
| 0.989         | 7.0   | 9058  | 1.2460          | 3057216           |
| 1.2771        | 7.5   | 9705  | 1.2436          | 3275904           |
| 1.0137        | 8.0   | 10352 | 1.2304          | 3493600           |
| 1.3647        | 8.5   | 10999 | 1.2295          | 3712320           |
| 1.2201        | 9.0   | 11646 | 1.2252          | 3928704           |
| 0.8309        | 9.5   | 12293 | 1.2260          | 4147200           |
| 1.0145        | 10.0  | 12940 | 1.2264          | 4364240           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
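
To reproduce the environment, the versions above can be pinned in a requirements file. The listing below is a sketch; the +cu128 suffix on the PyTorch version indicates a CUDA 12.8 build, which typically comes from PyTorch's own wheel index rather than PyPI.

```
peft==0.15.2
transformers==4.51.3
torch==2.8.0        # original run used the 2.8.0+cu128 (CUDA 12.8) build
datasets==3.6.0
tokenizers==0.21.1
```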