train_stsb_1756729599

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9687
  • Num Input Tokens Seen: 3924688

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
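
For reference, the hyperparameters above could be expressed with transformers.TrainingArguments roughly as in the sketch below. This is a reconstruction, not the original training script; the output_dir value is a placeholder.

```python
from transformers import TrainingArguments

# Reconstruction of the reported hyperparameters; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="train_stsb_1756729599",
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```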

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:-----------------:|
| 0.6321        | 0.5002 | 1294  | 1.0013          | 196128            |
| 0.5213        | 1.0004 | 2588  | 0.7792          | 392912            |
| 0.6033        | 1.5006 | 3882  | 0.7163          | 589232            |
| 0.7534        | 2.0008 | 5176  | 0.7121          | 785392            |
| 0.4464        | 2.5010 | 6470  | 0.6097          | 980992            |
| 0.7921        | 3.0012 | 7764  | 0.5557          | 1178208           |
| 0.4673        | 3.5014 | 9058  | 0.5789          | 1375792           |
| 0.3258        | 4.0015 | 10352 | 0.5703          | 1571200           |
| 0.4303        | 4.5017 | 11646 | 0.5369          | 1768912           |
| 0.4107        | 5.0019 | 12940 | 0.5237          | 1964080           |
| 0.4169        | 5.5021 | 14234 | 0.5375          | 2160080           |
| 0.3532        | 6.0023 | 15528 | 0.5197          | 2356800           |
| 0.2316        | 6.5025 | 16822 | 0.5939          | 2552912           |
| 0.1856        | 7.0027 | 18116 | 0.6062          | 2749840           |
| 0.2502        | 7.5029 | 19410 | 0.6583          | 2946160           |
| 0.2159        | 8.0031 | 20704 | 0.6375          | 3142224           |
| 0.1741        | 8.5033 | 21998 | 0.7739          | 3338752           |
| 0.1177        | 9.0035 | 23292 | 0.8405          | 3534272           |
| 0.2266        | 9.5037 | 24586 | 0.9639          | 3730128           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
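
With the versions above, the adapter can be loaded on top of the base model roughly as follows. This is a minimal sketch: it assumes the adapter was trained for causal language modeling (AutoModelForCausalLM) and that you have access to the gated base checkpoint.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_stsb_1756729599"

# Load the frozen base model and tokenizer, then attach the PEFT adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```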