train_stsb_123_1760637697

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4807
  • Num Input Tokens Seen: 8725024

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
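The warmup and decay behavior implied by these settings can be sketched in plain Python. This is a hypothetical illustration, not the training code: it assumes the standard linear-warmup-then-cosine-decay shape, and takes the total of 25880 optimizer steps (1294 steps/epoch × 20 epochs) from the results table below.

```python
import math

# Values from the hyperparameter list above; TOTAL_STEPS is taken
# from the results table (1294 steps/epoch * 20 epochs).
LEARNING_RATE = 5e-5
WARMUP_RATIO = 0.1
TOTAL_STEPS = 25880
WARMUP_STEPS = int(TOTAL_STEPS * WARMUP_RATIO)

def lr_at_step(step: int) -> float:
    """Linear warmup to the peak LR, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

print(WARMUP_STEPS)              # 2588
print(lr_at_step(WARMUP_STEPS))  # peak LR: 5e-05
```

With warmup_ratio 0.1, the learning rate ramps up over the first 2588 steps (roughly the first two epochs) before decaying.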

Training results

Training Loss   Epoch   Step    Validation Loss   Input Tokens Seen
0.3793           1.0     1294   0.5439            435488
0.3894           2.0     2588   0.4949            871200
0.425            3.0     3882   0.4807            1307968
0.2995           4.0     5176   0.5303            1745568
0.4531           5.0     6470   0.5086            2182352
0.2905           6.0     7764   0.5703            2619888
0.2584           7.0     9058   0.6513            3057216
0.1933           8.0    10352   0.6799            3493600
0.1651           9.0    11646   0.7424            3928704
0.2668          10.0    12940   0.8582            4364240
0.1206          11.0    14234   0.9613            4800144
0.1187          12.0    15528   1.1003            5234320
0.1301          13.0    16822   1.3352            5670720
0.2357          14.0    18116   1.6198            6108240
0.0049          15.0    19410   1.8247            6543632
0.0435          16.0    20704   1.9786            6978816
0.0342          17.0    21998   2.0470            7415824
0.0007          18.0    23292   2.1490            7851088
0.0003          19.0    24586   2.1583            8287536
0.0002          20.0    25880   2.1685            8725024
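Note that the evaluation loss reported at the top (0.4807) corresponds to the minimum of the Validation Loss column, reached at epoch 3; from epoch 4 onward validation loss climbs steadily while training loss keeps falling, the usual overfitting pattern. A minimal, self-contained sketch (values copied from the table above) that recovers the best epoch:

```python
# Validation losses per epoch, copied from the results table above.
val_loss = [0.5439, 0.4949, 0.4807, 0.5303, 0.5086, 0.5703, 0.6513,
            0.6799, 0.7424, 0.8582, 0.9613, 1.1003, 1.3352, 1.6198,
            1.8247, 1.9786, 2.0470, 2.1490, 2.1583, 2.1685]

# Epochs are 1-indexed; pick the epoch with the lowest validation loss.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print(best_epoch, val_loss[best_epoch - 1])  # 3 0.4807
```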

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
Model tree

This model (rbelanec/train_stsb_123_1760637697) is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct.