train_stsb_42_1760637580

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4216
  • Num Input Tokens Seen: 8733312

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
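The learning-rate schedule above (cosine decay with a 10% linear warmup) can be sketched in plain Python. The step counts are derived from the results table (1294 steps per epoch × 20 epochs = 25880 steps); the actual Transformers scheduler is applied per optimizer step and may differ in minor details, so treat this as an illustration rather than the exact implementation:

```python
import math

def cosine_lr_with_warmup(step, total_steps, peak_lr=0.03, warmup_ratio=0.1):
    """Linear warmup over the first `warmup_ratio` fraction of steps,
    then cosine decay from `peak_lr` down to 0 over the remaining steps."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 25880  # 20 epochs x 1294 optimizer steps per epoch
print(cosine_lr_with_warmup(0, total))      # start of warmup: 0.0
print(cosine_lr_with_warmup(2588, total))   # peak LR right after warmup: 0.03
print(cosine_lr_with_warmup(25880, total))  # fully decayed at end of training
```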

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|--------------:|------:|------:|----------------:|------------------:|
| 0.4747        | 1.0   | 1294  | 0.4854          | 435696            |
| 0.3948        | 2.0   | 2588  | 0.4649          | 872288            |
| 0.3869        | 3.0   | 3882  | 0.4333          | 1309872           |
| 0.4917        | 4.0   | 5176  | 0.4449          | 1747344           |
| 0.3777        | 5.0   | 6470  | 0.4258          | 2184032           |
| 0.3456        | 6.0   | 7764  | 0.4216          | 2622912           |
| 0.536         | 7.0   | 9058  | 0.4239          | 3059648           |
| 0.3693        | 8.0   | 10352 | 0.4315          | 3496896           |
| 0.3353        | 9.0   | 11646 | 0.4438          | 3934000           |
| 0.3025        | 10.0  | 12940 | 0.4474          | 4369680           |
| 0.3997        | 11.0  | 14234 | 0.4730          | 4807440           |
| 0.2902        | 12.0  | 15528 | 0.4890          | 5243776           |
| 0.2457        | 13.0  | 16822 | 0.5415          | 5681232           |
| 0.2338        | 14.0  | 18116 | 0.5830          | 6118000           |
| 0.1754        | 15.0  | 19410 | 0.6711          | 6554032           |
| 0.1299        | 16.0  | 20704 | 0.7878          | 6989408           |
| 0.1662        | 17.0  | 21998 | 0.9459          | 7425264           |
| 0.1772        | 18.0  | 23292 | 1.0427          | 7861712           |
| 0.0643        | 19.0  | 24586 | 1.0606          | 8297664           |
| 0.1862        | 20.0  | 25880 | 1.0602          | 8733312           |
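The validation loss bottoms out at epoch 6 and climbs steadily afterwards, a typical overfitting curve under a long 20-epoch run. A simple pass over the per-epoch validation losses (copied from the table above) recovers the reported best loss of 0.4216:

```python
# Validation losses per epoch, copied from the results table above.
val_losses = [0.4854, 0.4649, 0.4333, 0.4449, 0.4258, 0.4216, 0.4239,
              0.4315, 0.4438, 0.4474, 0.4730, 0.4890, 0.5415, 0.5830,
              0.6711, 0.7878, 0.9459, 1.0427, 1.0606, 1.0602]

# Index of the minimum loss, shifted to 1-based epoch numbering.
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__) + 1
print(best_epoch, val_losses[best_epoch - 1])  # → 6 0.4216
```

This is the same selection that early stopping (or `load_best_model_at_end`) would make; the checkpoint behind the headline loss corresponds to epoch 6, not the final epoch.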

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4