train_stsb_42_1760637582

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4286
  • Num Input Tokens Seen: 8733312

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
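The learning-rate schedule implied by these hyperparameters (cosine decay with a 10% linear warmup over the 25880 total steps shown in the results table) can be sketched as follows. This is an illustrative reimplementation, not the exact `transformers` scheduler code:

```python
import math

def lr_at_step(step, total_steps=25880, base_lr=5e-5, warmup_ratio=0.1):
    """Linear warmup followed by cosine decay to zero.

    Sketch of lr_scheduler_type=cosine with lr_scheduler_warmup_ratio=0.1;
    total_steps = 20 epochs x 1294 steps/epoch from the results table.
    """
    warmup_steps = int(total_steps * warmup_ratio)  # 2588 warmup steps
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The peak learning rate (5e-05) is reached at step 2588, i.e. at the end of epoch 2.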

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.4185        | 1.0   | 1294  | 0.5002          | 435696            |
| 0.4029        | 2.0   | 2588  | 0.4535          | 872288            |
| 0.3791        | 3.0   | 3882  | 0.4301          | 1309872           |
| 0.4453        | 4.0   | 5176  | 0.4286          | 1747344           |
| 0.3487        | 5.0   | 6470  | 0.4533          | 2184032           |
| 0.2914        | 6.0   | 7764  | 0.5002          | 2622912           |
| 0.3652        | 7.0   | 9058  | 0.5530          | 3059648           |
| 0.2576        | 8.0   | 10352 | 0.6371          | 3496896           |
| 0.188         | 9.0   | 11646 | 0.7288          | 3934000           |
| 0.1067        | 10.0  | 12940 | 0.7938          | 4369680           |
| 0.2319        | 11.0  | 14234 | 0.8879          | 4807440           |
| 0.0631        | 12.0  | 15528 | 1.0584          | 5243776           |
| 0.071         | 13.0  | 16822 | 1.1869          | 5681232           |
| 0.0382        | 14.0  | 18116 | 1.4022          | 6118000           |
| 0.0258        | 15.0  | 19410 | 1.6405          | 6554032           |
| 0.046         | 16.0  | 20704 | 1.7531          | 6989408           |
| 0.0015        | 17.0  | 21998 | 1.8782          | 7425264           |
| 0.0568        | 18.0  | 23292 | 1.9005          | 7861712           |
| 0.0003        | 19.0  | 24586 | 1.9256          | 8297664           |
| 0.0015        | 20.0  | 25880 | 1.9278          | 8733312           |

Validation loss reaches its minimum of 0.4286 at epoch 4, matching the evaluation loss reported above, and rises steadily in later epochs as the model overfits the training set.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
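With these framework versions, a typical pattern for loading this adapter on top of its base model looks like the following. This is a sketch, not verified against this repository: it assumes the adapter is hosted at `rbelanec/train_stsb_42_1760637582` and that you have access to the gated `meta-llama/Meta-Llama-3-8B-Instruct` weights.

```python
# Sketch: load the PEFT adapter on top of the base model.
# Assumes access to meta-llama/Meta-Llama-3-8B-Instruct and that the
# adapter repo id is rbelanec/train_stsb_42_1760637582.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_stsb_42_1760637582"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```

`PeftModel.from_pretrained` attaches the adapter weights to the base model; `model.merge_and_unload()` can optionally fold them in for adapter-free inference.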