train_stsb_42_1760637584

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4528
  • Num Input Tokens Seen: 8733312

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
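The learning-rate schedule implied by these hyperparameters can be sketched as follows. This is a minimal illustration, not the training code: it assumes 25880 total optimizer steps (20 epochs × 1294 steps per epoch, per the training-results table) and mirrors the usual cosine-with-warmup shape (as in transformers' `get_cosine_schedule_with_warmup`).

```python
import math

def lr_at_step(step, total_steps=25880, base_lr=5e-05, warmup_ratio=0.1):
    """Cosine schedule with linear warmup, as configured above.

    Assumed totals: 25880 steps = 20 epochs x 1294 steps (from the
    training-results table); warmup covers the first 10% of steps.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to the base learning rate.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at_step(0))      # start of warmup: 0.0
print(lr_at_step(2588))   # end of warmup: peak lr 5e-05
print(lr_at_step(25880))  # end of training: decayed to ~0
```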

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|---------------|-------|-------|-----------------|-------------------|
| 1.0946        | 1.0   | 1294  | 1.0247          | 435696            |
| 0.4803        | 2.0   | 2588  | 0.6088          | 872288            |
| 0.5931        | 3.0   | 3882  | 0.5389          | 1309872           |
| 0.514         | 4.0   | 5176  | 0.5083          | 1747344           |
| 0.4347        | 5.0   | 6470  | 0.4943          | 2184032           |
| 0.4692        | 6.0   | 7764  | 0.4835          | 2622912           |
| 0.9412        | 7.0   | 9058  | 0.4763          | 3059648           |
| 0.4814        | 8.0   | 10352 | 0.4688          | 3496896           |
| 0.6544        | 9.0   | 11646 | 0.4671          | 3934000           |
| 0.3418        | 10.0  | 12940 | 0.4633          | 4369680           |
| 0.6307        | 11.0  | 14234 | 0.4590          | 4807440           |
| 0.3754        | 12.0  | 15528 | 0.4564          | 5243776           |
| 0.3837        | 13.0  | 16822 | 0.4555          | 5681232           |
| 0.44          | 14.0  | 18116 | 0.4559          | 6118000           |
| 0.3011        | 15.0  | 19410 | 0.4540          | 6554032           |
| 0.3321        | 16.0  | 20704 | 0.4546          | 6989408           |
| 0.4417        | 17.0  | 21998 | 0.4528          | 7425264           |
| 0.6429        | 18.0  | 23292 | 0.4534          | 7861712           |
| 0.3465        | 19.0  | 24586 | 0.4541          | 8297664           |
| 0.5448        | 20.0  | 25880 | 0.4539          | 8733312           |
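The headline eval loss of 0.4528 corresponds to the best epoch in the table above, which can be verified with a short sketch. The validation losses below are transcribed directly from the training-results table; selecting by minimum validation loss picks epoch 17.

```python
# Validation loss per epoch, transcribed from the training-results table.
val_loss = {
    1: 1.0247, 2: 0.6088, 3: 0.5389, 4: 0.5083, 5: 0.4943,
    6: 0.4835, 7: 0.4763, 8: 0.4688, 9: 0.4671, 10: 0.4633,
    11: 0.4590, 12: 0.4564, 13: 0.4555, 14: 0.4559, 15: 0.4540,
    16: 0.4546, 17: 0.4528, 18: 0.4534, 19: 0.4541, 20: 0.4539,
}

# Select the epoch with the lowest validation loss.
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch, val_loss[best_epoch])  # 17 0.4528
```

Note that validation loss plateaus around epoch 15 and drifts slightly upward after epoch 17, which is why the reported loss comes from an intermediate epoch rather than the final one.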

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_stsb_42_1760637584
