train_stsb_123_1760637699

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 4.7338
  • Num Input Tokens Seen: 8725024

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
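The cosine schedule with a 0.1 warmup ratio can be sketched as a small function of the step count. This is an illustrative reimplementation of the schedule shape, not the trainer's internal code; the `lr_at_step` helper is hypothetical, and the total of 25880 steps is taken from the final row of the training results below (20 epochs × 1294 steps).

```python
import math

def lr_at_step(step, total_steps=25880, warmup_ratio=0.1, base_lr=5e-5):
    """Cosine decay with linear warmup, mirroring the lr_scheduler_type
    and lr_scheduler_warmup_ratio settings above (illustrative sketch)."""
    warmup_steps = int(total_steps * warmup_ratio)  # 0.1 * 25880 = 2588
    if step < warmup_steps:
        # linear warmup from 0 up to the base learning rate
        return base_lr * step / warmup_steps
    # cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at_step(0))      # 0.0 at the start of warmup
print(lr_at_step(2588))   # peak learning rate 5e-05 at the end of warmup
print(lr_at_step(25880))  # decayed to ~0 at the final step
```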

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 5.0545        | 1.0   | 1294  | 4.9284          | 435488            |
| 4.4727        | 2.0   | 2588  | 4.7829          | 871200            |
| 4.6158        | 3.0   | 3882  | 4.7574          | 1307968           |
| 4.5469        | 4.0   | 5176  | 4.7466          | 1745568           |
| 5.1639        | 5.0   | 6470  | 4.7506          | 2182352           |
| 4.6261        | 6.0   | 7764  | 4.7473          | 2619888           |
| 4.4961        | 7.0   | 9058  | 4.7588          | 3057216           |
| 4.4829        | 8.0   | 10352 | 4.7431          | 3493600           |
| 4.5859        | 9.0   | 11646 | 4.7455          | 3928704           |
| 4.7048        | 10.0  | 12940 | 4.7499          | 4364240           |
| 4.6222        | 11.0  | 14234 | 4.7493          | 4800144           |
| 4.6776        | 12.0  | 15528 | 4.7439          | 5234320           |
| 4.6117        | 13.0  | 16822 | 4.7338          | 5670720           |
| 4.7566        | 14.0  | 18116 | 4.7545          | 6108240           |
| 4.6339        | 15.0  | 19410 | 4.7451          | 6543632           |
| 4.4988        | 16.0  | 20704 | 4.7451          | 6978816           |
| 5.0551        | 17.0  | 21998 | 4.7451          | 7415824           |
| 4.5552        | 18.0  | 23292 | 4.7451          | 7851088           |
| 4.6251        | 19.0  | 24586 | 4.7451          | 8287536           |
| 4.689         | 20.0  | 25880 | 4.7451          | 8725024           |
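The reported evaluation loss of 4.7338 corresponds to the best checkpoint by validation loss rather than the final epoch, which can be verified directly from the table. This sketch hard-codes the (epoch, validation loss) pairs from the training results above:

```python
# (epoch, validation_loss) pairs copied from the training results table
history = [
    (1, 4.9284), (2, 4.7829), (3, 4.7574), (4, 4.7466), (5, 4.7506),
    (6, 4.7473), (7, 4.7588), (8, 4.7431), (9, 4.7455), (10, 4.7499),
    (11, 4.7493), (12, 4.7439), (13, 4.7338), (14, 4.7545), (15, 4.7451),
    (16, 4.7451), (17, 4.7451), (18, 4.7451), (19, 4.7451), (20, 4.7451),
]

# pick the epoch with the lowest validation loss
best_epoch, best_loss = min(history, key=lambda pair: pair[1])
print(best_epoch, best_loss)  # → 13 4.7338
```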

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4