train_sst2_123_1760637734

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set (a loading sketch follows the list):

  • Loss: 0.0571
  • Input tokens seen: 67743008
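The following is a minimal loading sketch. It assumes this repo is a standard PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct and that the task is prompted as text generation; the prompt template below is an illustrative guess, not the template used in training.

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "rbelanec/train_sst2_123_1760637734"  # this repo
base_id = "meta-llama/Meta-Llama-3-8B-Instruct"    # base model per the card

# Load the base model with the adapter applied on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Illustrative SST-2-style prompt; the actual training template is unknown.
prompt = (
    "Classify the sentiment of the sentence as positive or negative.\n"
    "Sentence: a gorgeous, witty, seductive movie.\n"
    "Sentiment:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```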

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
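For reference, the list above maps onto transformers' TrainingArguments roughly as follows. The output directory is an assumed placeholder, and the Trainer/dataset wiring is not shown on the card.

```python
from transformers import TrainingArguments

# Values copied from the hyperparameter list above; output_dir is assumed.
training_args = TrainingArguments(
    output_dir="train_sst2_123_1760637734",
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```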

Training results

Training Loss   Epoch   Step     Validation Loss   Input Tokens Seen
0.3935          1.0     15154    0.3437            3385616
0.3375          2.0     30308    0.3049            6774096
0.0222          3.0     45462    0.0723            10161824
0.1164          4.0     60616    0.0618            13549104
0.0294          5.0     75770    0.0706            16935568
0.0576          6.0     90924    0.0597            20320896
0.1461          7.0     106078   0.0611            23709008
0.0119          8.0     121232   0.0598            27099520
0.0388          9.0     136386   0.0621            30484864
0.0493          10.0    151540   0.0579            33869824
0.0499          11.0    166694   0.0571            37256608
0.0166          12.0    181848   0.0581            40640592
0.0158          13.0    197002   0.0614            44027424
0.0245          14.0    212156   0.0599            47415744
0.0289          15.0    227310   0.0603            50803600
0.0623          16.0    242464   0.0607            54189408
0.0510          17.0    257618   0.0608            57578368
0.0530          18.0    272772   0.0608            60965904
0.1471          19.0    287926   0.0607            64353008
0.0071          20.0    303080   0.0608            67743008

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
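A quick way to confirm that a local environment matches these versions is to compare them against the installed packages; the snippet below simply reproduces the list above as expected values.

```python
# Compare installed packages against the framework versions listed above.
import datasets
import peft
import tokenizers
import torch
import transformers

expected = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}
installed = {
    "peft": peft.__version__,
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    got = installed[name]
    note = "" if got == want else "  <-- mismatch"
    print(f"{name}: expected {want}, installed {got}{note}")
```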