train_sst2_789_1760637966

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0604
  • Num Input Tokens Seen: 67736640
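
The fine-tuned weights are a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, so both the base model and the adapter need to be loaded. A minimal loading sketch (repo ids taken from this card; assumes `transformers` and `peft` are installed — the function only defines the steps and downloads weights when called):

```python
# Repo ids taken from this model card.
BASE_MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER = "rbelanec/train_sst2_789_1760637966"

def load_finetuned():
    """Load the base model and attach the fine-tuned PEFT adapter.

    Requires `transformers` and `peft`; downloads weights on first call.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype="auto")
    # PeftModel.from_pretrained layers the adapter weights onto the base model.
    model = PeftModel.from_pretrained(base, ADAPTER)
    return tokenizer, model
```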

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
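
The warmup ratio of 0.1 over 20 epochs translates into a concrete learning-rate curve. A minimal sketch of a cosine schedule with linear warmup, in the style of transformers' `get_cosine_schedule_with_warmup` (step counts taken from the results table below):

```python
import math

LEARNING_RATE = 5e-5
TOTAL_STEPS = 303080                   # 20 epochs x 15154 steps/epoch (results table)
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # lr_scheduler_warmup_ratio = 0.1

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under cosine-with-warmup."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under this schedule the learning rate peaks at 5e-05 roughly two epochs in (step 30308) and decays to zero by the end of epoch 20.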

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.1534        | 1.0   | 15154  | 0.0902          | 3386112           |
| 0.0187        | 2.0   | 30308  | 0.0714          | 6772240           |
| 0.026         | 3.0   | 45462  | 0.0645          | 10157776          |
| 0.0077        | 4.0   | 60616  | 0.0624          | 13544064          |
| 0.0369        | 5.0   | 75770  | 0.0619          | 16932736          |
| 0.0302        | 6.0   | 90924  | 0.0608          | 20317792          |
| 0.0747        | 7.0   | 106078 | 0.0618          | 23704608          |
| 0.0115        | 8.0   | 121232 | 0.0604          | 27091680          |
| 0.1018        | 9.0   | 136386 | 0.0635          | 30478800          |
| 0.0569        | 10.0  | 151540 | 0.0614          | 33864736          |
| 0.0607        | 11.0  | 166694 | 0.0623          | 37250256          |
| 0.0023        | 12.0  | 181848 | 0.0627          | 40636544          |
| 0.1343        | 13.0  | 197002 | 0.0638          | 44025456          |
| 0.0789        | 14.0  | 212156 | 0.0624          | 47407856          |
| 0.0208        | 15.0  | 227310 | 0.0641          | 50794368          |
| 0.0571        | 16.0  | 242464 | 0.0639          | 54183904          |
| 0.0365        | 17.0  | 257618 | 0.0643          | 57571280          |
| 0.0073        | 18.0  | 272772 | 0.0643          | 60960480          |
| 0.0996        | 19.0  | 287926 | 0.0642          | 64347696          |
| 0.0032        | 20.0  | 303080 | 0.0643          | 67736640          |
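
The best validation loss in the table, 0.0604 at epoch 8, matches the evaluation loss reported at the top of this card; validation loss drifts slightly upward afterwards, suggesting the later epochs begin to overfit. A short sketch that recovers the best checkpoint from the (epoch, validation loss) pairs above:

```python
# (epoch, validation loss) pairs copied from the results table above.
val_loss = {
    1: 0.0902, 2: 0.0714, 3: 0.0645, 4: 0.0624, 5: 0.0619,
    6: 0.0608, 7: 0.0618, 8: 0.0604, 9: 0.0635, 10: 0.0614,
    11: 0.0623, 12: 0.0627, 13: 0.0638, 14: 0.0624, 15: 0.0641,
    16: 0.0639, 17: 0.0643, 18: 0.0643, 19: 0.0642, 20: 0.0643,
}

# The epoch with the lowest validation loss is the checkpoint worth keeping.
best_epoch = min(val_loss, key=val_loss.get)
```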

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_sst2_789_1760637966

  • Adapter of meta-llama/Meta-Llama-3-8B-Instruct (this model)