train_sst2_456_1760637851

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8334
  • Num Input Tokens Seen: 67744848

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
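With lr_scheduler_type cosine and lr_scheduler_warmup_ratio 0.1, the learning rate ramps linearly from 0 to the 5e-05 peak over the first 10% of training steps, then decays along a cosine curve toward 0. A minimal sketch of that schedule in plain Python (the function name is hypothetical; this is not the Transformers scheduler implementation):

```python
import math

def lr_at_step(step, total_steps, peak_lr=5e-05, warmup_ratio=0.1):
    """Cosine learning-rate schedule with linear warmup.

    Linear warmup from 0 to peak_lr over the first warmup_ratio
    fraction of total_steps, then cosine decay from peak_lr to 0.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp-up phase
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For this run (20 epochs x 15154 steps per epoch = 303080 total steps), the warmup covers the first 30308 steps.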

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.9669        | 1.0   | 15154  | 0.9100          | 3391536           |
| 0.9341        | 2.0   | 30308  | 0.8540          | 6778448           |
| 0.8366        | 3.0   | 45462  | 0.8535          | 10165856          |
| 0.7148        | 4.0   | 60616  | 0.8443          | 13553232          |
| 0.792         | 5.0   | 75770  | 0.8505          | 16942544          |
| 0.7039        | 6.0   | 90924  | 0.8441          | 20329264          |
| 0.774         | 7.0   | 106078 | 0.8334          | 23711968          |
| 0.7926        | 8.0   | 121232 | 0.8464          | 27102656          |
| 1.0991        | 9.0   | 136386 | 0.8358          | 30492336          |
| 1.0172        | 10.0  | 151540 | 0.8516          | 33879088          |
| 0.9327        | 11.0  | 166694 | 0.8446          | 37261968          |
| 0.746         | 12.0  | 181848 | 0.8395          | 40650384          |
| 0.8472        | 13.0  | 197002 | 0.8403          | 44037328          |
| 0.6921        | 14.0  | 212156 | 0.8517          | 47424688          |
| 0.8701        | 15.0  | 227310 | 0.8517          | 50810944          |
| 0.729         | 16.0  | 242464 | 0.8517          | 54197216          |
| 0.9737        | 17.0  | 257618 | 0.8517          | 57585488          |
| 0.9374        | 18.0  | 272772 | 0.8517          | 60973008          |
| 0.712         | 19.0  | 287926 | 0.8517          | 64361136          |
| 0.7503        | 20.0  | 303080 | 0.8517          | 67744848          |
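The lowest validation loss (0.8334) is reached at epoch 7, matching the evaluation loss reported at the top of this card; later epochs plateau around 0.85. Picking that checkpoint from the logged losses is a one-liner (the `history` list below is transcribed from the table):

```python
# (epoch, validation_loss) pairs transcribed from the results table
history = [
    (1, 0.9100), (2, 0.8540), (3, 0.8535), (4, 0.8443), (5, 0.8505),
    (6, 0.8441), (7, 0.8334), (8, 0.8464), (9, 0.8358), (10, 0.8516),
    (11, 0.8446), (12, 0.8395), (13, 0.8403), (14, 0.8517), (15, 0.8517),
    (16, 0.8517), (17, 0.8517), (18, 0.8517), (19, 0.8517), (20, 0.8517),
]

# Select the epoch with the lowest validation loss
best_epoch, best_loss = min(history, key=lambda pair: pair[1])
```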

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4