train_sst2_42_1760637622

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8224
  • Num Input Tokens Seen: 67768656

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
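With warmup_ratio 0.1 and a cosine schedule, the learning rate ramps linearly over the first 10% of training steps and then decays to zero. A minimal sketch of that schedule, assuming the total step count of 303080 (20 epochs × 15154 steps) from the results table; this is an illustration of the scheduler shape, not the trainer's exact implementation:

```python
import math

def lr_at_step(step, total_steps=303080, base_lr=5e-5, warmup_ratio=0.1):
    """Cosine schedule with linear warmup, as configured above."""
    warmup_steps = int(total_steps * warmup_ratio)  # 30308 steps of warmup
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, the rate is 2.5e-05 halfway through warmup, peaks at 5e-05 when warmup ends, and reaches 0 at the final step.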

Training results

Training Loss   Epoch   Step     Validation Loss   Input Tokens Seen
0.8503          1.0     15154    0.8988            3383904
0.7166          2.0     30308    0.8461            6774480
0.7149          3.0     45462    0.8269            10163152
0.8593          4.0     60616    0.8305            13552544
0.7253          5.0     75770    0.8279            16941120
0.7993          6.0     90924    0.8240            20331520
0.6568          7.0     106078   0.8246            23721840
0.5681          8.0     121232   0.8233            27111168
0.8977          9.0     136386   0.8304            30498912
0.9404          10.0    151540   0.8224            33886240
0.7244          11.0    166694   0.8290            37277536
0.6772          12.0    181848   0.8292            40664576
0.9906          13.0    197002   0.8227            44053792
0.6141          14.0    212156   0.8283            47441984
0.8149          15.0    227310   0.8283            50830480
0.8214          16.0    242464   0.8283            54215696
0.7346          17.0    257618   0.8283            57603296
0.8047          18.0    272772   0.8283            60987568
0.9349          19.0    287926   0.8283            64378192
0.6040          20.0    303080   0.8283            67768656
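Validation loss bottoms out at epoch 10 (0.8224), which matches the headline evaluation loss reported above. A quick sketch for locating the best checkpoint from the per-epoch validation losses in the table:

```python
# (epoch, validation_loss) pairs copied from the table above
results = [
    (1, 0.8988), (2, 0.8461), (3, 0.8269), (4, 0.8305), (5, 0.8279),
    (6, 0.8240), (7, 0.8246), (8, 0.8233), (9, 0.8304), (10, 0.8224),
    (11, 0.8290), (12, 0.8292), (13, 0.8227), (14, 0.8283), (15, 0.8283),
    (16, 0.8283), (17, 0.8283), (18, 0.8283), (19, 0.8283), (20, 0.8283),
]

# Pick the epoch with the lowest validation loss
best_epoch, best_loss = min(results, key=lambda r: r[1])
print(best_epoch, best_loss)  # -> 10 0.8224
```

Note that validation loss plateaus at 0.8283 from epoch 14 onward, suggesting the later epochs added little.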

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4