train_sst2_1752763923

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset (SST-2, binary sentiment classification from the Stanford Sentiment Treebank). It achieves the following results on the evaluation set:

  • Loss: 0.1764
  • Num Input Tokens Seen: 37274384

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto the Trainer API follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
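
A minimal sketch of how these settings could be expressed with the transformers Trainer API. Only the hyperparameter values come from this card; the output directory name and everything else about the training setup (dataset preprocessing, LoRA config, Trainer instantiation) are not shown here and would need to be filled in:

```python
# Sketch: the hyperparameters above expressed as TrainingArguments.
# Only the values come from this card; output_dir is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_sst2_1752763923",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=123,
    optim="adamw_torch",        # betas=(0.9, 0.999) and epsilon=1e-08 are the adamw_torch defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```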

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:-----------------:|
| 0.4867        | 0.5001 | 3789  | 0.4814          | 1865280           |
| 0.3415        | 1.0001 | 7578  | 0.3408          | 3725296           |
| 0.3341        | 1.5002 | 11367 | 0.2871          | 5592496           |
| 0.2259        | 2.0003 | 15156 | 0.2488          | 7451728           |
| 0.2091        | 2.5003 | 18945 | 0.2320          | 9312720           |
| 0.2085        | 3.0004 | 22734 | 0.2201          | 11179712          |
| 0.149         | 3.5005 | 26523 | 0.2128          | 13045888          |
| 0.1517        | 4.0005 | 30312 | 0.2055          | 14911072          |
| 0.2147        | 4.5006 | 34101 | 0.1990          | 16773344          |
| 0.1958        | 5.0007 | 37890 | 0.1934          | 18639616          |
| 0.2437        | 5.5007 | 41679 | 0.1902          | 20500224          |
| 0.2142        | 6.0008 | 45468 | 0.1873          | 22362704          |
| 0.1721        | 6.5009 | 49257 | 0.1836          | 24225104          |
| 0.1674        | 7.0009 | 53046 | 0.1807          | 26097984          |
| 0.1549        | 7.5010 | 56835 | 0.1792          | 27958656          |
| 0.2901        | 8.0011 | 60624 | 0.1775          | 29826656          |
| 0.1052        | 8.5011 | 64413 | 0.1771          | 31694624          |
| 0.2088        | 9.0012 | 68202 | 0.1767          | 33553312          |
| 0.2619        | 9.5013 | 71991 | 0.1764          | 35412768          |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.7.1+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1
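
Because this repository contains a PEFT adapter rather than full model weights, it must be loaded on top of the base model. A minimal sketch, assuming the adapter is published as rbelanec/train_sst2_1752763923 and that bfloat16 with automatic device placement is acceptable; the prompt format is also an assumption, since the card does not document how examples were templated:

```python
# Sketch: loading this PEFT adapter on top of the base model.
# The adapter repo id, dtype/device settings, and prompt format are assumptions.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "rbelanec/train_sst2_1752763923")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Example SST-2-style query (hypothetical prompt; adjust to the training template).
inputs = tokenizer(
    "Classify the sentiment: 'a gorgeous, witty, seductive movie.'",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```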