train_cola_789_1760637933

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the CoLA (Corpus of Linguistic Acceptability) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1701
  • Num Input Tokens Seen: 7327648

Model description

More information needed

Intended uses & limitations

More information needed
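Although the card does not document a usage recipe, the framework versions below indicate this is a PEFT adapter on top of the base model. A minimal loading sketch, assuming the adapter repo id `rbelanec/train_cola_789_1760637933` and access to the gated base model; the prompt template is an assumption, since the card does not document the format used during training:

```python
# Hedged sketch: load the PEFT adapter on top of the base model.
# Assumptions: adapter repo id "rbelanec/train_cola_789_1760637933",
# access to meta-llama/Meta-Llama-3-8B-Instruct, and an undocumented
# prompt format (CoLA is a binary acceptability-judgment task).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cola_789_1760637933"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the adapter

prompt = "Is the following sentence grammatically acceptable? 'The cat sat on the mat.'"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=8)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Running this requires downloading the 8B base weights, so it is best treated as a starting point rather than a verified recipe.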

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
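The schedule implied by these hyperparameters is linear warmup over the first 10% of steps followed by cosine decay to zero. A small sketch mirroring the behavior of `transformers`' cosine schedule with warmup, using the step counts from the training-results table below (1924 steps/epoch × 20 epochs = 38480 steps; the helper name `lr_at` is illustrative):

```python
import math

# Hedged sketch of the implied LR schedule: linear warmup for the first
# 10% of steps, then cosine decay from the base LR down to zero.
BASE_LR = 5e-5
TOTAL_STEPS = 38480                     # 1924 steps/epoch * 20 epochs
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)   # lr_scheduler_warmup_ratio: 0.1

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Cosine decay from BASE_LR at end of warmup to 0 at the final step.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(WARMUP_STEPS), lr_at(TOTAL_STEPS))
# -> 0.0, the peak LR (5e-05), and ~0.0 at the final step
```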

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|---------------|-------|-------|-----------------|-------------------|
| 0.2623        | 1.0   | 1924  | 0.1701          | 365728            |
| 0.2018        | 2.0   | 3848  | 0.1763          | 731984            |
| 0.3552        | 3.0   | 5772  | 0.1723          | 1098920           |
| 0.0101        | 4.0   | 7696  | 0.2101          | 1465464           |
| 0.0605        | 5.0   | 9620  | 0.2285          | 1831920           |
| 0.0009        | 6.0   | 11544 | 0.3140          | 2198176           |
| 0.0003        | 7.0   | 13468 | 0.3418          | 2564952           |
| 0.0053        | 8.0   | 15392 | 0.3419          | 2931096           |
| 0.0002        | 9.0   | 17316 | 0.3458          | 3296808           |
| 0.0012        | 10.0  | 19240 | 0.3297          | 3663512           |
| 0.0           | 11.0  | 21164 | 0.4454          | 4029608           |
| 0.0           | 12.0  | 23088 | 0.6117          | 4395616           |
| 0.0           | 13.0  | 25012 | 0.5573          | 4762456           |
| 0.0           | 14.0  | 26936 | 0.4571          | 5128712           |
| 0.0           | 15.0  | 28860 | 0.5523          | 5495008           |
| 0.0139        | 16.0  | 30784 | 0.5230          | 5861104           |
| 0.0           | 17.0  | 32708 | 0.5678          | 6228320           |
| 0.0153        | 18.0  | 34632 | 0.5942          | 6595032           |
| 0.0           | 19.0  | 36556 | 0.5975          | 6961416           |
| 0.0127        | 20.0  | 38480 | 0.6042          | 7327648           |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4