train_cola_42_1760637588

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the CoLA (Corpus of Linguistic Acceptability) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2359
  • Num Input Tokens Seen: 7336064
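
A minimal inference sketch for loading this adapter on top of the base model with PEFT. The adapter repo id (rbelanec/train_cola_42_1760637588) comes from the model page; the prompt format is an illustrative assumption, since the card does not document the prompt template used during training:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cola_42_1760637588"

# Load the base model, then attach the fine-tuned adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# CoLA is a binary acceptability task; this prompt is an assumption,
# not the template the adapter was trained with.
prompt = (
    "Is the following sentence grammatically acceptable? Answer yes or no.\n"
    "Sentence: The boy kicked the ball."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```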

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
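
As a reference, here is a sketch of how these settings map onto transformers TrainingArguments (argument names per Transformers 4.51; output_dir is a placeholder, and anything not listed above is left at its default):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_cola_42",        # placeholder, not stated in the card
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```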

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.1875        | 1.0   | 1924  | 0.1632          | 366856            |
| 0.1215        | 2.0   | 3848  | 0.1599          | 734320            |
| 0.1298        | 3.0   | 5772  | 0.1463          | 1100800           |
| 0.1182        | 4.0   | 7696  | 0.1624          | 1467824           |
| 0.0837        | 5.0   | 9620  | 0.1449          | 1834632           |
| 0.0851        | 6.0   | 11544 | 0.1475          | 2202264           |
| 0.1248        | 7.0   | 13468 | 0.1509          | 2568880           |
| 0.0537        | 8.0   | 15392 | 0.1524          | 2935520           |
| 0.1729        | 9.0   | 17316 | 0.1479          | 3302192           |
| 0.0321        | 10.0  | 19240 | 0.1585          | 3668584           |
| 0.0463        | 11.0  | 21164 | 0.1689          | 4034712           |
| 0.0419        | 12.0  | 23088 | 0.1786          | 4401480           |
| 0.0523        | 13.0  | 25012 | 0.2057          | 4768408           |
| 0.0038        | 14.0  | 26936 | 0.2443          | 5135240           |
| 0.1481        | 15.0  | 28860 | 0.2262          | 5501784           |
| 0.0023        | 16.0  | 30784 | 0.2471          | 5868800           |
| 0.0065        | 17.0  | 32708 | 0.2818          | 6235472           |
| 0.0062        | 18.0  | 34632 | 0.2778          | 6601760           |
| 0.0028        | 19.0  | 36556 | 0.2833          | 6968720           |
| 0.0025        | 20.0  | 38480 | 0.2888          | 7336064           |
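
Validation loss reaches its minimum around epoch 5 (0.1449) and climbs steadily afterward while training loss keeps falling, a typical overfitting pattern. A short matplotlib sketch to plot the validation curve from the table above:

```python
import matplotlib.pyplot as plt

# Validation losses copied from the training results table above.
epochs = list(range(1, 21))
val_loss = [0.1632, 0.1599, 0.1463, 0.1624, 0.1449, 0.1475, 0.1509, 0.1524,
            0.1479, 0.1585, 0.1689, 0.1786, 0.2057, 0.2443, 0.2262, 0.2471,
            0.2818, 0.2778, 0.2833, 0.2888]

plt.plot(epochs, val_loss, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.title("train_cola_42_1760637588: validation loss per epoch")
plt.show()
```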

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4