train_cola_42_1760637591

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cola dataset (CoLA, the Corpus of Linguistic Acceptability). It achieves the following results on the evaluation set:

  • Loss: 0.1542
  • Num Input Tokens Seen: 7336064
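
This repository holds a PEFT adapter rather than standalone model weights, so it is loaded on top of the base model. The following is a minimal sketch, assuming access to the gated meta-llama/Meta-Llama-3-8B-Instruct weights and using the repository id rbelanec/train_cola_42_1760637591 from this card; it is illustrative, not an official usage snippet.

```python
# Minimal sketch: attach this PEFT adapter to the base Llama 3 model.
# Assumes `transformers`, `peft`, and `accelerate` are installed and that
# you have access to the gated base-model weights.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cola_42_1760637591"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # requires accelerate
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```

CoLA is a binary linguistic-acceptability task, so inference presumably consists of prompting the model with a sentence and decoding an acceptability judgment; the exact prompt template used during training is not documented in this card.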

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
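
As a rough guide, these settings correspond to a transformers.TrainingArguments configuration along the following lines. This is a reconstruction from the list above, not the actual training script, and the output_dir is assumed from the model name.

```python
# Illustrative reconstruction of the hyperparameters above as
# transformers.TrainingArguments; not the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cola_42_1760637591",  # assumed from the model name
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```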

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|---------------|-------|-------|-----------------|-------------------|
| 0.1344        | 1.0   | 1924  | 0.2083          | 366856            |
| 0.179         | 2.0   | 3848  | 0.1707          | 734320            |
| 0.1562        | 3.0   | 5772  | 0.1647          | 1100800           |
| 0.1315        | 4.0   | 7696  | 0.1599          | 1467824           |
| 0.0992        | 5.0   | 9620  | 0.1571          | 1834632           |
| 0.0625        | 6.0   | 11544 | 0.1609          | 2202264           |
| 0.1842        | 7.0   | 13468 | 0.1577          | 2568880           |
| 0.0837        | 8.0   | 15392 | 0.1569          | 2935520           |
| 0.1961        | 9.0   | 17316 | 0.1564          | 3302192           |
| 0.0907        | 10.0  | 19240 | 0.1542          | 3668584           |
| 0.1339        | 11.0  | 21164 | 0.1542          | 4034712           |
| 0.0926        | 12.0  | 23088 | 0.1545          | 4401480           |
| 0.1873        | 13.0  | 25012 | 0.1549          | 4768408           |
| 0.0718        | 14.0  | 26936 | 0.1564          | 5135240           |
| 0.1902        | 15.0  | 28860 | 0.1549          | 5501784           |
| 0.0978        | 16.0  | 30784 | 0.1547          | 5868800           |
| 0.1516        | 17.0  | 32708 | 0.1552          | 6235472           |
| 0.1028        | 18.0  | 34632 | 0.1552          | 6601760           |
| 0.1279        | 19.0  | 36556 | 0.1551          | 6968720           |
| 0.0436        | 20.0  | 38480 | 0.1549          | 7336064           |

The validation loss bottoms out at 0.1542 at epochs 10 and 11, matching the evaluation loss reported at the top of this card, which suggests the best checkpoint rather than the final-epoch checkpoint (0.1549) was retained.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4