train_cola_456_1768397597

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the CoLA dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1872 (best validation loss, reached at epoch 2.0 / step 7696; see the table below)
  • Num Input Tokens Seen: 3463936
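
As a quick usage illustration, the adapter can be loaded on top of the base model with PEFT. This is a minimal sketch, assuming access to the gated meta-llama base weights and the adapter repo id shown for this model; the prompt format used during fine-tuning is not documented here, so the example prompt is illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cola_456_1768397597"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the fine-tuned adapter

# CoLA is a binary acceptability task; this prompt is an assumption,
# not necessarily the format used during training.
prompt = 'Is the following sentence grammatically acceptable? "The boy quickly ran."'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```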

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a minimal reproduction sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
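
These settings map directly onto a standard Hugging Face TrainingArguments configuration. The sketch below is an assumption about how one might reproduce them; the actual training script is not published here, and only the values listed above come from this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cola_456_1768397597",  # hypothetical output path
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```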

Training results

Training Loss   Epoch   Step    Validation Loss   Input Tokens Seen
0.4604          0.5     1924    0.2828            173216
0.2691          1.0     3848    0.2345            346216
0.1579          1.5     5772    0.2443            519400
0.2510          2.0     7696    0.1872            692896
0.1917          2.5     9620    0.2141            865952
0.0034          3.0     11544   0.2258            1039432
0.0041          3.5     13468   0.2401            1212696
0.2462          4.0     15392   0.2494            1385744
0.6043          4.5     17316   0.2615            1559200
0.0020          5.0     19240   0.2415            1732008
0.1836          5.5     21164   0.2970            1905112
0.0029          6.0     23088   0.2535            2078472
0.0014          6.5     25012   0.2912            2251864
0.2213          7.0     26936   0.2719            2425080
0.0073          7.5     28860   0.2845            2597816
0.2082          8.0     30784   0.2876            2771400
0.0016          8.5     32708   0.2916            2945048
0.0077          9.0     34632   0.2925            3117888
0.0014          9.5     36556   0.2975            3291632
0.0022          10.0    38480   0.2972            3463936
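
Validation loss bottoms out at 0.1872 at epoch 2.0 (step 7696) and trends upward afterward while training loss keeps falling, which is consistent with overfitting. If rerunning this training, one might keep the best checkpoint by eval loss; the snippet below is a sketch under that assumption, not the author's setup.

```python
from transformers import TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="train_cola_456_1768397597",  # hypothetical output path
    eval_strategy="steps",
    eval_steps=1924,            # matches the evaluation cadence in the table
    save_strategy="steps",
    save_steps=1924,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
# Pass to Trainer together with, e.g.,
# callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]
```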

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4