train_cola_123_1760637706

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the cola dataset (CoLA, the Corpus of Linguistic Acceptability). It achieves the following results on the evaluation set (see the loading sketch below the results):

  • Loss: 1.2137
  • Num Input Tokens Seen: 7337920
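
Since this is a PEFT adapter rather than a full model, it must be attached to the base checkpoint at load time. Below is a minimal loading sketch, assuming the adapter is published as rbelanec/train_cola_123_1760637706 and that you have access to the gated meta-llama/Meta-Llama-3-8B-Instruct weights; the prompt format is a guess, since the card does not document how CoLA examples were serialized during training.

```python
# Minimal sketch: attach this PEFT adapter to the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cola_123_1760637706"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# CoLA is a binary acceptability task; this prompt format is assumed,
# not taken from the (undocumented) training setup.
prompt = (
    "Is the following sentence grammatically acceptable? Answer yes or no.\n"
    "Sentence: The boy quickly ran."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```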

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
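
The card does not document the data pipeline. If the dataset is the standard GLUE CoLA subset, it can be loaded with the datasets library as sketched below; treat the "glue"/"cola" identifiers as an assumption.

```python
# Hedged sketch: load CoLA as the GLUE subset via the datasets library.
# The actual data source used for this run is not documented on the card.
from datasets import load_dataset

cola = load_dataset("glue", "cola")
print(cola)                    # train / validation / test splits
print(cola["train"].features)  # a 'sentence' string and a binary 'label'
```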

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
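
The card records the Trainer hyperparameters but not the PEFT configuration (method, rank, target modules). Below is a minimal sketch that reproduces the listed settings with the Hugging Face Trainer API; the LoraConfig values are placeholders, not the values used for this run.

```python
# Sketch of the listed hyperparameters; the PEFT settings are assumptions.
from transformers import TrainingArguments
from peft import LoraConfig

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")  # hypothetical

args = TrainingArguments(
    output_dir="train_cola_123_1760637706",
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
# model = get_peft_model(base_model, peft_config)  # then pass args to Trainer
```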

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:---:|:---:|:---:|:---:|:---:|
| 1.7199 | 1.0 | 1924 | 1.3822 | 367320 |
| 1.2701 | 2.0 | 3848 | 1.2342 | 734600 |
| 1.6967 | 3.0 | 5772 | 1.2344 | 1101216 |
| 1.6043 | 4.0 | 7696 | 1.2395 | 1468552 |
| 1.1936 | 5.0 | 9620 | 1.2258 | 1834816 |
| 0.8882 | 6.0 | 11544 | 1.2179 | 2201584 |
| 0.8398 | 7.0 | 13468 | 1.2250 | 2568288 |
| 1.05 | 8.0 | 15392 | 1.2160 | 2935056 |
| 1.462 | 9.0 | 17316 | 1.2254 | 3301760 |
| 1.7389 | 10.0 | 19240 | 1.2256 | 3669168 |
| 0.8938 | 11.0 | 21164 | 1.2306 | 4036096 |
| 1.1366 | 12.0 | 23088 | 1.2173 | 4403128 |
| 1.146 | 13.0 | 25012 | 1.2324 | 4769264 |
| 1.0346 | 14.0 | 26936 | 1.2137 | 5136352 |
| 1.2975 | 15.0 | 28860 | 1.2231 | 5503048 |
| 1.3061 | 16.0 | 30784 | 1.2178 | 5869824 |
| 1.1825 | 17.0 | 32708 | 1.2238 | 6236752 |
| 1.1393 | 18.0 | 34632 | 1.2238 | 6603776 |
| 1.2381 | 19.0 | 36556 | 1.2238 | 6970736 |
| 1.0217 | 20.0 | 38480 | 1.2238 | 7337920 |
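
Note that the reported evaluation loss (1.2137) matches the epoch-14 checkpoint, the minimum validation loss in the table, rather than the final epoch; the best checkpoint appears to have been retained.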

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4