train_cb_101112_1760637981

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1782
  • Num Input Tokens Seen: 723584

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
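
The hyperparameters above can be sketched as a `transformers.TrainingArguments` configuration. This is a hypothetical reconstruction, not the exact training script: the output directory is assumed from the model name, and all model/dataset wiring is omitted.

```python
# Hypothetical sketch of the training configuration listed above,
# using transformers.TrainingArguments argument names.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cb_101112_1760637981",  # assumed from the model name
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",            # AdamW, betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```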

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.4459        | 1.0   | 57   | 0.2854          | 36112             |
| 0.4375        | 2.0   | 114  | 0.2225          | 71552             |
| 0.3917        | 3.0   | 171  | 0.3163          | 108088            |
| 0.4173        | 4.0   | 228  | 0.1994          | 144720            |
| 0.1829        | 5.0   | 285  | 0.1971          | 181120            |
| 0.158         | 6.0   | 342  | 0.2243          | 217128            |
| 0.2394        | 7.0   | 399  | 0.1963          | 253536            |
| 0.2078        | 8.0   | 456  | 0.1828          | 290112            |
| 0.1586        | 9.0   | 513  | 0.1782          | 325872            |
| 0.1667        | 10.0  | 570  | 0.1894          | 361920            |
| 0.2652        | 11.0  | 627  | 0.1790          | 398432            |
| 0.2114        | 12.0  | 684  | 0.1851          | 435536            |
| 0.1488        | 13.0  | 741  | 0.1975          | 471520            |
| 0.2387        | 14.0  | 798  | 0.2111          | 507256            |
| 0.157         | 15.0  | 855  | 0.2199          | 543064            |
| 0.1676        | 16.0  | 912  | 0.2082          | 579704            |
| 0.1222        | 17.0  | 969  | 0.2069          | 615960            |
| 0.0839        | 18.0  | 1026 | 0.2193          | 652368            |
| 0.112         | 19.0  | 1083 | 0.2239          | 687976            |
| 0.0968        | 20.0  | 1140 | 0.2251          | 723584            |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
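
Since this checkpoint is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, it can likely be loaded with `peft.AutoPeftModelForCausalLM`. This is a usage sketch, not a verified snippet: it assumes the adapter repo id `rbelanec/train_cb_101112_1760637981`, access to the gated base weights, and a placeholder prompt in place of a real CB example.

```python
# Hypothetical inference sketch: load the PEFT adapter together with its
# base model, then generate from a prompt. Requires access to the gated
# meta-llama base weights; the prompt below is a placeholder.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained("rbelanec/train_cb_101112_1760637981")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

inputs = tokenizer("Premise: ... Hypothesis: ...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```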