train_cb_123_1760637640

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1409
  • Num Input Tokens Seen: 742296

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
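The schedule implied by the hyperparameters above is cosine decay with a linear warmup over the first 10% of steps (lr_scheduler_warmup_ratio: 0.1). A minimal sketch, assuming the 1140 total optimizer steps shown in the results table below; the helper name `lr_at_step` is illustrative and not part of any library:

```python
import math

def lr_at_step(step, total_steps=1140, base_lr=5e-5, warmup_ratio=0.1):
    """Learning rate at a given step under linear warmup + cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)  # 114 steps here
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `lr_at_step(114)` returns the peak rate of 5e-05, and the rate decays to 0 by step 1140.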

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.2136        | 1.0   | 57   | 0.2027          | 37160             |
| 0.0799        | 2.0   | 114  | 0.1580          | 73720             |
| 0.1237        | 3.0   | 171  | 0.1409          | 110296            |
| 0.0303        | 4.0   | 228  | 0.1751          | 147784            |
| 0.0013        | 5.0   | 285  | 0.2135          | 184368            |
| 0.0001        | 6.0   | 342  | 0.2932          | 221536            |
| 0.0001        | 7.0   | 399  | 0.2914          | 258720            |
| 0.0           | 8.0   | 456  | 0.3060          | 295408            |
| 0.0           | 9.0   | 513  | 0.3096          | 332648            |
| 0.0001        | 10.0  | 570  | 0.3129          | 369976            |
| 0.0           | 11.0  | 627  | 0.3133          | 406840            |
| 0.0           | 12.0  | 684  | 0.3128          | 444728            |
| 0.0           | 13.0  | 741  | 0.3213          | 481720            |
| 0.0           | 14.0  | 798  | 0.3273          | 518664            |
| 0.0           | 15.0  | 855  | 0.3254          | 555728            |
| 0.0           | 16.0  | 912  | 0.3247          | 593096            |
| 0.0           | 17.0  | 969  | 0.3292          | 629760            |
| 0.0           | 18.0  | 1026 | 0.3255          | 667432            |
| 0.0           | 19.0  | 1083 | 0.3304          | 704816            |
| 0.0           | 20.0  | 1140 | 0.3308          | 742296            |
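Validation loss bottoms out at epoch 3 (0.1409, matching the evaluation loss reported at the top of this card) and rises thereafter while training loss goes to zero, a typical overfitting pattern. A minimal sketch of picking the best checkpoint from these logs (values copied from the table above):

```python
# Validation losses from the first five epochs of the table (epoch -> loss).
val_loss = {1: 0.2027, 2: 0.1580, 3: 0.1409, 4: 0.1751, 5: 0.2135}

# Select the epoch with the lowest validation loss.
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch, val_loss[best_epoch])  # prints: 3 0.1409
```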

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4