train_cb_1757340166

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3968
  • Num Input Tokens Seen: 621640

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
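As a rough illustration, the warmup-plus-cosine schedule implied by the settings above can be sketched in plain Python. The step counts are taken from the training log below (113 steps per epoch, 2260 steps total), so the 10% warmup ratio corresponds to 226 warmup steps; this is a simplified sketch, not the exact scheduler implementation used by the trainer:

```python
import math

BASE_LR = 5e-05
TOTAL_STEPS = 2260                      # 113 steps/epoch * 20 epochs
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)   # lr_scheduler_warmup_ratio = 0.1 -> 226

def lr_at(step: int) -> float:
    """Linear warmup followed by cosine decay to zero, mirroring
    lr_scheduler_type=cosine with warmup_ratio=0.1."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The learning rate rises linearly to 5e-05 over the first 226 steps, then follows a cosine curve down to zero at step 2260.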

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.187         | 1.0   | 113  | 0.5789          | 31088             |
| 0.4145        | 2.0   | 226  | 0.2959          | 61872             |
| 0.1423        | 3.0   | 339  | 0.2717          | 93016             |
| 0.3879        | 4.0   | 452  | 0.2208          | 124056            |
| 0.3181        | 5.0   | 565  | 0.1915          | 155240            |
| 0.3964        | 6.0   | 678  | 0.2505          | 185984            |
| 0.0106        | 7.0   | 791  | 0.3052          | 217192            |
| 0.1278        | 8.0   | 904  | 0.2324          | 248456            |
| 0.1891        | 9.0   | 1017 | 0.6020          | 279744            |
| 0.0002        | 10.0  | 1130 | 0.3493          | 310888            |
| 0.0001        | 11.0  | 1243 | 0.3753          | 341832            |
| 0.0001        | 12.0  | 1356 | 0.3776          | 372952            |
| 0.0001        | 13.0  | 1469 | 0.3861          | 403768            |
| 0.0           | 14.0  | 1582 | 0.3914          | 434704            |
| 0.0           | 15.0  | 1695 | 0.3901          | 466016            |
| 0.0001        | 16.0  | 1808 | 0.3942          | 497200            |
| 0.0           | 17.0  | 1921 | 0.3934          | 528320            |
| 0.0           | 18.0  | 2034 | 0.3991          | 559408            |
| 0.0           | 19.0  | 2147 | 0.3979          | 590544            |
| 0.0           | 20.0  | 2260 | 0.3968          | 621640            |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
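Since this is a PEFT adapter trained on top of meta-llama/Meta-Llama-3-8B-Instruct, it would typically be loaded by attaching the adapter to the base model. A minimal sketch, assuming the adapter repo id `rbelanec/train_cb_1757340166` and gated access to the base model (this is an untested illustration, not the author's documented usage):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cb_1757340166"  # this repo (assumed id)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the PEFT adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```

Loading requires enough memory for the full 8B base model; the adapter itself adds only a small number of parameters.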