train_cb_123_1760637639

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4447
  • Num Input Tokens Seen: 742296

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
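
The cosine scheduler with a 0.1 warmup ratio means the learning rate ramps linearly from 0 to 0.001 over the first 10% of training steps, then decays along a half-cosine to 0. A minimal sketch of that shape, assuming the standard linear-warmup + cosine-decay formula; the function name is illustrative, and the 1140-step total comes from the training-results table below:

```python
import math

def cosine_lr_with_warmup(step, total_steps, base_lr=1e-3, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then cosine decay to 0.

    base_lr and warmup_ratio match the hyperparameters listed above;
    total_steps=1140 corresponds to 20 epochs of 57 steps each.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, the rate peaks at 0.001 right at step 114 (end of warmup) and reaches 0 at step 1140.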

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.2819        | 1.0   | 57   | 0.7586          | 37160             |
| 0.1652        | 2.0   | 114  | 0.2106          | 73720             |
| 0.1091        | 3.0   | 171  | 0.1681          | 110296            |
| 0.0807        | 4.0   | 228  | 0.1857          | 147784            |
| 0.0258        | 5.0   | 285  | 0.1615          | 184368            |
| 0.0447        | 6.0   | 342  | 0.2889          | 221536            |
| 0.0144        | 7.0   | 399  | 0.1307          | 258720            |
| 0.0113        | 8.0   | 456  | 0.1529          | 295408            |
| 0.0066        | 9.0   | 513  | 0.1519          | 332648            |
| 0.0137        | 10.0  | 570  | 0.1792          | 369976            |
| 0.0018        | 11.0  | 627  | 0.2539          | 406840            |
| 0.0023        | 12.0  | 684  | 0.2909          | 444728            |
| 0.0006        | 13.0  | 741  | 0.3705          | 481720            |
| 0.0008        | 14.0  | 798  | 0.3660          | 518664            |
| 0.0004        | 15.0  | 855  | 0.3678          | 555728            |
| 0.0002        | 16.0  | 912  | 0.3891          | 593096            |
| 0.0002        | 17.0  | 969  | 0.3793          | 629760            |
| 0.0002        | 18.0  | 1026 | 0.3887          | 667432            |
| 0.0003        | 19.0  | 1083 | 0.3914          | 704816            |
| 0.0001        | 20.0  | 1140 | 0.3933          | 742296            |
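
Validation loss bottoms out at epoch 7 (0.1307) and climbs steadily afterward while training loss approaches zero, which suggests the adapter overfits past that point. A quick check over the logged values (copied from the table above; the variable names are illustrative):

```python
# Validation losses per epoch, taken from the training results table.
val_losses = [0.7586, 0.2106, 0.1681, 0.1857, 0.1615, 0.2889, 0.1307,
              0.1529, 0.1519, 0.1792, 0.2539, 0.2909, 0.3705, 0.3660,
              0.3678, 0.3891, 0.3793, 0.3887, 0.3914, 0.3933]

# Epochs are 1-indexed; find the epoch with the lowest validation loss.
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__) + 1
print(best_epoch, val_losses[best_epoch - 1])  # prints: 7 0.1307
```

If early stopping on validation loss were applied, the epoch-7 checkpoint would be the one to keep.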

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4