train_cb_42_1760637523

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1325
  • Num Input Tokens Seen: 725992

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
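
With `lr_scheduler_type: cosine` and `lr_scheduler_warmup_ratio: 0.1`, the learning rate presumably ramps up linearly for the first 10% of steps and then decays along a cosine curve, in the style of the Transformers cosine scheduler. A minimal sketch, assuming 1140 total optimizer steps (the final step count in the results table below) and decay to zero:

```python
import math

def lr_at_step(step, base_lr=0.03, total_steps=1140, warmup_ratio=0.1):
    """Linear warmup followed by cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)  # 114 steps here
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at_step(57))    # midway through warmup -> 0.015
print(lr_at_step(114))   # end of warmup -> peak LR 0.03
print(lr_at_step(1140))  # end of training -> ~0.0
```

Note that 0.03 is an unusually high peak learning rate for full fine-tuning but is common for parameter-efficient methods such as those in PEFT.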

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.6057        | 1.0   | 57   | 1.3882          | 36480             |
| 0.3279        | 2.0   | 114  | 0.3234          | 72112             |
| 0.2478        | 3.0   | 171  | 0.2578          | 108712            |
| 0.1393        | 4.0   | 228  | 0.2197          | 145296            |
| 0.2443        | 5.0   | 285  | 0.1910          | 181408            |
| 0.1904        | 6.0   | 342  | 0.2125          | 217760            |
| 0.0920        | 7.0   | 399  | 0.1999          | 254568            |
| 0.2850        | 8.0   | 456  | 0.1869          | 291000            |
| 0.1165        | 9.0   | 513  | 0.2022          | 327792            |
| 0.1643        | 10.0  | 570  | 0.2083          | 363864            |
| 0.1546        | 11.0  | 627  | 0.1881          | 400104            |
| 0.3025        | 12.0  | 684  | 0.1756          | 436440            |
| 0.1343        | 13.0  | 741  | 0.1456          | 471944            |
| 0.1078        | 14.0  | 798  | 0.2014          | 508424            |
| 0.2187        | 15.0  | 855  | 0.1325          | 545352            |
| 0.0431        | 16.0  | 912  | 0.1589          | 581368            |
| 0.0308        | 17.0  | 969  | 0.1602          | 616776            |
| 0.0234        | 18.0  | 1026 | 0.1673          | 653152            |
| 0.0289        | 19.0  | 1083 | 0.1704          | 689856            |
| 0.0121        | 20.0  | 1140 | 0.1693          | 725992            |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
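
Given the PEFT version listed above, this repository is presumably a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct. A sketch of loading it with `AutoPeftModelForCausalLM` (not verified against this repo; requires access to the gated base weights, and the example prompt is an illustrative CB-style entailment query, not taken from the dataset):

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Loads the base model named in the adapter config, then applies the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_cb_42_1760637523",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

prompt = (
    "Premise: It is raining. Hypothesis: The ground is wet. "
    "Entailment, contradiction, or neutral?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```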