train_cb_123_1760637641

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9343
  • Num Input Tokens Seen: 742296
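
Since the framework versions below list PEFT and this repository ships an adapter rather than full weights, the checkpoint is loaded on top of the base model. A minimal usage sketch (not part of the original card; the CB-style prompt is purely hypothetical, since the card does not document the input format):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"   # base model (gated repo)
adapter_id = "rbelanec/train_cb_123_1760637641"   # this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter
model.eval()

# Hypothetical CB-style prompt; the actual training format is undocumented.
prompt = "Premise: ...\nHypothesis: ...\nDoes the premise entail the hypothesis?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```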

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
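
For reproduction purposes, here is a minimal sketch of equivalent TrainingArguments, assuming the run used the standard Hugging Face Trainer (the card does not say which training script produced it):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_cb_123_1760637641",
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",   # AdamW with betas=(0.9, 0.999), eps=1e-08 (the defaults)
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,      # ~114 of the 1,140 total steps (20 epochs x 57 steps/epoch)
    num_train_epochs=20,
)
```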

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 1.1045        | 1.0   | 57   | 1.0770          | 37160             |
| 1.0563        | 2.0   | 114  | 1.0650          | 73720             |
| 1.2181        | 3.0   | 171  | 1.0362          | 110296            |
| 0.9715        | 4.0   | 228  | 1.0253          | 147784            |
| 0.9201        | 5.0   | 285  | 0.9905          | 184368            |
| 0.858         | 6.0   | 342  | 0.9656          | 221536            |
| 0.973         | 7.0   | 399  | 0.9565          | 258720            |
| 1.0439        | 8.0   | 456  | 0.9543          | 295408            |
| 1.0283        | 9.0   | 513  | 0.9408          | 332648            |
| 1.1097        | 10.0  | 570  | 0.9428          | 369976            |
| 0.8619        | 11.0  | 627  | 0.9417          | 406840            |
| 0.9263        | 12.0  | 684  | 0.9387          | 444728            |
| 1.0058        | 13.0  | 741  | 0.9346          | 481720            |
| 1.0511        | 14.0  | 798  | 0.9343          | 518664            |
| 0.9927        | 15.0  | 855  | 0.9363          | 555728            |
| 0.8666        | 16.0  | 912  | 0.9389          | 593096            |
| 0.7855        | 17.0  | 969  | 0.9405          | 629760            |
| 0.8856        | 18.0  | 1026 | 0.9403          | 667432            |
| 0.9045        | 19.0  | 1083 | 0.9556          | 704816            |
| 0.7929        | 20.0  | 1140 | 0.9411          | 742296            |
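
The validation loss bottoms out at 0.9343 in epoch 14 (step 798) and does not improve in later epochs, which matches the evaluation loss reported at the top of this card.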

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4