train_cb_123_1760637642

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1413
  • Num Input Tokens Seen: 742296

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
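
With lr_scheduler_type cosine and lr_scheduler_warmup_ratio 0.1, the learning rate ramps up linearly over the first 10% of optimizer steps (114 of the 1140 total steps, i.e. roughly the first two epochs) and then decays along a cosine curve. A minimal plain-Python sketch of that shape (mirroring, but not reproducing exactly, what `transformers`' cosine scheduler does; step counts are taken from the results table below):

```python
import math

BASE_LR = 5e-05      # learning_rate
TOTAL_STEPS = 1140   # 57 steps/epoch * 20 epochs
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # warmup_ratio 0.1 -> 114 steps

def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The peak learning rate (5e-05) is reached at the end of warmup, and the rate decays to approximately zero by the final step.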

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 1.1067        | 1.0   | 57   | 1.0792          | 37160             |
| 0.7267        | 2.0   | 114  | 0.5042          | 73720             |
| 0.1436        | 3.0   | 171  | 0.1623          | 110296            |
| 0.1234        | 4.0   | 228  | 0.1545          | 147784            |
| 0.0568        | 5.0   | 285  | 0.1538          | 184368            |
| 0.0804        | 6.0   | 342  | 0.1491          | 221536            |
| 0.0592        | 7.0   | 399  | 0.1491          | 258720            |
| 0.1143        | 8.0   | 456  | 0.1503          | 295408            |
| 0.0974        | 9.0   | 513  | 0.1472          | 332648            |
| 0.1432        | 10.0  | 570  | 0.1432          | 369976            |
| 0.0643        | 11.0  | 627  | 0.1445          | 406840            |
| 0.0506        | 12.0  | 684  | 0.1468          | 444728            |
| 0.145         | 13.0  | 741  | 0.1428          | 481720            |
| 0.2225        | 14.0  | 798  | 0.1425          | 518664            |
| 0.0918        | 15.0  | 855  | 0.1413          | 555728            |
| 0.0446        | 16.0  | 912  | 0.1435          | 593096            |
| 0.0862        | 17.0  | 969  | 0.1465          | 629760            |
| 0.0794        | 18.0  | 1026 | 0.1431          | 667432            |
| 0.1133        | 19.0  | 1083 | 0.1473          | 704816            |
| 0.0716        | 20.0  | 1140 | 0.1430          | 742296            |
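
For a rough sense of sequence lengths, the token and step counts above imply about 651 input tokens per optimizer step. A back-of-envelope sketch (this assumes the "Input Tokens Seen" counter accumulates once per optimizer step over the full batch):

```python
# Figures taken from the final row of the results table.
TOTAL_TOKENS = 742296   # input tokens seen after epoch 20
TOTAL_STEPS = 1140      # optimizer steps after epoch 20
BATCH_SIZE = 4          # train_batch_size

tokens_per_step = TOTAL_TOKENS / TOTAL_STEPS       # ~651 tokens per batch
tokens_per_example = tokens_per_step / BATCH_SIZE  # ~163 tokens per example
```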

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
