train_cb_42_1757596053

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1383
  • Num Input Tokens Seen: 621640

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
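
The learning-rate schedule above (cosine decay with a 10% linear warmup) can be sketched as a small helper. The function name is illustrative, and the step counts are derived from this card's results table (20 epochs × 113 steps/epoch = 2,260 optimizer steps, so 226 warmup steps):

```python
import math

def lr_at_step(step, total_steps=2260, warmup_ratio=0.1, base_lr=5e-5):
    """Cosine schedule with linear warmup, using the values from this card."""
    warmup_steps = int(total_steps * warmup_ratio)  # 226 steps
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The rate peaks at 5e-05 when warmup ends (step 226) and decays to zero by the final step, which matches how the training losses in the table below flatten out in the last epochs.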

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|--------------:|------:|-----:|----------------:|------------------:|
| 0.2227 | 1.0 | 113 | 0.6365 | 31088 |
| 0.467 | 2.0 | 226 | 0.3558 | 61872 |
| 0.4767 | 3.0 | 339 | 0.6339 | 93016 |
| 0.5349 | 4.0 | 452 | 0.2697 | 124056 |
| 0.3279 | 5.0 | 565 | 0.1693 | 155240 |
| 0.3209 | 6.0 | 678 | 0.1891 | 185984 |
| 0.0175 | 7.0 | 791 | 0.1852 | 217192 |
| 0.0084 | 8.0 | 904 | 0.1616 | 248456 |
| 0.085 | 9.0 | 1017 | 0.1374 | 279744 |
| 0.0003 | 10.0 | 1130 | 0.1618 | 310888 |
| 0.0001 | 11.0 | 1243 | 0.1426 | 341832 |
| 0.0001 | 12.0 | 1356 | 0.1332 | 372952 |
| 0.0001 | 13.0 | 1469 | 0.1352 | 403768 |
| 0.0 | 14.0 | 1582 | 0.1344 | 434704 |
| 0.0 | 15.0 | 1695 | 0.1344 | 466016 |
| 0.0 | 16.0 | 1808 | 0.1367 | 497200 |
| 0.0001 | 17.0 | 1921 | 0.1336 | 528320 |
| 0.0 | 18.0 | 2034 | 0.1321 | 559408 |
| 0.0 | 19.0 | 2147 | 0.1339 | 590544 |
| 0.0 | 20.0 | 2260 | 0.1383 | 621640 |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
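
Since the card lists PEFT among the framework versions, the fine-tuned weights are presumably a PEFT adapter that loads on top of the base model. A minimal usage sketch (repo ids taken from this card; Hub availability, access gating for the Llama-3 base model, and hardware requirements are not verified here):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Ids as stated in this card; loading the 8B base model requires accepting
# Meta's license on the Hub and enough GPU/CPU memory.
base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cb_42_1757596053"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the fine-tuned PEFT adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```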