train_wsc_789_1760360866

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3413
  • Num Input Tokens Seen: 1462816

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
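
The `cosine` scheduler with `warmup_ratio: 0.1` means the learning rate ramps up linearly over the first 10% of optimizer steps and then decays to zero along a cosine curve. A minimal sketch of that schedule, assuming roughly 125 optimizer steps per epoch (~3750 total, inferred from the step counts in the results table below):

```python
import math

def lr_at(step, total_steps=3750, base_lr=0.03, warmup_ratio=0.1):
    """Cosine learning-rate schedule with linear warmup (sketch)."""
    warmup_steps = int(total_steps * warmup_ratio)  # ~375 steps here
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return base_lr * step / warmup_steps
    # Cosine decay from the peak learning rate down to 0.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))     # 0.0 at the start of warmup
print(lr_at(375))   # peak learning rate, 0.03
```

The exact step counts are an assumption for illustration; the shape of the schedule (linear warmup, cosine decay to zero) matches the hyperparameters above.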

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.3493        | 1.504  | 188  | 0.3545          | 73440             |
| 0.3662        | 3.008  | 376  | 0.3573          | 147296            |
| 0.3506        | 4.512  | 564  | 0.3427          | 221744            |
| 0.353         | 6.016  | 752  | 0.3442          | 293760            |
| 0.3303        | 7.52   | 940  | 0.3436          | 367808            |
| 0.3423        | 9.024  | 1128 | 0.3454          | 440288            |
| 0.3528        | 10.528 | 1316 | 0.3445          | 512864            |
| 0.3418        | 12.032 | 1504 | 0.3475          | 587120            |
| 0.3443        | 13.536 | 1692 | 0.3413          | 660352            |
| 0.3507        | 15.04  | 1880 | 0.3509          | 733072            |
| 0.3418        | 16.544 | 2068 | 0.3469          | 805824            |
| 0.3524        | 18.048 | 2256 | 0.3445          | 880160            |
| 0.3437        | 19.552 | 2444 | 0.3452          | 955488            |
| 0.3503        | 21.056 | 2632 | 0.3447          | 1028544           |
| 0.3525        | 22.56  | 2820 | 0.3506          | 1101536           |
| 0.353         | 24.064 | 3008 | 0.3454          | 1174032           |
| 0.3437        | 25.568 | 3196 | 0.3466          | 1247296           |
| 0.3593        | 27.072 | 3384 | 0.3474          | 1320640           |
| 0.3436        | 28.576 | 3572 | 0.3482          | 1394256           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
