train_wsc_42_1760637537

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set (a usage sketch follows the results below):

  • Loss: 0.3487
  • Num Input Tokens Seen: 985952
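The card itself does not include usage code, so the following is a minimal, hedged sketch of loading this adapter on top of the base model. It assumes the Hub repo id rbelanec/train_wsc_42_1760637537 (from the model page) and a standard PEFT adapter layout as saved by PEFT 0.17.x:

```python
# Hedged usage sketch (not from the card): load the base model, then attach
# this PEFT adapter. Repo ids are taken from the model page; everything else
# (dtype, device placement) is an illustrative assumption.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_42_1760637537"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```

As with the base model, inputs should presumably be formatted with the Llama 3 chat template (tokenizer.apply_chat_template) before generation.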

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative TrainingArguments sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
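The exact training script is not included in this card; the sketch below is only an assumed mapping of the listed hyperparameters onto Hugging Face TrainingArguments (the output_dir and evaluation/logging strategies are illustrative guesses, though the per-epoch rows in the results table suggest epoch-level evaluation):

```python
# Illustrative sketch only: maps the hyperparameters listed above onto
# TrainingArguments. Dataset loading, the PEFT config, and Trainer wiring
# are omitted because they are not documented in the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_42_1760637537",  # assumed; matches the model name
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",        # betas=(0.9, 0.999), epsilon=1e-08 are the AdamW defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",      # assumed from the per-epoch validation losses below
    logging_strategy="epoch",
)
```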

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.4341        | 1.0   | 125  | 0.5309          | 49104             |
| 0.3662        | 2.0   | 250  | 0.3506          | 98400             |
| 0.3839        | 3.0   | 375  | 0.3984          | 147712            |
| 0.3465        | 4.0   | 500  | 0.3456          | 196320            |
| 0.3797        | 5.0   | 625  | 0.3477          | 245520            |
| 0.3398        | 6.0   | 750  | 0.3491          | 294976            |
| 0.3475        | 7.0   | 875  | 0.3508          | 344320            |
| 0.3478        | 8.0   | 1000 | 0.3499          | 393840            |
| 0.3413        | 9.0   | 1125 | 0.3543          | 443168            |
| 0.3590        | 10.0  | 1250 | 0.3474          | 492304            |
| 0.3569        | 11.0  | 1375 | 0.3524          | 541504            |
| 0.3462        | 12.0  | 1500 | 0.3518          | 590864            |
| 0.3429        | 13.0  | 1625 | 0.3499          | 640656            |
| 0.3358        | 14.0  | 1750 | 0.3460          | 689776            |
| 0.3672        | 15.0  | 1875 | 0.3513          | 739024            |
| 0.3471        | 16.0  | 2000 | 0.3508          | 788480            |
| 0.3369        | 17.0  | 2125 | 0.3467          | 837600            |
| 0.3458        | 18.0  | 2250 | 0.3458          | 887088            |
| 0.3577        | 19.0  | 2375 | 0.3504          | 936768            |
| 0.3583        | 20.0  | 2500 | 0.3469          | 985952            |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4