train_wsc_123_1760637650

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3549
  • Num Input Tokens Seen: 869760
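Since this is a PEFT adapter on top of Meta-Llama-3-8B-Instruct, it can be loaded with the peft library. Below is a minimal loading sketch, assuming the adapter is published under rbelanec/train_wsc_123_1760637650 and that you have access to the gated base model; the prompt is an invented WSC-style example, not taken from the dataset:

```python
# Hedged sketch: load the adapter and run one generation.
# The repo id and base model come from this card; everything else
# (dtype, device placement, prompt) is an assumption.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "rbelanec/train_wsc_123_1760637650"
base_id = "meta-llama/Meta-Llama-3-8B-Instruct"

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,  # assumed; pick what fits your hardware
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Invented WSC-style coreference prompt.
prompt = (
    "The trophy doesn't fit into the brown suitcase because it is too large. "
    "What does 'it' refer to?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```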

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; they are mirrored in the TrainingArguments sketch after this list:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
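
For reference, these settings map onto transformers.TrainingArguments roughly as follows; output_dir is an assumed name, and anything not listed above is left at its default:

```python
# Hedged sketch: the card's hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_123_1760637650",  # assumed name, not stated on the card
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```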

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.3727        | 2.0   | 222  | 0.4157          | 87304             |
| 0.4069        | 4.0   | 444  | 0.3672          | 174048            |
| 0.3393        | 6.0   | 666  | 0.3589          | 260840            |
| 0.3479        | 8.0   | 888  | 0.3499          | 347592            |
| 0.3523        | 10.0  | 1110 | 0.3512          | 434472            |
| 0.3383        | 12.0  | 1332 | 0.3526          | 521856            |
| 0.351         | 14.0  | 1554 | 0.3521          | 609008            |
| 0.3337        | 16.0  | 1776 | 0.3563          | 695024            |
| 0.3376        | 18.0  | 1998 | 0.3599          | 782016            |
| 0.3354        | 20.0  | 2220 | 0.3549          | 869760            |
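
The table shows validation loss bottoming out at epoch 8 (0.3499) and drifting slightly upward afterwards, while the final checkpoint lands at 0.3549. If retraining, keeping the best checkpoint by eval loss is one way to capture that minimum; a hedged sketch using standard transformers options (not something the card says was done):

```python
# Hedged sketch: retain the best checkpoint by validation loss.
from transformers import TrainingArguments, EarlyStoppingCallback

best_ckpt_args = TrainingArguments(
    output_dir="train_wsc_123_1760637650",  # assumed name
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,       # restore the lowest-eval-loss checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
# Pass callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]
# to the Trainer together with these arguments.
```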

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4