train_stsb_789_1760637923

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8524
  • Num Input Tokens Seen: 7789448

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
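As a minimal sketch of how the learning-rate schedule above behaves, the following stand-alone function combines linear warmup over the first 10% of steps (lr_scheduler_warmup_ratio: 0.1) with cosine decay to zero. This is an approximation for illustration, not the exact Transformers scheduler; the total step count of 23000 is taken from the training results table below.

```python
import math

def lr_at_step(step, base_lr=1e-05, total_steps=23000, warmup_ratio=0.1):
    """Approximate learning rate at a given optimizer step:
    linear warmup for the first warmup_ratio fraction of steps,
    then cosine decay from base_lr down to zero."""
    warmup_steps = int(total_steps * warmup_ratio)  # 2300 steps here
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The peak rate of 1e-05 is reached at step 2300 (end of warmup) and decays back to zero by the final step.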

Training results

Training Loss   Epoch   Step    Validation Loss   Input Tokens Seen
0.5844           2.0     2300   0.7524              778080
0.8271           4.0     4600   0.6845             1554056
0.5566           6.0     6900   0.6377             2334192
0.4163           8.0     9200   0.5724             3113448
0.3304          10.0    11500   0.5794             3892416
0.3531          12.0    13800   0.6039             4672776
0.3172          14.0    16100   0.6575             5451448
0.2619          16.0    18400   0.7460             6231920
0.2966          18.0    20700   0.8306             7011920
0.2114          20.0    23000   0.8524             7789448
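Reading the table above, validation loss bottoms out at epoch 8 (0.5724) and climbs thereafter while training loss keeps falling, a common sign of overfitting; the reported final loss of 0.8524 corresponds to the last epoch, not the best one. A small snippet, using values copied directly from the table, locates the best checkpoint:

```python
# (epoch, validation loss) pairs copied from the training results table.
history = [(2.0, 0.7524), (4.0, 0.6845), (6.0, 0.6377), (8.0, 0.5724),
           (10.0, 0.5794), (12.0, 0.6039), (14.0, 0.6575), (16.0, 0.7460),
           (18.0, 0.8306), (20.0, 0.8524)]

# Pick the epoch with the lowest validation loss.
best_epoch, best_loss = min(history, key=lambda pair: pair[1])
```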

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_stsb_789_1760637923

This model is a PEFT adapter of meta-llama/Meta-Llama-3-8B-Instruct.