train_stsb_1755694490

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0312
  • Num Input Tokens Seen: 3924688

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
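The hyperparameters above can be collected into a single configuration mapping. This is a hypothetical reconstruction, not the exact command used for training; the key names follow `transformers.TrainingArguments` conventions but the sketch is a plain dict so it stands on its own:

```python
# Sketch of the reported training configuration (key names assume
# transformers.TrainingArguments conventions; values are from the card).
training_args = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "seed": 123,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "num_train_epochs": 10.0,
}
```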

Training results

Training Loss | Epoch  | Step  | Validation Loss | Input Tokens Seen
0.6558        | 0.5002 |  1294 | 0.9696          |   196128
0.4994        | 1.0004 |  2588 | 0.8021          |   392912
0.5987        | 1.5006 |  3882 | 0.7167          |   589232
0.6127        | 2.0008 |  5176 | 0.6759          |   785392
0.4299        | 2.5010 |  6470 | 0.5886          |   980992
0.7941        | 3.0012 |  7764 | 0.5592          |  1178208
0.3966        | 3.5014 |  9058 | 0.5541          |  1375792
0.3103        | 4.0015 | 10352 | 0.5729          |  1571200
0.4157        | 4.5017 | 11646 | 0.5512          |  1768912
0.4090        | 5.0019 | 12940 | 0.5306          |  1964080
0.3542        | 5.5021 | 14234 | 0.5264          |  2160080
0.3236        | 6.0023 | 15528 | 0.5257          |  2356800
0.2516        | 6.5025 | 16822 | 0.6130          |  2552912
0.1632        | 7.0027 | 18116 | 0.6006          |  2749840
0.3430        | 7.5029 | 19410 | 0.7188          |  2946160
0.1477        | 8.0031 | 20704 | 0.7346          |  3142224
0.1178        | 8.5033 | 21998 | 0.8293          |  3338752
0.0911        | 9.0035 | 23292 | 0.8598          |  3534272
0.1158        | 9.5037 | 24586 | 1.0161          |  3730128
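Validation loss bottoms out around epoch 6 and climbs afterwards, which suggests the later epochs overfit. A quick check over the logged values confirms where the minimum lies:

```python
# Validation loss per logged epoch, copied from the table above.
val_loss = {
    0.5002: 0.9696, 1.0004: 0.8021, 1.5006: 0.7167, 2.0008: 0.6759,
    2.5010: 0.5886, 3.0012: 0.5592, 3.5014: 0.5541, 4.0015: 0.5729,
    4.5017: 0.5512, 5.0019: 0.5306, 5.5021: 0.5264, 6.0023: 0.5257,
    6.5025: 0.6130, 7.0027: 0.6006, 7.5029: 0.7188, 8.0031: 0.7346,
    8.5033: 0.8293, 9.0035: 0.8598, 9.5037: 1.0161,
}

# Epoch with the lowest validation loss.
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch, val_loss[best_epoch])  # → 6.0023 0.5257
```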

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
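Since this repository is a PEFT adapter rather than a full model, inference requires loading the base model first and attaching the adapter on top. A minimal sketch, assuming access to the gated base model and the standard `peft`/`transformers` loading APIs (not run here):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_stsb_1755694490"

# Load the base model, then wrap it with the fine-tuned adapter weights.
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```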