train_stsb_1745333592

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4714
  • Num Input Tokens Seen: 54490336

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • training_steps: 40000
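Note that total_train_batch_size is derived rather than set directly: it is the per-device batch size times the gradient-accumulation steps (assuming a single device here, which the card does not state explicitly). A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Hyperparameters as reported above (variable names are illustrative,
# not the trainer's actual API).
train_batch_size = 4              # per-device micro-batch size
gradient_accumulation_steps = 4   # micro-batches accumulated per optimizer step
num_devices = 1                   # assumption: single-GPU run

# Effective batch size seen by each optimizer update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 16, matching the value reported above
```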

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
0.4569 0.6182 200 0.5617 272576
0.4036 1.2349 400 0.4970 544096
0.4159 1.8532 600 0.4762 818048
0.3674 2.4699 800 0.4714 1089600
0.3631 3.0866 1000 0.4964 1361504
0.4957 3.7048 1200 0.4720 1636960
0.3589 4.3215 1400 0.4752 1909696
0.3039 4.9397 1600 0.4820 2182656
0.2508 5.5564 1800 0.5164 2453904
0.2841 6.1731 2000 0.5745 2727984
0.2835 6.7913 2200 0.5792 2999760
0.274 7.4080 2400 0.6362 3274528
0.2322 8.0247 2600 0.6387 3546880
0.2227 8.6430 2800 0.6578 3821184
0.2286 9.2597 3000 0.7486 4090704
0.2067 9.8779 3200 0.8339 4363696
0.1958 10.4946 3400 0.8124 4636656
0.1693 11.1113 3600 0.9280 4908928
0.1262 11.7295 3800 0.9389 5179040
0.1549 12.3462 4000 0.9239 5452192
0.2132 12.9645 4200 0.9526 5724448
0.1446 13.5811 4400 1.0284 5998032
0.0695 14.1978 4600 1.0495 6269792
0.1375 14.8161 4800 1.0655 6541248
0.078 15.4328 5000 1.1474 6815200
0.0972 16.0495 5200 1.0974 7086224
0.0876 16.6677 5400 1.1343 7360560
0.0737 17.2844 5600 1.1914 7632240
0.0922 17.9026 5800 1.2150 7904432
0.0528 18.5193 6000 1.2650 8177168
0.0873 19.1360 6200 1.2662 8449968
0.0412 19.7543 6400 1.3289 8722992
0.0556 20.3709 6600 1.2623 8996224
0.0754 20.9892 6800 1.3793 9269504
0.0347 21.6059 7000 1.3426 9542432
0.042 22.2226 7200 1.4143 9812704
0.0546 22.8408 7400 1.5186 10086272
0.0296 23.4575 7600 1.5051 10358832
0.0263 24.0742 7800 1.4748 10630000
0.0636 24.6924 8000 1.6275 10904880
0.0156 25.3091 8200 1.6776 11176208
0.0182 25.9274 8400 1.6289 11451344
0.0172 26.5440 8600 1.5225 11723328
0.0326 27.1607 8800 1.4928 11996224
0.0386 27.7790 9000 1.5682 12267520
0.0174 28.3957 9200 1.5614 12542064
0.0058 29.0124 9400 1.6234 12812048
0.0122 29.6306 9600 1.6083 13085264
0.017 30.2473 9800 1.6575 13356384
0.0569 30.8655 10000 1.6037 13629216
0.0071 31.4822 10200 1.7392 13902736
0.0182 32.0989 10400 1.7706 14174192
0.0128 32.7172 10600 1.7990 14448176
0.0224 33.3338 10800 1.7148 14718096
0.0137 33.9521 11000 1.9601 14992048
0.0299 34.5688 11200 1.7120 15265072
0.0115 35.1855 11400 1.9114 15538960
0.0228 35.8037 11600 1.7474 15812880
0.0042 36.4204 11800 1.8994 16082608
0.0115 37.0371 12000 1.8181 16357888
0.0052 37.6553 12200 1.8801 16627872
0.0179 38.2720 12400 1.7870 16900336
0.0135 38.8903 12600 1.8095 17175024
0.0089 39.5070 12800 1.8986 17446864
0.022 40.1236 13000 1.8708 17716560
0.0005 40.7419 13200 1.9029 17991792
0.0067 41.3586 13400 1.9110 18262992
0.0207 41.9768 13600 1.9045 18536880
0.0035 42.5935 13800 1.8726 18806784
0.0008 43.2102 14000 1.8700 19080608
0.0121 43.8284 14200 1.9780 19352320
0.0078 44.4451 14400 2.0454 19624544
0.0017 45.0618 14600 1.8867 19896064
0.0151 45.6801 14800 1.9560 20168064
0.0005 46.2968 15000 2.0652 20440208
0.0025 46.9150 15200 1.9374 20713296
0.0002 47.5317 15400 1.9338 20985744
0.0149 48.1484 15600 1.8699 21257920
0.0003 48.7666 15800 1.8767 21529248
0.0333 49.3833 16000 1.9783 21800992
0.0018 50.0 16200 1.9857 22073392
0.0007 50.6182 16400 2.0191 22345648
0.003 51.2349 16600 2.0862 22617984
0.0078 51.8532 16800 1.9632 22892544
0.0006 52.4699 17000 1.9896 23163488
0.0011 53.0866 17200 1.9739 23438320
0.0013 53.7048 17400 1.9409 23708720
0.0071 54.3215 17600 2.0538 23984304
0.0091 54.9397 17800 1.9404 24256368
0.0009 55.5564 18000 2.1186 24527040
0.0004 56.1731 18200 2.1091 24799312
0.0003 56.7913 18400 2.0837 25072848
0.0002 57.4080 18600 1.8460 25347056
0.0075 58.0247 18800 1.9662 25618400
0.003 58.6430 19000 2.0679 25892960
0.0 59.2597 19200 2.1113 26164688
0.0054 59.8779 19400 2.1105 26437392
0.0006 60.4946 19600 2.1133 26710176
0.0007 61.1113 19800 2.0320 26981728
0.0018 61.7295 20000 2.1244 27253632
0.0001 62.3462 20200 2.1269 27524928
0.0001 62.9645 20400 2.1260 27799712
0.0022 63.5811 20600 2.0921 28071024
0.0001 64.1978 20800 2.1965 28342880
0.0027 64.8161 21000 2.2244 28617696
0.0001 65.4328 21200 2.2629 28888112
0.0011 66.0495 21400 2.2187 29162944
0.0001 66.6677 21600 2.1747 29434784
0.0041 67.2844 21800 2.3100 29706800
0.0002 67.9026 22000 2.1084 29980240
0.0 68.5193 22200 2.2457 30250192
0.0 69.1360 22400 2.1829 30522672
0.0001 69.7543 22600 2.2905 30795024
0.0001 70.3709 22800 2.2212 31066544
0.0006 70.9892 23000 2.2090 31338128
0.0 71.6059 23200 2.2093 31609104
0.0015 72.2226 23400 2.2887 31881424
0.0 72.8408 23600 2.3235 32155024
0.0 73.4575 23800 2.3039 32425312
0.0001 74.0742 24000 2.3276 32698784
0.0002 74.6924 24200 2.3752 32974144
0.0 75.3091 24400 2.3228 33245216
0.0 75.9274 24600 2.3442 33517088
0.004 76.5440 24800 2.2757 33788432
0.0 77.1607 25000 2.3425 34060416
0.0024 77.7790 25200 2.4196 34333408
0.0003 78.3957 25400 2.4059 34605392
0.0 79.0124 25600 2.4064 34879536
0.0001 79.6306 25800 2.4741 35153488
0.0 80.2473 26000 2.4314 35424912
0.0 80.8655 26200 2.4374 35698064
0.0 81.4822 26400 2.4334 35968160
0.0 82.0989 26600 2.4134 36240928
0.0014 82.7172 26800 2.4487 36514208
0.0001 83.3338 27000 2.2631 36785136
0.0058 83.9521 27200 2.2900 37061648
0.0013 84.5688 27400 2.3149 37333648
0.0 85.1855 27600 2.3376 37605184
0.0 85.8037 27800 2.3399 37875360
0.0 86.4204 28000 2.3652 38150208
0.0 87.0371 28200 2.3257 38422048
0.0 87.6553 28400 2.3456 38692224
0.0 88.2720 28600 2.3551 38964176
0.0 88.8903 28800 2.3183 39235184
0.003 89.5070 29000 2.3283 39507520
0.0008 90.1236 29200 2.3474 39779328
0.0024 90.7419 29400 2.3467 40051520
0.0 91.3586 29600 2.3495 40322576
0.0 91.9768 29800 2.3609 40596016
0.0 92.5935 30000 2.3511 40867568
0.0 93.2102 30200 2.3741 41140848
0.0 93.8284 30400 2.3734 41412848
0.0 94.4451 30600 2.3721 41683920
0.0 95.0618 30800 2.3867 41959008
0.0 95.6801 31000 2.3931 42231520
0.0 96.2968 31200 2.3782 42502416
0.0 96.9150 31400 2.3857 42776304
0.0 97.5317 31600 2.3456 43048176
0.0 98.1484 31800 2.3855 43320144
0.0 98.7666 32000 2.3993 43591728
0.0022 99.3833 32200 2.4102 43866048
0.0 100.0 32400 2.4071 44137040
0.0 100.6182 32600 2.4218 44408848
0.0 101.2349 32800 2.4073 44682912
0.0 101.8532 33000 2.4291 44956000
0.0016 102.4699 33200 2.4205 45227824
0.0 103.0866 33400 2.4446 45498320
0.0 103.7048 33600 2.4527 45773648
0.0013 104.3215 33800 2.4299 46044128
0.0 104.9397 34000 2.4358 46317504
0.0 105.5564 34200 2.4432 46589024
0.0013 106.1731 34400 2.4415 46863680
0.0 106.7913 34600 2.4476 47135520
0.0 107.4080 34800 2.4460 47407056
0.0 108.0247 35000 2.4498 47680112
0.0 108.6430 35200 2.4551 47951632
0.0 109.2597 35400 2.4473 48224016
0.0 109.8779 35600 2.4494 48497072
0.0 110.4946 35800 2.4564 48768624
0.0 111.1113 36000 2.4651 49041488
0.0 111.7295 36200 2.4625 49314352
0.0 112.3462 36400 2.4648 49584848
0.0 112.9645 36600 2.4645 49858864
0.0 113.5811 36800 2.4657 50130000
0.0011 114.1978 37000 2.4632 50404128
0.0 114.8161 37200 2.4659 50678112
0.0 115.4328 37400 2.4675 50946800
0.0 116.0495 37600 2.4680 51219680
0.0 116.6677 37800 2.4695 51492544
0.0 117.2844 38000 2.4671 51764160
0.0 117.9026 38200 2.4622 52039488
0.0 118.5193 38400 2.4717 52311648
0.0 119.1360 38600 2.4669 52584960
0.0 119.7543 38800 2.4722 52855712
0.0 120.3709 39000 2.4722 53128480
0.0 120.9892 39200 2.4775 53401056
0.0 121.6059 39400 2.4729 53673600
0.0 122.2226 39600 2.4711 53943712
0.0 122.8408 39800 2.4697 54217344
0.0 123.4575 40000 2.4735 54490336
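The headline loss of 0.4714 corresponds to the best validation checkpoint in the table (step 800); after that point validation loss climbs steadily while training loss approaches zero, a typical overfitting pattern. A small sketch of locating that minimum from the logged (step, validation loss) pairs, using a truncated excerpt of the table rather than the full log:

```python
# (step, validation_loss) pairs excerpted from the first rows of the table above.
history = [
    (200, 0.5617),
    (400, 0.4970),
    (600, 0.4762),
    (800, 0.4714),
    (1000, 0.4964),
    (1200, 0.4720),
]

# Pick the checkpoint with the lowest validation loss.
best_step, best_loss = min(history, key=lambda pair: pair[1])
print(best_step, best_loss)  # 800 0.4714
```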

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
Model tree for rbelanec/train_stsb_1745333592