train_stsb_1745333596

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the stsb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2390
  • Num Input Tokens Seen: 61177152
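
This card does not include a usage snippet, so the following is a minimal loading sketch, assuming the adapter was trained on top of the base model's causal-LM head (as the base model name suggests). It requires access to the gated mistralai/Mistral-7B-Instruct-v0.3 weights; device_map="auto" additionally needs the accelerate package.

```python
# Minimal sketch of loading this PEFT adapter on top of the Mistral base model.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-Instruct-v0.3"

# Load the base model (gated repo; requires prior access approval on the Hub).
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="auto",
)

# Attach the fine-tuned adapter weights from this repository.
model = PeftModel.from_pretrained(base, "rbelanec/train_stsb_1745333596")
tokenizer = AutoTokenizer.from_pretrained(base_id)
```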

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • training_steps: 40000
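
For reference, the hyperparameters above map onto the transformers Trainer configuration roughly as follows. This is a sketch reconstructed from the card, not the actual training script: the dataset preprocessing, the PEFT/LoRA configuration, and the device count are not documented here, and the 200-step evaluation cadence is inferred from the results table below.

```python
# Approximate reconstruction of the training arguments listed above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_stsb_1745333596",
    learning_rate=5e-5,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=4,    # eval_batch_size
    gradient_accumulation_steps=4,   # 4 * 4 = 16 total_train_batch_size (implies one device)
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    max_steps=40_000,                # training_steps
    eval_strategy="steps",
    eval_steps=200,                  # inferred from the evaluation cadence in the table
    logging_steps=200,
)
```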

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:--------:|:-----:|:---------------:|:-----------------:|
| 0.2215 | 0.6182 | 200 | 0.2766 | 304960 |
| 0.1869 | 1.2349 | 400 | 0.2457 | 610112 |
| 0.1884 | 1.8532 | 600 | 0.2453 | 918240 |
| 0.1699 | 2.4699 | 800 | 0.2390 | 1223440 |
| 0.1532 | 3.0866 | 1000 | 0.2699 | 1529568 |
| 0.2172 | 3.7048 | 1200 | 0.2474 | 1838464 |
| 0.1394 | 4.3215 | 1400 | 0.2843 | 2144560 |
| 0.1236 | 4.9397 | 1600 | 0.2818 | 2450736 |
| 0.0963 | 5.5564 | 1800 | 0.3142 | 2755856 |
| 0.1151 | 6.1731 | 2000 | 0.3272 | 3063440 |
| 0.1245 | 6.7913 | 2200 | 0.3121 | 3368976 |
| 0.1221 | 7.4080 | 2400 | 0.3544 | 3677040 |
| 0.1022 | 8.0247 | 2600 | 0.3773 | 3983872 |
| 0.1021 | 8.6430 | 2800 | 0.3671 | 4292480 |
| 0.0961 | 9.2597 | 3000 | 0.3792 | 4594560 |
| 0.084 | 9.8779 | 3200 | 0.3962 | 4900544 |
| 0.0688 | 10.4946 | 3400 | 0.4515 | 5206928 |
| 0.0781 | 11.1113 | 3600 | 0.4372 | 5511472 |
| 0.0608 | 11.7295 | 3800 | 0.4621 | 5815280 |
| 0.0706 | 12.3462 | 4000 | 0.4677 | 6122240 |
| 0.0961 | 12.9645 | 4200 | 0.5260 | 6427616 |
| 0.0671 | 13.5811 | 4400 | 0.4477 | 6733776 |
| 0.0432 | 14.1978 | 4600 | 0.5179 | 7038848 |
| 0.0811 | 14.8161 | 4800 | 0.5361 | 7344384 |
| 0.0463 | 15.4328 | 5000 | 0.5241 | 7651280 |
| 0.0399 | 16.0495 | 5200 | 0.5834 | 7955504 |
| 0.043 | 16.6677 | 5400 | 0.5846 | 8262864 |
| 0.0356 | 17.2844 | 5600 | 0.6877 | 8568256 |
| 0.0302 | 17.9026 | 5800 | 0.5616 | 8873856 |
| 0.0297 | 18.5193 | 6000 | 0.5974 | 9180288 |
| 0.0294 | 19.1360 | 6200 | 0.6153 | 9486288 |
| 0.0462 | 19.7543 | 6400 | 0.6054 | 9792720 |
| 0.0317 | 20.3709 | 6600 | 0.6558 | 10100576 |
| 0.0253 | 20.9892 | 6800 | 0.5971 | 10406848 |
| 0.018 | 21.6059 | 7000 | 0.6393 | 10713296 |
| 0.0212 | 22.2226 | 7200 | 0.6546 | 11016800 |
| 0.0231 | 22.8408 | 7400 | 0.6386 | 11325536 |
| 0.0281 | 23.4575 | 7600 | 0.6449 | 11631392 |
| 0.0225 | 24.0742 | 7800 | 0.6947 | 11936144 |
| 0.0105 | 24.6924 | 8000 | 0.6953 | 12244560 |
| 0.006 | 25.3091 | 8200 | 0.7152 | 12549728 |
| 0.0079 | 25.9274 | 8400 | 0.7055 | 12858400 |
| 0.0138 | 26.5440 | 8600 | 0.6635 | 13163216 |
| 0.0125 | 27.1607 | 8800 | 0.7159 | 13469440 |
| 0.0105 | 27.7790 | 9000 | 0.7798 | 13774400 |
| 0.0118 | 28.3957 | 9200 | 0.7271 | 14082512 |
| 0.0082 | 29.0124 | 9400 | 0.7191 | 14385408 |
| 0.0063 | 29.6306 | 9600 | 0.6512 | 14692096 |
| 0.0026 | 30.2473 | 9800 | 0.7613 | 14996480 |
| 0.0057 | 30.8655 | 10000 | 0.7475 | 15302624 |
| 0.0049 | 31.4822 | 10200 | 0.7270 | 15609936 |
| 0.0036 | 32.0989 | 10400 | 0.7760 | 15915040 |
| 0.009 | 32.7172 | 10600 | 0.7272 | 16222112 |
| 0.0061 | 33.3338 | 10800 | 0.7148 | 16525360 |
| 0.005 | 33.9521 | 11000 | 0.7627 | 16833040 |
| 0.0073 | 34.5688 | 11200 | 0.7071 | 17138928 |
| 0.0037 | 35.1855 | 11400 | 0.7670 | 17446224 |
| 0.0036 | 35.8037 | 11600 | 0.7991 | 17754192 |
| 0.0148 | 36.4204 | 11800 | 0.7391 | 18056816 |
| 0.0013 | 37.0371 | 12000 | 0.7657 | 18365904 |
| 0.0018 | 37.6553 | 12200 | 0.7795 | 18669424 |
| 0.0053 | 38.2720 | 12400 | 0.7523 | 18975680 |
| 0.005 | 38.8903 | 12600 | 0.7990 | 19284128 |
| 0.0041 | 39.5070 | 12800 | 0.7616 | 19589440 |
| 0.0176 | 40.1236 | 13000 | 0.8185 | 19892304 |
| 0.0103 | 40.7419 | 13200 | 0.7812 | 20201904 |
| 0.0072 | 41.3586 | 13400 | 0.7441 | 20507296 |
| 0.0244 | 41.9768 | 13600 | 0.7297 | 20814240 |
| 0.0048 | 42.5935 | 13800 | 0.7664 | 21117472 |
| 0.0044 | 43.2102 | 14000 | 0.7791 | 21424352 |
| 0.0033 | 43.8284 | 14200 | 0.7869 | 21729344 |
| 0.0026 | 44.4451 | 14400 | 0.7786 | 22035168 |
| 0.0009 | 45.0618 | 14600 | 0.7504 | 22341904 |
| 0.0005 | 45.6801 | 14800 | 0.8331 | 22646640 |
| 0.004 | 46.2968 | 15000 | 0.7645 | 22952944 |
| 0.008 | 46.9150 | 15200 | 0.7493 | 23260240 |
| 0.0033 | 47.5317 | 15400 | 0.8067 | 23566048 |
| 0.0009 | 48.1484 | 15600 | 0.8254 | 23871504 |
| 0.0135 | 48.7666 | 15800 | 0.8062 | 24175696 |
| 0.0074 | 49.3833 | 16000 | 0.7800 | 24480832 |
| 0.0019 | 50.0 | 16200 | 0.8354 | 24786896 |
| 0.0072 | 50.6182 | 16400 | 0.8429 | 25092208 |
| 0.0036 | 51.2349 | 16600 | 0.7611 | 25398288 |
| 0.0092 | 51.8532 | 16800 | 0.8184 | 25707024 |
| 0.0065 | 52.4699 | 17000 | 0.8110 | 26010848 |
| 0.0008 | 53.0866 | 17200 | 0.7240 | 26319616 |
| 0.0038 | 53.7048 | 17400 | 0.8263 | 26623232 |
| 0.0015 | 54.3215 | 17600 | 0.7414 | 26932512 |
| 0.0012 | 54.9397 | 17800 | 0.7197 | 27238304 |
| 0.0055 | 55.5564 | 18000 | 0.7651 | 27542688 |
| 0.0107 | 56.1731 | 18200 | 0.7589 | 27848608 |
| 0.003 | 56.7913 | 18400 | 0.7462 | 28156128 |
| 0.0004 | 57.4080 | 18600 | 0.8039 | 28463824 |
| 0.0008 | 58.0247 | 18800 | 0.7704 | 28768304 |
| 0.0042 | 58.6430 | 19000 | 0.8495 | 29076400 |
| 0.0001 | 59.2597 | 19200 | 0.7385 | 29381968 |
| 0.0137 | 59.8779 | 19400 | 0.7544 | 29688144 |
| 0.0012 | 60.4946 | 19600 | 0.8019 | 29993744 |
| 0.0015 | 61.1113 | 19800 | 0.7491 | 30299024 |
| 0.0001 | 61.7295 | 20000 | 0.7923 | 30604816 |
| 0.0001 | 62.3462 | 20200 | 0.8275 | 30909520 |
| 0.0034 | 62.9645 | 20400 | 0.7559 | 31217744 |
| 0.0006 | 63.5811 | 20600 | 0.8041 | 31523296 |
| 0.0005 | 64.1978 | 20800 | 0.8322 | 31827424 |
| 0.0001 | 64.8161 | 21000 | 0.8417 | 32135904 |
| 0.0076 | 65.4328 | 21200 | 0.7773 | 32439120 |
| 0.0 | 66.0495 | 21400 | 0.7656 | 32747712 |
| 0.0012 | 66.6677 | 21600 | 0.8295 | 33052672 |
| 0.0002 | 67.2844 | 21800 | 0.7850 | 33358560 |
| 0.001 | 67.9026 | 22000 | 0.8000 | 33664736 |
| 0.0035 | 68.5193 | 22200 | 0.7884 | 33967392 |
| 0.0003 | 69.1360 | 22400 | 0.8099 | 34272592 |
| 0.0011 | 69.7543 | 22600 | 0.8397 | 34578896 |
| 0.0001 | 70.3709 | 22800 | 0.8535 | 34883440 |
| 0.0 | 70.9892 | 23000 | 0.8582 | 35188496 |
| 0.0001 | 71.6059 | 23200 | 0.8639 | 35492880 |
| 0.0007 | 72.2226 | 23400 | 0.9010 | 35798304 |
| 0.0013 | 72.8408 | 23600 | 0.8599 | 36105856 |
| 0.0 | 73.4575 | 23800 | 0.8108 | 36408816 |
| 0.0 | 74.0742 | 24000 | 0.8225 | 36716560 |
| 0.0 | 74.6924 | 24200 | 0.8360 | 37025168 |
| 0.0007 | 75.3091 | 24400 | 0.8470 | 37330368 |
| 0.0006 | 75.9274 | 24600 | 0.8409 | 37636736 |
| 0.0 | 76.5440 | 24800 | 0.8601 | 37941312 |
| 0.0003 | 77.1607 | 25000 | 0.8697 | 38246144 |
| 0.0007 | 77.7790 | 25200 | 0.8922 | 38552576 |
| 0.0 | 78.3957 | 25400 | 0.8864 | 38857104 |
| 0.0 | 79.0124 | 25600 | 0.8885 | 39165040 |
| 0.0001 | 79.6306 | 25800 | 0.8664 | 39472304 |
| 0.0 | 80.2473 | 26000 | 0.8746 | 39777616 |
| 0.0 | 80.8655 | 26200 | 0.8940 | 40084368 |
| 0.0 | 81.4822 | 26400 | 0.8350 | 40388032 |
| 0.0 | 82.0989 | 26600 | 0.8362 | 40694320 |
| 0.0007 | 82.7172 | 26800 | 0.8637 | 41001712 |
| 0.0 | 83.3338 | 27000 | 0.8523 | 41305200 |
| 0.0 | 83.9521 | 27200 | 0.8744 | 41615216 |
| 0.0008 | 84.5688 | 27400 | 0.8856 | 41920400 |
| 0.0 | 85.1855 | 27600 | 0.8884 | 42224944 |
| 0.0 | 85.8037 | 27800 | 0.8953 | 42528304 |
| 0.0012 | 86.4204 | 28000 | 0.9088 | 42836528 |
| 0.0 | 87.0371 | 28200 | 0.8782 | 43141440 |
| 0.0 | 87.6553 | 28400 | 0.8917 | 43445216 |
| 0.0019 | 88.2720 | 28600 | 0.8911 | 43750304 |
| 0.0002 | 88.8903 | 28800 | 0.8981 | 44055584 |
| 0.0099 | 89.5070 | 29000 | 0.9009 | 44361616 |
| 0.0003 | 90.1236 | 29200 | 0.8788 | 44665936 |
| 0.0009 | 90.7419 | 29400 | 0.8763 | 44972144 |
| 0.0 | 91.3586 | 29600 | 0.8834 | 45276416 |
| 0.0 | 91.9768 | 29800 | 0.8921 | 45583712 |
| 0.0 | 92.5935 | 30000 | 0.9097 | 45888688 |
| 0.0 | 93.2102 | 30200 | 0.9120 | 46195456 |
| 0.0 | 93.8284 | 30400 | 0.9113 | 46500288 |
| 0.0 | 94.4451 | 30600 | 0.9154 | 46804992 |
| 0.0 | 95.0618 | 30800 | 0.9218 | 47112576 |
| 0.0 | 95.6801 | 31000 | 0.9233 | 47418816 |
| 0.0 | 96.2968 | 31200 | 0.9242 | 47723232 |
| 0.0 | 96.9150 | 31400 | 0.9248 | 48029888 |
| 0.0 | 97.5317 | 31600 | 0.9271 | 48335504 |
| 0.0 | 98.1484 | 31800 | 0.9318 | 48640352 |
| 0.0 | 98.7666 | 32000 | 0.9343 | 48945632 |
| 0.0009 | 99.3833 | 32200 | 0.9283 | 49253952 |
| 0.0 | 100.0 | 32400 | 0.9351 | 49557760 |
| 0.0 | 100.6182 | 32600 | 0.9357 | 49863392 |
| 0.0002 | 101.2349 | 32800 | 0.9421 | 50171184 |
| 0.0 | 101.8532 | 33000 | 0.9420 | 50477424 |
| 0.0007 | 102.4699 | 33200 | 0.9419 | 50781472 |
| 0.0 | 103.0866 | 33400 | 0.9490 | 51085008 |
| 0.0 | 103.7048 | 33600 | 0.9524 | 51393296 |
| 0.0006 | 104.3215 | 33800 | 0.9612 | 51697808 |
| 0.0 | 104.9397 | 34000 | 0.9569 | 52004880 |
| 0.0 | 105.5564 | 34200 | 0.9602 | 52308944 |
| 0.0006 | 106.1731 | 34400 | 0.9639 | 52616512 |
| 0.0 | 106.7913 | 34600 | 0.9665 | 52921600 |
| 0.0 | 107.4080 | 34800 | 0.9698 | 53227040 |
| 0.0 | 108.0247 | 35000 | 0.9705 | 53533488 |
| 0.0 | 108.6430 | 35200 | 0.9735 | 53838704 |
| 0.0 | 109.2597 | 35400 | 0.9747 | 54143984 |
| 0.0 | 109.8779 | 35600 | 0.9762 | 54449808 |
| 0.0 | 110.4946 | 35800 | 0.9756 | 54754304 |
| 0.0 | 111.1113 | 36000 | 0.9765 | 55060864 |
| 0.0 | 111.7295 | 36200 | 0.9763 | 55367296 |
| 0.0 | 112.3462 | 36400 | 0.9785 | 55670672 |
| 0.0 | 112.9645 | 36600 | 0.9796 | 55978256 |
| 0.0 | 113.5811 | 36800 | 0.9791 | 56283024 |
| 0.0005 | 114.1978 | 37000 | 0.9796 | 56590928 |
| 0.0 | 114.8161 | 37200 | 0.9822 | 56897936 |
| 0.0 | 115.4328 | 37400 | 0.9825 | 57200192 |
| 0.0 | 116.0495 | 37600 | 0.9829 | 57505872 |
| 0.0 | 116.6677 | 37800 | 0.9841 | 57811120 |
| 0.0 | 117.2844 | 38000 | 0.9835 | 58116320 |
| 0.0 | 117.9026 | 38200 | 0.9840 | 58425376 |
| 0.0 | 118.5193 | 38400 | 0.9849 | 58732208 |
| 0.0 | 119.1360 | 38600 | 0.9852 | 59038688 |
| 0.0 | 119.7543 | 38800 | 0.9852 | 59342656 |
| 0.0 | 120.3709 | 39000 | 0.9853 | 59647664 |
| 0.0 | 120.9892 | 39200 | 0.9863 | 59954128 |
| 0.0 | 121.6059 | 39400 | 0.9867 | 60260256 |
| 0.0 | 122.2226 | 39600 | 0.9864 | 60563120 |
| 0.0 | 122.8408 | 39800 | 0.9864 | 60870320 |
| 0.0 | 123.4575 | 40000 | 0.9869 | 61177152 |
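
Validation loss bottoms out at 0.2390 at step 800 and climbs steadily afterward while training loss collapses to 0.0, which suggests the evaluation loss reported at the top of this card comes from the best (step-800) checkpoint rather than the final one. As a back-of-the-envelope sanity check on the token counts, using only numbers reported in this card:

```python
# Average tokens consumed per optimizer step and per training example,
# derived from the final "Input Tokens Seen" value in the table above.
total_tokens = 61_177_152    # Input Tokens Seen at step 40000
total_steps = 40_000         # training_steps
effective_batch = 16         # total_train_batch_size

tokens_per_step = total_tokens / total_steps            # ~1529
tokens_per_example = tokens_per_step / effective_batch  # ~96

print(f"~{tokens_per_step:.0f} tokens/step, ~{tokens_per_example:.0f} tokens/example")
```

Roughly 96 tokens per example is plausible for short STS-B sentence pairs wrapped in an instruction prompt.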

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1