train_wic_1745950294

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the wic dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2148
  • Num input tokens seen: 12,845,616
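Because this repository holds a PEFT adapter rather than full model weights, it must be loaded on top of the base model. The sketch below makes two assumptions: the adapter repo ID rbelanec/train_wic_1745950294 (taken from this card), and an illustrative yes/no prompt, since the prompt template used during training is not documented here.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-Instruct-v0.3"
adapter_id = "rbelanec/train_wic_1745950294"  # assumed repo ID for this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the adapter
model.eval()

# WiC asks whether a target word has the same sense in two sentences.
# This prompt is illustrative; the template used during training is unknown.
prompt = (
    'Does the word "bank" mean the same thing in "she sat on the river bank" '
    'and "he took out a bank loan"? Answer yes or no.'
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```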

Model description

This repository contains a parameter-efficient (PEFT) adapter for mistralai/Mistral-7B-Instruct-v0.3, fine-tuned on WiC (Word-in-Context), a binary word-sense disambiguation task. Details of the adapter configuration are not provided in this card.

Intended uses & limitations

More information needed

Training and evaluation data

The model was fine-tuned on the wic dataset named above. WiC (Word-in-Context) is a binary classification task: given a target word and two sentences, decide whether the word is used with the same meaning in both. The exact dataset distribution and evaluation split are not specified; a loading sketch follows.
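If the card's "wic" corresponds to the SuperGLUE version of the task, the splits could be inspected along these lines. The dataset ID "super_glue" is an assumption, not something stated in this card.

```python
from datasets import load_dataset

# Assumption: "wic" refers to the SuperGLUE Word-in-Context task.
wic = load_dataset("super_glue", "wic")
print(wic)              # expected splits: train / validation / test
print(wic["train"][0])  # fields include word, sentence1, sentence2, label
```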

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • training_steps: 40000
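
Expressed as transformers TrainingArguments, the settings above look roughly like this. The output_dir and the 200-step evaluation cadence are inferred (the latter from the results table below), so treat this as a sketch rather than the exact launch configuration.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wic_1745950294",  # illustrative
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # effective train batch size: 4
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    max_steps=40_000,
    eval_strategy="steps",
    eval_steps=200,  # matches the evaluation cadence in the results table
)
```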

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
0.6692 0.1637 200 0.2914 64080
0.4023 0.3275 400 0.2351 128048
0.226 0.4912 600 0.2466 192224
0.2436 0.6549 800 0.2148 256832
0.1626 0.8187 1000 0.2215 321264
0.4275 0.9824 1200 0.2451 385728
0.1906 1.1457 1400 0.2486 449768
0.2749 1.3095 1600 0.2267 514072
0.1241 1.4732 1800 0.2585 578408
0.0982 1.6369 2000 0.2912 642248
0.2176 1.8007 2200 0.3498 706488
0.2567 1.9644 2400 0.2479 770888
0.1621 2.1277 2600 0.3608 835216
0.0006 2.2914 2800 0.5137 899312
0.0221 2.4552 3000 0.3600 963696
0.1225 2.6189 3200 0.4991 1027904
0.0007 2.7826 3400 0.4534 1092016
0.423 2.9464 3600 0.5087 1156240
0.0015 3.1097 3800 0.4305 1220568
0.0003 3.2734 4000 0.6379 1285128
0.0006 3.4372 4200 0.4382 1349032
0.0011 3.6009 4400 0.4092 1413096
0.0004 3.7646 4600 0.5039 1477816
0.1106 3.9284 4800 0.4148 1541800
0.0002 4.0917 5000 0.6128 1605480
0.0001 4.2554 5200 0.8490 1669464
0.2408 4.4192 5400 0.5833 1733528
0.0006 4.5829 5600 0.5035 1797608
0.056 4.7466 5800 0.5812 1862328
0.001 4.9104 6000 0.5246 1926824
0.0001 5.0737 6200 0.6962 1990752
0.0 5.2374 6400 0.7806 2055200
0.0 5.4011 6600 0.7270 2119232
0.0 5.5649 6800 0.7609 2183440
0.0 5.7286 7000 0.8746 2247920
0.0012 5.8923 7200 0.5500 2312032
0.0004 6.0557 7400 0.6073 2376200
0.0005 6.2194 7600 0.6436 2440472
0.0005 6.3831 7800 0.5694 2504760
0.0674 6.5469 8000 0.7587 2568840
0.0003 6.7106 8200 0.5694 2632776
0.0 6.8743 8400 0.9750 2697176
0.0004 7.0377 8600 0.5806 2761240
0.0 7.2014 8800 0.7086 2825240
0.0001 7.3651 9000 0.6857 2889368
0.0 7.5289 9200 0.7895 2953752
0.0018 7.6926 9400 0.6545 3018440
0.0 7.8563 9600 0.7148 3082552
0.0 8.0196 9800 0.8599 3146472
0.0 8.1834 10000 0.8023 3211320
0.0 8.3471 10200 0.9938 3275192
0.0014 8.5108 10400 1.0138 3339400
0.0 8.6746 10600 0.5833 3403656
0.0005 8.8383 10800 0.5806 3467848
0.1626 9.0016 11000 0.6367 3531952
0.0 9.1654 11200 0.6516 3596368
0.0001 9.3291 11400 0.6904 3660496
0.001 9.4928 11600 0.7195 3724480
0.0 9.6566 11800 0.8818 3788928
0.0001 9.8203 12000 0.7360 3853296
0.0001 9.9840 12200 0.6547 3917232
0.0 10.1474 12400 0.7772 3981568
0.0 10.3111 12600 0.7553 4045600
0.0 10.4748 12800 0.6771 4110048
0.0001 10.6386 13000 0.6039 4174432
0.0 10.8023 13200 0.6869 4238512
0.0001 10.9660 13400 0.5188 4302800
0.0 11.1293 13600 0.7701 4366728
0.2328 11.2931 13800 0.7897 4431112
0.0763 11.4568 14000 0.6923 4495320
0.0 11.6205 14200 0.7449 4559336
0.0 11.7843 14400 0.8729 4623464
0.0 11.9480 14600 0.6830 4687880
0.0 12.1113 14800 0.6814 4752088
0.2406 12.2751 15000 0.6356 4816376
0.2969 12.4388 15200 0.7597 4881000
0.0 12.6025 15400 0.8125 4944776
0.0 12.7663 15600 0.8740 5009528
0.0 12.9300 15800 0.7733 5073448
0.0 13.0933 16000 0.9949 5137696
0.0 13.2571 16200 1.0131 5202256
0.0 13.4208 16400 0.6984 5266128
0.0 13.5845 16600 0.7279 5330256
0.0 13.7483 16800 0.7839 5395072
0.0 13.9120 17000 0.8675 5458672
0.0 14.0753 17200 0.8122 5522480
0.0 14.2391 17400 0.9484 5586480
0.0 14.4028 17600 0.6329 5650208
0.0001 14.5665 17800 0.7576 5714704
0.0 14.7302 18000 0.8083 5779488
0.0 14.8940 18200 0.8644 5843728
0.0 15.0573 18400 0.8854 5908152
0.0 15.2210 18600 0.9064 5972168
0.0 15.3848 18800 0.9408 6037144
0.0 15.5485 19000 0.9505 6101800
0.0 15.7122 19200 1.0798 6165416
0.0 15.8760 19400 0.9290 6229672
0.0 16.0393 19600 0.9027 6293504
0.0 16.2030 19800 1.0223 6357840
0.0 16.3668 20000 0.8874 6422352
0.0 16.5305 20200 0.7525 6486352
0.0004 16.6942 20400 0.7276 6550928
0.0 16.8580 20600 0.7255 6615008
0.0 17.0213 20800 0.8213 6678864
0.0 17.1850 21000 0.7005 6743040
0.0001 17.3488 21200 0.7398 6807664
0.0 17.5125 21400 0.6896 6871648
0.0001 17.6762 21600 0.7906 6936048
0.0 17.8400 21800 0.8672 7000448
0.0001 18.0033 22000 0.7026 7064224
0.0 18.1670 22200 0.7652 7128848
0.0 18.3307 22400 0.8071 7192992
0.0 18.4945 22600 0.8375 7256624
0.0 18.6582 22800 0.8745 7321520
0.0 18.8219 23000 0.8595 7385552
0.0 18.9857 23200 0.8643 7449600
0.0 19.1490 23400 0.9584 7513504
0.0 19.3127 23600 0.8738 7577776
0.0 19.4765 23800 0.8654 7642048
0.0 19.6402 24000 0.8714 7706720
0.0 19.8039 24200 0.8997 7770896
0.0 19.9677 24400 0.9527 7835136
0.0 20.1310 24600 0.9663 7899176
0.0 20.2947 24800 0.9910 7963800
0.0 20.4585 25000 0.8062 8028584
0.0 20.6222 25200 0.8622 8092616
0.0 20.7859 25400 0.8875 8157000
0.0 20.9497 25600 0.9077 8220920
0.0 21.1130 25800 0.9285 8284832
0.0 21.2767 26000 0.9428 8348832
0.0 21.4404 26200 0.9593 8412992
0.0 21.6042 26400 0.9728 8476944
0.0 21.7679 26600 0.9867 8541536
0.0 21.9316 26800 0.9971 8606128
0.0 22.0950 27000 1.0116 8670264
0.0 22.2587 27200 1.0208 8734456
0.0 22.4224 27400 1.0317 8798776
0.0 22.5862 27600 1.0455 8862888
0.0 22.7499 27800 1.0536 8927464
0.0 22.9136 28000 1.0604 8991912
0.0 23.0770 28200 1.0708 9055920
0.0 23.2407 28400 1.0810 9120064
0.0 23.4044 28600 1.0924 9184496
0.0 23.5682 28800 1.0984 9248672
0.0 23.7319 29000 1.1076 9312880
0.0 23.8956 29200 1.1157 9377264
0.0 24.0589 29400 1.1211 9441584
0.0 24.2227 29600 1.1321 9505936
0.0 24.3864 29800 1.1395 9570272
0.0 24.5501 30000 1.1485 9634480
0.0 24.7139 30200 1.1515 9698784
0.0 24.8776 30400 1.1655 9762800
0.0 25.0409 30600 1.1697 9826744
0.0 25.2047 30800 1.1809 9890760
0.0 25.3684 31000 1.1825 9955112
0.0 25.5321 31200 1.1917 10019448
0.0 25.6959 31400 1.1961 10083848
0.0 25.8596 31600 1.2004 10147752
0.0 26.0229 31800 1.2097 10211912
0.0 26.1867 32000 1.2223 10275928
0.0 26.3504 32200 1.2190 10340168
0.0 26.5141 32400 1.2255 10404376
0.0 26.6779 32600 1.2313 10469048
0.0 26.8416 32800 1.2337 10533640
0.0 27.0049 33000 1.2444 10597888
0.0 27.1686 33200 1.2534 10662240
0.0 27.3324 33400 1.2535 10726640
0.0 27.4961 33600 1.2555 10790608
0.0 27.6598 33800 1.2596 10854688
0.0 27.8236 34000 1.2657 10919360
0.0 27.9873 34200 1.2708 10983664
0.0 28.1506 34400 1.2678 11047464
0.0 28.3144 34600 1.2721 11111848
0.0 28.4781 34800 1.2790 11176376
0.0 28.6418 35000 1.2825 11241256
0.0 28.8056 35200 1.2908 11305112
0.0 28.9693 35400 1.2937 11369464
0.0 29.1326 35600 1.2896 11433608
0.0 29.2964 35800 1.2961 11497944
0.0 29.4601 36000 1.2947 11562200
0.0 29.6238 36200 1.3045 11626152
0.0 29.7876 36400 1.3039 11690824
0.0 29.9513 36600 1.2985 11755016
0.0 30.1146 36800 1.3052 11818880
0.0 30.2783 37000 1.3122 11882768
0.0 30.4421 37200 1.3068 11946912
0.0 30.6058 37400 1.3125 12011696
0.0 30.7695 37600 1.3085 12075664
0.0 30.9333 37800 1.3130 12139680
0.0 31.0966 38000 1.3178 12204000
0.0 31.2603 38200 1.3148 12268800
0.0 31.4241 38400 1.3161 12333024
0.0 31.5878 38600 1.3151 12396976
0.0 31.7515 38800 1.3151 12461104
0.0 31.9153 39000 1.3186 12524768
0.0 32.0786 39200 1.3144 12588496
0.0 32.2423 39400 1.3118 12653136
0.0 32.4061 39600 1.3132 12717328
0.0 32.5698 39800 1.3182 12781536
0.0 32.7335 40000 1.3186 12845616
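
Note that the validation loss reaches its minimum of 0.2148 at step 800 (epoch 0.65) and trends upward for the rest of the 40,000-step run while the training loss collapses to 0.0, i.e. the adapter overfits early. The evaluation loss reported at the top of this card matches that step-800 minimum, suggesting the best checkpoint was the one kept.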

Framework versions

  • PEFT 0.15.2.dev0
  • Transformers 4.51.3
  • PyTorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
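
Note that PEFT 0.15.2.dev0 is a development build; reproducing this environment exactly may require installing peft from source rather than from PyPI.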