train_wic_1745950295

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the wic dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1765
  • Num Input Tokens Seen: 12845616
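Treating the evaluation loss as a mean per-token cross-entropy in nats (an assumption; the exact loss reduction is not stated in this card), the corresponding perplexity follows directly:

```python
import math

eval_loss = 3.1765  # evaluation loss reported above

# Perplexity is exp(mean cross-entropy) when the loss is measured in nats.
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 23.96
```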

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: cosine
  • training_steps: 40000
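The effective batch size and the cosine decay implied by these settings can be sketched as follows. This is a minimal sketch assuming no warmup and decay to zero over training_steps, which may differ slightly from the scheduler's actual defaults:

```python
import math

learning_rate = 5e-05
train_batch_size = 2            # per device
gradient_accumulation_steps = 2
training_steps = 40000

# total_train_batch_size = per-device batch size x accumulation steps
total_train_batch_size = train_batch_size * gradient_accumulation_steps

def cosine_lr(step: int) -> float:
    """Cosine schedule from learning_rate down to ~0 over training_steps."""
    progress = step / training_steps
    return 0.5 * learning_rate * (1 + math.cos(math.pi * progress))

print(total_train_batch_size)   # 4, matching the value listed above
print(cosine_lr(0))             # 5e-05 at the start of training
print(cosine_lr(training_steps))  # ~0 at the final step
```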

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
3.9958 0.1637 200 3.3872 64080
3.5157 0.3275 400 3.2954 128048
4.3392 0.4912 600 3.2764 192224
3.7731 0.6549 800 3.2530 256832
4.8833 0.8187 1000 3.2281 321264
4.3418 0.9824 1200 3.2445 385728
2.8185 1.1457 1400 3.2374 449768
4.5054 1.3095 1600 3.1879 514072
3.5401 1.4732 1800 3.2108 578408
3.8598 1.6369 2000 3.2438 642248
4.6184 1.8007 2200 3.2463 706488
1.636 1.9644 2400 3.2159 770888
4.2425 2.1277 2600 3.2043 835216
4.4237 2.2914 2800 3.2014 899312
2.0277 2.4552 3000 3.1808 963696
2.9487 2.6189 3200 3.2460 1027904
2.511 2.7826 3400 3.2238 1092016
4.6544 2.9464 3600 3.2033 1156240
2.285 3.1097 3800 3.1921 1220568
2.4531 3.2734 4000 3.2192 1285128
3.4836 3.4372 4200 3.2053 1349032
3.4662 3.6009 4400 3.1977 1413096
3.1351 3.7646 4600 3.2280 1477816
3.7616 3.9284 4800 3.2018 1541800
3.2064 4.0917 5000 3.2130 1605480
3.6003 4.2554 5200 3.2141 1669464
3.1363 4.4192 5400 3.2227 1733528
3.4699 4.5829 5600 3.2105 1797608
2.9962 4.7466 5800 3.2237 1862328
3.6113 4.9104 6000 3.1934 1926824
4.0669 5.0737 6200 3.2136 1990752
3.7169 5.2374 6400 3.2272 2055200
3.803 5.4011 6600 3.2170 2119232
3.9642 5.5649 6800 3.2167 2183440
3.0489 5.7286 7000 3.2061 2247920
3.5582 5.8923 7200 3.2074 2312032
3.1043 6.0557 7400 3.1989 2376200
3.0691 6.2194 7600 3.2181 2440472
2.4881 6.3831 7800 3.2191 2504760
5.8401 6.5469 8000 3.1854 2568840
3.3306 6.7106 8200 3.1947 2632776
2.7831 6.8743 8400 3.2077 2697176
4.6388 7.0377 8600 3.2230 2761240
4.732 7.2014 8800 3.2151 2825240
4.6202 7.3651 9000 3.2434 2889368
3.0046 7.5289 9200 3.1974 2953752
3.1253 7.6926 9400 3.1951 3018440
2.4775 7.8563 9600 3.1978 3082552
1.7311 8.0196 9800 3.2191 3146472
3.4284 8.1834 10000 3.2100 3211320
2.1299 8.3471 10200 3.2070 3275192
2.3182 8.5108 10400 3.1918 3339400
3.3599 8.6746 10600 3.1857 3403656
3.7202 8.8383 10800 3.1957 3467848
4.6498 9.0016 11000 3.2008 3531952
5.2661 9.1654 11200 3.2382 3596368
3.6412 9.3291 11400 3.2045 3660496
1.9489 9.4928 11600 3.2084 3724480
4.1304 9.6566 11800 3.2145 3788928
1.9428 9.8203 12000 3.1894 3853296
2.7573 9.9840 12200 3.2168 3917232
2.7708 10.1474 12400 3.2046 3981568
2.951 10.3111 12600 3.2150 4045600
4.7755 10.4748 12800 3.2154 4110048
3.9557 10.6386 13000 3.1837 4174432
2.7547 10.8023 13200 3.2243 4238512
2.7812 10.9660 13400 3.2060 4302800
3.2587 11.1293 13600 3.1930 4366728
4.2143 11.2931 13800 3.1916 4431112
2.9836 11.4568 14000 3.2197 4495320
2.6835 11.6205 14200 3.2014 4559336
2.1858 11.7843 14400 3.2053 4623464
4.9649 11.9480 14600 3.1945 4687880
2.8991 12.1113 14800 3.1890 4752088
5.8153 12.2751 15000 3.2063 4816376
1.9165 12.4388 15200 3.2002 4881000
2.2727 12.6025 15400 3.2027 4944776
3.1679 12.7663 15600 3.2219 5009528
3.0102 12.9300 15800 3.2150 5073448
4.699 13.0933 16000 3.2290 5137696
2.5314 13.2571 16200 3.2048 5202256
4.6496 13.4208 16400 3.1821 5266128
4.0822 13.5845 16600 3.1788 5330256
4.3593 13.7483 16800 3.2099 5395072
3.9051 13.9120 17000 3.1960 5458672
3.9994 14.0753 17200 3.2103 5522480
2.8361 14.2391 17400 3.2233 5586480
4.9401 14.4028 17600 3.1868 5650208
3.8849 14.5665 17800 3.1857 5714704
3.5166 14.7302 18000 3.2083 5779488
3.9967 14.8940 18200 3.2256 5843728
1.8287 15.0573 18400 3.2158 5908152
2.9093 15.2210 18600 3.2090 5972168
2.1674 15.3848 18800 3.1765 6037144
2.9511 15.5485 19000 3.2208 6101800
4.2766 15.7122 19200 3.2050 6165416
4.3034 15.8760 19400 3.1860 6229672
4.6391 16.0393 19600 3.2082 6293504
1.9051 16.2030 19800 3.2015 6357840
2.2928 16.3668 20000 3.2217 6422352
1.9085 16.5305 20200 3.2065 6486352
3.924 16.6942 20400 3.2151 6550928
3.4126 16.8580 20600 3.2327 6615008
4.5493 17.0213 20800 3.2317 6678864
3.8615 17.1850 21000 3.2040 6743040
4.0426 17.3488 21200 3.2315 6807664
2.6831 17.5125 21400 3.2030 6871648
2.9024 17.6762 21600 3.2017 6936048
3.0366 17.8400 21800 3.1934 7000448
3.8192 18.0033 22000 3.1915 7064224
5.5216 18.1670 22200 3.2047 7128848
3.4714 18.3307 22400 3.1915 7192992
4.8674 18.4945 22600 3.1988 7256624
3.7189 18.6582 22800 3.1974 7321520
2.5776 18.8219 23000 3.2019 7385552
3.5356 18.9857 23200 3.2137 7449600
3.0237 19.1490 23400 3.2293 7513504
3.8673 19.3127 23600 3.1965 7577776
4.9547 19.4765 23800 3.2123 7642048
5.2959 19.6402 24000 3.1930 7706720
5.6198 19.8039 24200 3.2101 7770896
3.0233 19.9677 24400 3.2439 7835136
4.588 20.1310 24600 3.1871 7899176
2.266 20.2947 24800 3.2068 7963800
2.8501 20.4585 25000 3.2414 8028584
3.9682 20.6222 25200 3.2091 8092616
2.3008 20.7859 25400 3.2078 8157000
3.4068 20.9497 25600 3.2114 8220920
2.9892 21.1130 25800 3.2217 8284832
4.825 21.2767 26000 3.1974 8348832
2.818 21.4404 26200 3.2080 8412992
3.9167 21.6042 26400 3.2175 8476944
3.742 21.7679 26600 3.2088 8541536
5.8749 21.9316 26800 3.2172 8606128
1.9666 22.0950 27000 3.1960 8670264
3.3397 22.2587 27200 3.2020 8734456
3.8228 22.4224 27400 3.1910 8798776
3.4253 22.5862 27600 3.2239 8862888
4.1324 22.7499 27800 3.2102 8927464
2.1664 22.9136 28000 3.2067 8991912
3.3336 23.0770 28200 3.2092 9055920
3.3789 23.2407 28400 3.2119 9120064
5.0599 23.4044 28600 3.2146 9184496
3.0677 23.5682 28800 3.2242 9248672
3.5638 23.7319 29000 3.2197 9312880
3.5793 23.8956 29200 3.2128 9377264
3.3808 24.0589 29400 3.1855 9441584
2.9799 24.2227 29600 3.2139 9505936
1.491 24.3864 29800 3.2104 9570272
4.1475 24.5501 30000 3.1904 9634480
5.207 24.7139 30200 3.2083 9698784
3.496 24.8776 30400 3.2235 9762800
5.8899 25.0409 30600 3.1862 9826744
2.6238 25.2047 30800 3.2107 9890760
3.2139 25.3684 31000 3.2059 9955112
3.3207 25.5321 31200 3.1964 10019448
2.3831 25.6959 31400 3.2035 10083848
2.4526 25.8596 31600 3.1851 10147752
3.5566 26.0229 31800 3.2099 10211912
3.7733 26.1867 32000 3.2157 10275928
3.0118 26.3504 32200 3.2315 10340168
6.6009 26.5141 32400 3.2197 10404376
2.9853 26.6779 32600 3.2137 10469048
2.4959 26.8416 32800 3.2112 10533640
2.538 27.0049 33000 3.1927 10597888
2.8235 27.1686 33200 3.2044 10662240
3.1748 27.3324 33400 3.2107 10726640
2.7298 27.4961 33600 3.2253 10790608
5.0767 27.6598 33800 3.2285 10854688
1.4213 27.8236 34000 3.2188 10919360
2.7594 27.9873 34200 3.2144 10983664
3.118 28.1506 34400 3.2237 11047464
3.26 28.3144 34600 3.1957 11111848
2.3447 28.4781 34800 3.2221 11176376
3.8171 28.6418 35000 3.2131 11241256
4.0589 28.8056 35200 3.1951 11305112
3.7132 28.9693 35400 3.2159 11369464
5.1964 29.1326 35600 3.2144 11433608
3.0752 29.2964 35800 3.2138 11497944
3.2964 29.4601 36000 3.2134 11562200
4.1533 29.6238 36200 3.2109 11626152
2.9437 29.7876 36400 3.2126 11690824
1.688 29.9513 36600 3.2131 11755016
2.9703 30.1146 36800 3.2125 11818880
4.8531 30.2783 37000 3.2125 11882768
3.5959 30.4421 37200 3.2125 11946912
3.9701 30.6058 37400 3.2125 12011696
2.3307 30.7695 37600 3.2125 12075664
7.4112 30.9333 37800 3.2125 12139680
3.0752 31.0966 38000 3.2125 12204000
3.0064 31.2603 38200 3.2125 12268800
4.9243 31.4241 38400 3.2125 12333024
4.4955 31.5878 38600 3.2125 12396976
2.4931 31.7515 38800 3.2125 12461104
3.6867 31.9153 39000 3.2125 12524768
1.5921 32.0786 39200 3.2125 12588496
4.4451 32.2423 39400 3.2125 12653136
3.3934 32.4061 39600 3.2125 12717328
2.3516 32.5698 39800 3.2125 12781536
3.6117 32.7335 40000 3.2125 12845616
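A quick way to read the table: the evaluation loss reported at the top of this card (3.1765) is the minimum validation loss observed during training, reached at step 18,800; validation loss later plateaus at 3.2125. A sketch over a few rows sampled from the table above:

```python
# (step, validation_loss) pairs sampled from the training results above
samples = [
    (200, 3.3872),
    (18800, 3.1765),
    (24400, 3.2439),
    (40000, 3.2125),
]

# Pick the checkpoint with the lowest validation loss.
best_step, best_loss = min(samples, key=lambda row: row[1])
print(best_step, best_loss)  # 18800 3.1765
```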

Framework versions

  • PEFT 0.15.2.dev0
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1