train_wic_1745950284
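This model is a fine-tuned version of google/gemma-3-1b-it on the wic dataset. It achieves the following results on the evaluation set:
- Loss: 0.2124 (the lowest validation loss in the training results table, logged at step 18200; the validation loss at the final step, 40000, is 0.6317)
- Num Input Tokens Seen: 13031928 (cumulative total at step 40000)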
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.3
- train_batch_size: 2
- eval_batch_size: 2
- seed: 123
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- training_steps: 40000
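The batch-size values in this list are related by simple arithmetic. The following is a minimal sketch of that relationship, assuming a single training device (the card does not state the device count). It also uses the first logged row of the training results (step 200 at epoch 0.1637) to roughly estimate how many optimizer steps make up one epoch; that estimate is an inference from the logged numbers, not a value reported by the card.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2
gradient_accumulation_steps = 2
training_steps = 40000

# Assumption: one device; the card does not state the device count.
num_devices = 1

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print("total_train_batch_size:", total_train_batch_size)

# Rough estimate from the first logged row: step 200 corresponds to epoch 0.1637.
steps_per_epoch = 200 / 0.1637
examples_per_epoch = steps_per_epoch * total_train_batch_size
estimated_epochs = training_steps / steps_per_epoch

print("approx. steps per epoch:", round(steps_per_epoch))
print("approx. examples per epoch:", round(examples_per_epoch))
print("approx. epochs for all training steps:", round(estimated_epochs, 2))
```

The estimated epoch count lands close to the 32.7335 logged at step 40000; the small difference comes from rounding in the logged epoch value.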
Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.2715 | 0.1637 | 200 | 0.3164 | 65024 |
| 0.2274 | 0.3275 | 400 | 0.2345 | 129984 |
| 0.2311 | 0.4912 | 600 | 0.2379 | 195024 |
| 0.2416 | 0.6549 | 800 | 0.2349 | 260624 |
| 0.2727 | 0.8187 | 1000 | 0.2304 | 325984 |
| 0.2128 | 0.9824 | 1200 | 0.2317 | 391280 |
| 0.2307 | 1.1457 | 1400 | 0.2352 | 456248 |
| 0.2337 | 1.3095 | 1600 | 0.2426 | 521464 |
| 0.2442 | 1.4732 | 1800 | 0.2327 | 586632 |
| 0.2685 | 1.6369 | 2000 | 0.2358 | 651384 |
| 0.2418 | 1.8007 | 2200 | 0.2493 | 716552 |
| 0.2271 | 1.9644 | 2400 | 0.2425 | 781992 |
| 0.2924 | 2.1277 | 2600 | 0.2365 | 847136 |
| 0.2458 | 2.2914 | 2800 | 0.3350 | 912064 |
| 0.2465 | 2.4552 | 3000 | 0.2398 | 977312 |
| 0.2235 | 2.6189 | 3200 | 0.2412 | 1042608 |
| 0.2477 | 2.7826 | 3400 | 0.2297 | 1107488 |
| 0.2354 | 2.9464 | 3600 | 0.2284 | 1172864 |
| 0.2357 | 3.1097 | 3800 | 0.2291 | 1238392 |
| 0.2293 | 3.2734 | 4000 | 0.2512 | 1303640 |
| 0.2271 | 3.4372 | 4200 | 0.2333 | 1368504 |
| 0.222 | 3.6009 | 4400 | 0.2289 | 1433480 |
| 0.2304 | 3.7646 | 4600 | 0.2273 | 1499016 |
| 0.2258 | 3.9284 | 4800 | 0.2271 | 1563880 |
| 0.2271 | 4.0917 | 5000 | 0.2319 | 1628808 |
| 0.3823 | 4.2554 | 5200 | 0.2361 | 1693576 |
| 0.2938 | 4.4192 | 5400 | 0.2366 | 1758536 |
| 0.2485 | 4.5829 | 5600 | 0.2260 | 1823544 |
| 0.2194 | 4.7466 | 5800 | 0.2265 | 1889272 |
| 0.2384 | 4.9104 | 6000 | 0.2394 | 1954632 |
| 0.2244 | 5.0737 | 6200 | 0.2315 | 2019440 |
| 0.2615 | 5.2374 | 6400 | 0.2364 | 2084816 |
| 0.2197 | 5.4011 | 6600 | 0.2276 | 2149632 |
| 0.2396 | 5.5649 | 6800 | 0.2672 | 2214864 |
| 0.2241 | 5.7286 | 7000 | 0.2358 | 2280368 |
| 0.2488 | 5.8923 | 7200 | 0.2257 | 2345632 |
| 0.255 | 6.0557 | 7400 | 0.2292 | 2410768 |
| 0.2135 | 6.2194 | 7600 | 0.2368 | 2476096 |
| 0.2133 | 6.3831 | 7800 | 0.2191 | 2541152 |
| 0.2346 | 6.5469 | 8000 | 0.2220 | 2606016 |
| 0.2732 | 6.7106 | 8200 | 0.2403 | 2670896 |
| 0.1987 | 6.8743 | 8400 | 0.2296 | 2736160 |
| 0.2541 | 7.0377 | 8600 | 0.2570 | 2801120 |
| 0.2813 | 7.2014 | 8800 | 0.2425 | 2865872 |
| 0.2214 | 7.3651 | 9000 | 0.2266 | 2931072 |
| 0.2226 | 7.5289 | 9200 | 0.2235 | 2996288 |
| 0.2263 | 7.6926 | 9400 | 0.2298 | 3061744 |
| 0.2219 | 7.8563 | 9600 | 0.2340 | 3126896 |
| 0.2258 | 8.0196 | 9800 | 0.2238 | 3191832 |
| 0.2164 | 8.1834 | 10000 | 0.2256 | 3257640 |
| 0.2183 | 8.3471 | 10200 | 0.2265 | 3322584 |
| 0.204 | 8.5108 | 10400 | 0.2171 | 3387672 |
| 0.2382 | 8.6746 | 10600 | 0.2342 | 3452968 |
| 0.223 | 8.8383 | 10800 | 0.2154 | 3518104 |
| 0.1963 | 9.0016 | 11000 | 0.2423 | 3583216 |
| 0.1835 | 9.1654 | 11200 | 0.2185 | 3648592 |
| 0.2479 | 9.3291 | 11400 | 0.2296 | 3713808 |
| 0.2147 | 9.4928 | 11600 | 0.2213 | 3778848 |
| 0.2563 | 9.6566 | 11800 | 0.2289 | 3844208 |
| 0.202 | 9.8203 | 12000 | 0.2318 | 3909264 |
| 0.2283 | 9.9840 | 12200 | 0.2170 | 3974224 |
| 0.2217 | 10.1474 | 12400 | 0.2161 | 4039488 |
| 0.2395 | 10.3111 | 12600 | 0.2407 | 4104512 |
| 0.2253 | 10.4748 | 12800 | 0.2227 | 4169856 |
| 0.2292 | 10.6386 | 13000 | 0.2186 | 4234864 |
| 0.2041 | 10.8023 | 13200 | 0.2170 | 4300144 |
| 0.2481 | 10.9660 | 13400 | 0.2197 | 4365440 |
| 0.2341 | 11.1293 | 13600 | 0.2186 | 4430440 |
| 0.2015 | 11.2931 | 13800 | 0.2200 | 4495784 |
| 0.2158 | 11.4568 | 14000 | 0.2682 | 4560792 |
| 0.2137 | 11.6205 | 14200 | 0.2235 | 4625720 |
| 0.2315 | 11.7843 | 14400 | 0.2165 | 4690744 |
| 0.2227 | 11.9480 | 14600 | 0.2212 | 4756152 |
| 0.1909 | 12.1113 | 14800 | 0.2239 | 4821256 |
| 0.2336 | 12.2751 | 15000 | 0.2271 | 4886344 |
| 0.2112 | 12.4388 | 15200 | 0.2210 | 4951960 |
| 0.2055 | 12.6025 | 15400 | 0.2163 | 5016856 |
| 0.1951 | 12.7663 | 15600 | 0.2346 | 5082248 |
| 0.1957 | 12.9300 | 15800 | 0.2182 | 5147240 |
| 0.1623 | 13.0933 | 16000 | 0.2141 | 5212440 |
| 0.1976 | 13.2571 | 16200 | 0.2190 | 5277800 |
| 0.1638 | 13.4208 | 16400 | 0.2137 | 5342760 |
| 0.1778 | 13.5845 | 16600 | 0.2159 | 5407816 |
| 0.168 | 13.7483 | 16800 | 0.2267 | 5473672 |
| 0.1583 | 13.9120 | 17000 | 0.2180 | 5538456 |
| 0.202 | 14.0753 | 17200 | 0.2194 | 5603152 |
| 0.1838 | 14.2391 | 17400 | 0.2204 | 5668048 |
| 0.236 | 14.4028 | 17600 | 0.2230 | 5732816 |
| 0.1813 | 14.5665 | 17800 | 0.2134 | 5798240 |
| 0.1848 | 14.7302 | 18000 | 0.2251 | 5863936 |
| 0.181 | 14.8940 | 18200 | 0.2124 | 5929216 |
| 0.1917 | 15.0573 | 18400 | 0.2249 | 5994376 |
| 0.1809 | 15.2210 | 18600 | 0.2176 | 6059464 |
| 0.1457 | 15.3848 | 18800 | 0.2312 | 6125240 |
| 0.1708 | 15.5485 | 19000 | 0.2170 | 6190600 |
| 0.2537 | 15.7122 | 19200 | 0.2186 | 6255240 |
| 0.2215 | 15.8760 | 19400 | 0.2192 | 6320328 |
| 0.1611 | 16.0393 | 19600 | 0.2299 | 6385240 |
| 0.2651 | 16.2030 | 19800 | 0.2264 | 6450424 |
| 0.1397 | 16.3668 | 20000 | 0.2193 | 6515688 |
| 0.2026 | 16.5305 | 20200 | 0.2168 | 6580712 |
| 0.1425 | 16.6942 | 20400 | 0.2206 | 6646184 |
| 0.2571 | 16.8580 | 20600 | 0.2225 | 6711480 |
| 0.1511 | 17.0213 | 20800 | 0.2285 | 6776176 |
| 0.2368 | 17.1850 | 21000 | 0.2505 | 6841120 |
| 0.1908 | 17.3488 | 21200 | 0.2320 | 6906528 |
| 0.2122 | 17.5125 | 21400 | 0.2256 | 6971568 |
| 0.1316 | 17.6762 | 21600 | 0.2224 | 7036832 |
| 0.1514 | 17.8400 | 21800 | 0.2220 | 7102176 |
| 0.1344 | 18.0033 | 22000 | 0.2611 | 7167168 |
| 0.2122 | 18.1670 | 22200 | 0.2518 | 7232736 |
| 0.1607 | 18.3307 | 22400 | 0.2545 | 7297984 |
| 0.169 | 18.4945 | 22600 | 0.2347 | 7362832 |
| 0.2237 | 18.6582 | 22800 | 0.2710 | 7428672 |
| 0.1654 | 18.8219 | 23000 | 0.2307 | 7493504 |
| 0.1552 | 18.9857 | 23200 | 0.2472 | 7558400 |
| 0.1279 | 19.1490 | 23400 | 0.2919 | 7623392 |
| 0.0885 | 19.3127 | 23600 | 0.2574 | 7688624 |
| 0.1143 | 19.4765 | 23800 | 0.2849 | 7753632 |
| 0.0825 | 19.6402 | 24000 | 0.2669 | 7819136 |
| 0.19 | 19.8039 | 24200 | 0.2522 | 7884272 |
| 0.0499 | 19.9677 | 24400 | 0.2400 | 7949504 |
| 0.0234 | 20.1310 | 24600 | 0.3343 | 8014544 |
| 0.0928 | 20.2947 | 24800 | 0.3014 | 8079920 |
| 0.1606 | 20.4585 | 25000 | 0.2954 | 8145552 |
| 0.0607 | 20.6222 | 25200 | 0.2935 | 8210688 |
| 0.0667 | 20.7859 | 25400 | 0.3004 | 8275760 |
| 0.081 | 20.9497 | 25600 | 0.2915 | 8340784 |
| 0.0685 | 21.1130 | 25800 | 0.3550 | 8405688 |
| 0.1446 | 21.2767 | 26000 | 0.3759 | 8470664 |
| 0.0373 | 21.4404 | 26200 | 0.3593 | 8535736 |
| 0.0769 | 21.6042 | 26400 | 0.3506 | 8600728 |
| 0.0576 | 21.7679 | 26600 | 0.3605 | 8666296 |
| 0.1533 | 21.9316 | 26800 | 0.3496 | 8731640 |
| 0.0531 | 22.0950 | 27000 | 0.3704 | 8796704 |
| 0.0571 | 22.2587 | 27200 | 0.4090 | 8861792 |
| 0.025 | 22.4224 | 27400 | 0.4031 | 8927168 |
| 0.0626 | 22.5862 | 27600 | 0.3957 | 8992240 |
| 0.0602 | 22.7499 | 27800 | 0.4214 | 9057600 |
| 0.0231 | 22.9136 | 28000 | 0.4342 | 9122992 |
| 0.0269 | 23.0770 | 28200 | 0.4263 | 9187992 |
| 0.0157 | 23.2407 | 28400 | 0.4508 | 9253112 |
| 0.0063 | 23.4044 | 28600 | 0.4433 | 9318440 |
| 0.0117 | 23.5682 | 28800 | 0.4480 | 9383656 |
| 0.0127 | 23.7319 | 29000 | 0.4403 | 9448616 |
| 0.0236 | 23.8956 | 29200 | 0.4501 | 9513976 |
| 0.0385 | 24.0589 | 29400 | 0.4397 | 9579416 |
| 0.0074 | 24.2227 | 29600 | 0.4718 | 9644664 |
| 0.0094 | 24.3864 | 29800 | 0.4893 | 9710056 |
| 0.0044 | 24.5501 | 30000 | 0.4844 | 9775272 |
| 0.0156 | 24.7139 | 30200 | 0.4942 | 9840600 |
| 0.0132 | 24.8776 | 30400 | 0.4873 | 9905368 |
| 0.0118 | 25.0409 | 30600 | 0.4813 | 9970160 |
| 0.0038 | 25.2047 | 30800 | 0.4980 | 10035200 |
| 0.0106 | 25.3684 | 31000 | 0.4941 | 10100368 |
| 0.0032 | 25.5321 | 31200 | 0.5118 | 10165552 |
| 0.0098 | 25.6959 | 31400 | 0.4987 | 10230992 |
| 0.0105 | 25.8596 | 31600 | 0.5105 | 10295840 |
| 0.0022 | 26.0229 | 31800 | 0.5109 | 10360952 |
| 0.002 | 26.1867 | 32000 | 0.5198 | 10425832 |
| 0.0073 | 26.3504 | 32200 | 0.5175 | 10490904 |
| 0.0018 | 26.5141 | 32400 | 0.5253 | 10556056 |
| 0.0019 | 26.6779 | 32600 | 0.5312 | 10621432 |
| 0.0306 | 26.8416 | 32800 | 0.5417 | 10686808 |
| 0.0035 | 27.0049 | 33000 | 0.5516 | 10751912 |
| 0.0006 | 27.1686 | 33200 | 0.5625 | 10817272 |
| 0.0018 | 27.3324 | 33400 | 0.5561 | 10882568 |
| 0.0024 | 27.4961 | 33600 | 0.5652 | 10947368 |
| 0.0034 | 27.6598 | 33800 | 0.5646 | 11012568 |
| 0.0016 | 27.8236 | 34000 | 0.5783 | 11078056 |
| 0.0015 | 27.9873 | 34200 | 0.5654 | 11143272 |
| 0.001 | 28.1506 | 34400 | 0.5675 | 11208128 |
| 0.0015 | 28.3144 | 34600 | 0.5702 | 11273344 |
| 0.0009 | 28.4781 | 34800 | 0.5859 | 11338704 |
| 0.0011 | 28.6418 | 35000 | 0.5847 | 11404240 |
| 0.001 | 28.8056 | 35200 | 0.5955 | 11469056 |
| 0.0009 | 28.9693 | 35400 | 0.5978 | 11534288 |
| 0.0007 | 29.1326 | 35600 | 0.5965 | 11599248 |
| 0.0011 | 29.2964 | 35800 | 0.6026 | 11664528 |
| 0.0006 | 29.4601 | 36000 | 0.6074 | 11729904 |
| 0.0015 | 29.6238 | 36200 | 0.6088 | 11794928 |
| 0.0005 | 29.7876 | 36400 | 0.6128 | 11860400 |
| 0.0025 | 29.9513 | 36600 | 0.6181 | 11925328 |
| 0.0006 | 30.1146 | 36800 | 0.6168 | 11989944 |
| 0.0011 | 30.2783 | 37000 | 0.6182 | 12054968 |
| 0.0013 | 30.4421 | 37200 | 0.6234 | 12120184 |
| 0.0017 | 30.6058 | 37400 | 0.6262 | 12185832 |
| 0.0008 | 30.7695 | 37600 | 0.6266 | 12250664 |
| 0.0014 | 30.9333 | 37800 | 0.6292 | 12315704 |
| 0.001 | 31.0966 | 38000 | 0.6328 | 12380824 |
| 0.0007 | 31.2603 | 38200 | 0.6355 | 12446424 |
| 0.0005 | 31.4241 | 38400 | 0.6317 | 12511800 |
| 0.0005 | 31.5878 | 38600 | 0.6359 | 12576920 |
| 0.0005 | 31.7515 | 38800 | 0.6349 | 12641896 |
| 0.0007 | 31.9153 | 39000 | 0.6342 | 12706504 |
| 0.0002 | 32.0786 | 39200 | 0.6396 | 12771208 |
| 0.0009 | 32.2423 | 39400 | 0.6357 | 12836760 |
| 0.0008 | 32.4061 | 39600 | 0.6356 | 12901944 |
| 0.0005 | 32.5698 | 39800 | 0.6323 | 12967000 |
| 0.0005 | 32.7335 | 40000 | 0.6317 | 13031928 |
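In the table above, training loss falls toward zero while validation loss rises steadily after roughly step 18200, which is the usual signature of overfitting. The following minimal sketch picks the checkpoint with the lowest validation loss from a handful of rows copied from the table. The row selection is illustrative only, and the card does not say which checkpoint was actually saved.

```python
# (step, training_loss, validation_loss) for a few rows copied from the table above.
rows = [
    (200, 0.2715, 0.3164),
    (10000, 0.2164, 0.2256),
    (18200, 0.1810, 0.2124),
    (24600, 0.0234, 0.3343),
    (30000, 0.0044, 0.4844),
    (40000, 0.0005, 0.6317),
]

# Select the row with the smallest validation loss.
best_step, _, best_val = min(rows, key=lambda r: r[2])

# Compare against the last logged row to see how far validation loss drifted.
final_step, final_train, final_val = rows[-1]

print(f"best validation loss {best_val} at step {best_step}")
print(f"validation loss at step {final_step}: {final_val}")
print(f"final gap (validation - training): {final_val - final_train:.4f}")
```

Running the same selection over the full table would be the way to confirm the best step, since this sample omits most rows.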
Framework versions
- PEFT 0.15.2.dev0
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
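Because PEFT appears in the framework list, the published weights are presumably a PEFT adapter on top of google/gemma-3-1b-it rather than a full model; the card does not confirm this. Under that assumption, loading might look like the sketch below. `adapter_path` is a hypothetical placeholder for the adapter's local directory or Hub repository id, which the card does not give, and `load_model` is an illustrative helper name.

```python
def load_model(adapter_path: str, base_id: str = "google/gemma-3-1b-it"):
    """Load the base model and attach a PEFT adapter (assumed format)."""
    # Imported lazily so the helper can be defined without the libraries installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The tokenizer is taken from the base model named in the card.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)

    # adapter_path is hypothetical: point it at wherever the adapter files live.
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()
    return model, tokenizer
```

Calling `load_model` downloads the base model, which may require accepting its license on the Hub first.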