# train_wic_1745950288
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the WiC dataset. It achieves the following results on the evaluation set:
- Loss: 0.2431
- Num Input Tokens Seen: 12716696
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
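The card does not document the data format or the prompt used during training. For reference, the WiC (Word-in-Context) task from SuperGLUE provides a target word and two sentences, with a binary label indicating whether the word is used in the same sense in both. The sketch below shows one such record and a *hypothetical* instruction-style prompt; the actual training prompt is not specified by this card.

```python
# Hedged sketch: a WiC record and a possible prompt format.
# Field names follow the SuperGLUE "wic" config; the prompt wording
# is an assumption, not the format used for this checkpoint.

def format_wic_prompt(example):
    """Turn one WiC record into an instruction-style prompt (hypothetical format)."""
    return (
        f'Does the word "{example["word"]}" have the same meaning '
        "in both sentences?\n"
        f"Sentence 1: {example['sentence1']}\n"
        f"Sentence 2: {example['sentence2']}\n"
        "Answer yes or no."
    )

example = {
    "word": "bank",
    "sentence1": "He sat on the bank of the river.",
    "sentence2": "She deposited money at the bank.",
    "label": 0,  # 0 = different sense, 1 = same sense
}
prompt = format_wic_prompt(example)
```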
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 123
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- training_steps: 40000
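The effective batch size and the shape of the cosine schedule follow directly from these settings. The snippet below works them out; the schedule formula is the standard cosine decay without warmup, which is an assumption about the trainer's exact configuration.

```python
import math

# Effective batch size implied by the hyperparameters above.
train_batch_size = 2
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 4

def cosine_lr(step, total_steps=40000, base_lr=5e-5):
    """Cosine-decayed learning rate at a given optimizer step
    (standard cosine schedule without warmup; an assumption here)."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * step / total_steps))

# lr starts at 5e-5, passes 2.5e-5 at the midpoint, and decays to 0 at step 40000.
```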
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.41 | 0.1637 | 200 | 0.3478 | 63344 |
| 0.297 | 0.3275 | 400 | 0.3203 | 126720 |
| 0.3247 | 0.4912 | 600 | 0.3113 | 190304 |
| 0.3098 | 0.6549 | 800 | 0.3043 | 254384 |
| 0.2768 | 0.8187 | 1000 | 0.3050 | 318128 |
| 0.3171 | 0.9824 | 1200 | 0.2925 | 381920 |
| 0.2851 | 1.1457 | 1400 | 0.2898 | 445096 |
| 0.3462 | 1.3095 | 1600 | 0.2833 | 508744 |
| 0.2697 | 1.4732 | 1800 | 0.2807 | 572408 |
| 0.3136 | 1.6369 | 2000 | 0.2809 | 635736 |
| 0.2403 | 1.8007 | 2200 | 0.2779 | 699464 |
| 0.1928 | 1.9644 | 2400 | 0.2772 | 763192 |
| 0.3162 | 2.1277 | 2600 | 0.2764 | 826784 |
| 0.2806 | 2.2914 | 2800 | 0.2734 | 890336 |
| 0.2619 | 2.4552 | 3000 | 0.2706 | 953840 |
| 0.2728 | 2.6189 | 3200 | 0.2739 | 1017600 |
| 0.3463 | 2.7826 | 3400 | 0.2682 | 1081104 |
| 0.2784 | 2.9464 | 3600 | 0.2725 | 1144576 |
| 0.3344 | 3.1097 | 3800 | 0.2707 | 1208440 |
| 0.2909 | 3.2734 | 4000 | 0.2657 | 1272216 |
| 0.1931 | 3.4372 | 4200 | 0.2641 | 1335496 |
| 0.1951 | 3.6009 | 4400 | 0.2710 | 1398984 |
| 0.2575 | 3.7646 | 4600 | 0.2608 | 1462856 |
| 0.3759 | 3.9284 | 4800 | 0.2611 | 1526280 |
| 0.1822 | 4.0917 | 5000 | 0.2609 | 1589584 |
| 0.1742 | 4.2554 | 5200 | 0.2589 | 1653024 |
| 0.2095 | 4.4192 | 5400 | 0.2587 | 1716432 |
| 0.2358 | 4.5829 | 5600 | 0.2577 | 1779984 |
| 0.1787 | 4.7466 | 5800 | 0.2573 | 1843936 |
| 0.3909 | 4.9104 | 6000 | 0.2558 | 1907808 |
| 0.1614 | 5.0737 | 6200 | 0.2538 | 1971048 |
| 0.2256 | 5.2374 | 6400 | 0.2572 | 2034808 |
| 0.2986 | 5.4011 | 6600 | 0.2548 | 2098088 |
| 0.2891 | 5.5649 | 6800 | 0.2574 | 2161640 |
| 0.2935 | 5.7286 | 7000 | 0.2562 | 2225432 |
| 0.3234 | 5.8923 | 7200 | 0.2562 | 2289032 |
| 0.3431 | 6.0557 | 7400 | 0.2542 | 2352656 |
| 0.3034 | 6.2194 | 7600 | 0.2614 | 2416160 |
| 0.149 | 6.3831 | 7800 | 0.2499 | 2479728 |
| 0.3029 | 6.5469 | 8000 | 0.2487 | 2543168 |
| 0.3466 | 6.7106 | 8200 | 0.2522 | 2606560 |
| 0.2033 | 6.8743 | 8400 | 0.2534 | 2670208 |
| 0.2473 | 7.0377 | 8600 | 0.2495 | 2733584 |
| 0.2264 | 7.2014 | 8800 | 0.2527 | 2797008 |
| 0.3126 | 7.3651 | 9000 | 0.2499 | 2860576 |
| 0.202 | 7.5289 | 9200 | 0.2509 | 2924256 |
| 0.1119 | 7.6926 | 9400 | 0.2521 | 2988272 |
| 0.2043 | 7.8563 | 9600 | 0.2489 | 3051776 |
| 0.2157 | 8.0196 | 9800 | 0.2483 | 3114992 |
| 0.3124 | 8.1834 | 10000 | 0.2466 | 3179200 |
| 0.2138 | 8.3471 | 10200 | 0.2481 | 3242496 |
| 0.2217 | 8.5108 | 10400 | 0.2474 | 3306112 |
| 0.3002 | 8.6746 | 10600 | 0.2437 | 3369760 |
| 0.2043 | 8.8383 | 10800 | 0.2509 | 3433360 |
| 0.0986 | 9.0016 | 11000 | 0.2515 | 3496680 |
| 0.186 | 9.1654 | 11200 | 0.2492 | 3560648 |
| 0.2636 | 9.3291 | 11400 | 0.2487 | 3624200 |
| 0.2705 | 9.4928 | 11600 | 0.2471 | 3687560 |
| 0.3363 | 9.6566 | 11800 | 0.2441 | 3751288 |
| 0.1675 | 9.8203 | 12000 | 0.2432 | 3814952 |
| 0.1993 | 9.9840 | 12200 | 0.2458 | 3878120 |
| 0.1998 | 10.1474 | 12400 | 0.2502 | 3941616 |
| 0.2337 | 10.3111 | 12600 | 0.2440 | 4005216 |
| 0.3763 | 10.4748 | 12800 | 0.2453 | 4068912 |
| 0.3058 | 10.6386 | 13000 | 0.2535 | 4132608 |
| 0.2823 | 10.8023 | 13200 | 0.2487 | 4196096 |
| 0.2078 | 10.9660 | 13400 | 0.2456 | 4259680 |
| 0.1691 | 11.1293 | 13600 | 0.2438 | 4323128 |
| 0.2832 | 11.2931 | 13800 | 0.2451 | 4386856 |
| 0.1692 | 11.4568 | 14000 | 0.2431 | 4450296 |
| 0.3105 | 11.6205 | 14200 | 0.2437 | 4513544 |
| 0.2107 | 11.7843 | 14400 | 0.2434 | 4576984 |
| 0.5025 | 11.9480 | 14600 | 0.2483 | 4640904 |
| 0.2113 | 12.1113 | 14800 | 0.2456 | 4704360 |
| 0.3132 | 12.2751 | 15000 | 0.2507 | 4768152 |
| 0.1774 | 12.4388 | 15200 | 0.2456 | 4832152 |
| 0.1488 | 12.6025 | 15400 | 0.2438 | 4895192 |
| 0.1861 | 12.7663 | 15600 | 0.2448 | 4959112 |
| 0.158 | 12.9300 | 15800 | 0.2496 | 5022408 |
| 0.4641 | 13.0933 | 16000 | 0.2483 | 5086016 |
| 0.4055 | 13.2571 | 16200 | 0.2483 | 5149920 |
| 0.2735 | 13.4208 | 16400 | 0.2446 | 5213296 |
| 0.2592 | 13.5845 | 16600 | 0.2448 | 5276672 |
| 0.3108 | 13.7483 | 16800 | 0.2472 | 5340624 |
| 0.1532 | 13.9120 | 17000 | 0.2479 | 5403792 |
| 0.442 | 14.0753 | 17200 | 0.2476 | 5466936 |
| 0.3657 | 14.2391 | 17400 | 0.2491 | 5530392 |
| 0.2201 | 14.4028 | 17600 | 0.2469 | 5593576 |
| 0.1593 | 14.5665 | 17800 | 0.2547 | 5657288 |
| 0.3432 | 14.7302 | 18000 | 0.2517 | 5721496 |
| 0.2167 | 14.8940 | 18200 | 0.2472 | 5785096 |
| 0.1937 | 15.0573 | 18400 | 0.2484 | 5848736 |
| 0.1149 | 15.2210 | 18600 | 0.2456 | 5912176 |
| 0.2339 | 15.3848 | 18800 | 0.2516 | 5976400 |
| 0.2008 | 15.5485 | 19000 | 0.2508 | 6040272 |
| 0.2109 | 15.7122 | 19200 | 0.2501 | 6103424 |
| 0.3115 | 15.8760 | 19400 | 0.2532 | 6166912 |
| 0.1857 | 16.0393 | 19600 | 0.2505 | 6230320 |
| 0.2243 | 16.2030 | 19800 | 0.2501 | 6294224 |
| 0.2037 | 16.3668 | 20000 | 0.2495 | 6357984 |
| 0.2036 | 16.5305 | 20200 | 0.2553 | 6421344 |
| 0.1978 | 16.6942 | 20400 | 0.2543 | 6485152 |
| 0.1985 | 16.8580 | 20600 | 0.2505 | 6548768 |
| 0.3801 | 17.0213 | 20800 | 0.2489 | 6611792 |
| 0.0677 | 17.1850 | 21000 | 0.2487 | 6675216 |
| 0.1926 | 17.3488 | 21200 | 0.2559 | 6739088 |
| 0.3585 | 17.5125 | 21400 | 0.2489 | 6802352 |
| 0.1407 | 17.6762 | 21600 | 0.2480 | 6866160 |
| 0.2853 | 17.8400 | 21800 | 0.2511 | 6929936 |
| 0.3343 | 18.0033 | 22000 | 0.2501 | 6993168 |
| 0.2399 | 18.1670 | 22200 | 0.2508 | 7057008 |
| 0.1996 | 18.3307 | 22400 | 0.2518 | 7120624 |
| 0.2152 | 18.4945 | 22600 | 0.2520 | 7183872 |
| 0.2337 | 18.6582 | 22800 | 0.2488 | 7247952 |
| 0.1151 | 18.8219 | 23000 | 0.2596 | 7311488 |
| 0.29 | 18.9857 | 23200 | 0.2509 | 7374848 |
| 0.2492 | 19.1490 | 23400 | 0.2526 | 7438160 |
| 0.2518 | 19.3127 | 23600 | 0.2554 | 7501872 |
| 0.4147 | 19.4765 | 23800 | 0.2574 | 7565520 |
| 0.1942 | 19.6402 | 24000 | 0.2513 | 7629488 |
| 0.2559 | 19.8039 | 24200 | 0.2520 | 7692992 |
| 0.1484 | 19.9677 | 24400 | 0.2513 | 7756512 |
| 0.1742 | 20.1310 | 24600 | 0.2520 | 7819816 |
| 0.2045 | 20.2947 | 24800 | 0.2538 | 7883800 |
| 0.1875 | 20.4585 | 25000 | 0.2575 | 7947944 |
| 0.1281 | 20.6222 | 25200 | 0.2584 | 8011336 |
| 0.2972 | 20.7859 | 25400 | 0.2562 | 8075000 |
| 0.0821 | 20.9497 | 25600 | 0.2553 | 8138568 |
| 0.1122 | 21.1130 | 25800 | 0.2609 | 8201872 |
| 0.2026 | 21.2767 | 26000 | 0.2557 | 8265168 |
| 0.1659 | 21.4404 | 26200 | 0.2576 | 8328704 |
| 0.238 | 21.6042 | 26400 | 0.2556 | 8392144 |
| 0.3934 | 21.7679 | 26600 | 0.2601 | 8456096 |
| 0.2723 | 21.9316 | 26800 | 0.2551 | 8519872 |
| 0.1656 | 22.0950 | 27000 | 0.2595 | 8583464 |
| 0.2091 | 22.2587 | 27200 | 0.2611 | 8646840 |
| 0.2229 | 22.4224 | 27400 | 0.2619 | 8710600 |
| 0.167 | 22.5862 | 27600 | 0.2599 | 8774344 |
| 0.2446 | 22.7499 | 27800 | 0.2590 | 8838024 |
| 0.3715 | 22.9136 | 28000 | 0.2589 | 8901832 |
| 0.1431 | 23.0770 | 28200 | 0.2608 | 8965184 |
| 0.1222 | 23.2407 | 28400 | 0.2616 | 9028576 |
| 0.2605 | 23.4044 | 28600 | 0.2582 | 9092256 |
| 0.1257 | 23.5682 | 28800 | 0.2569 | 9155872 |
| 0.189 | 23.7319 | 29000 | 0.2581 | 9219312 |
| 0.1947 | 23.8956 | 29200 | 0.2590 | 9283264 |
| 0.1844 | 24.0589 | 29400 | 0.2600 | 9346992 |
| 0.2484 | 24.2227 | 29600 | 0.2620 | 9410880 |
| 0.2888 | 24.3864 | 29800 | 0.2580 | 9474704 |
| 0.2298 | 24.5501 | 30000 | 0.2592 | 9538160 |
| 0.2833 | 24.7139 | 30200 | 0.2593 | 9601792 |
| 0.2394 | 24.8776 | 30400 | 0.2608 | 9664976 |
| 0.1825 | 25.0409 | 30600 | 0.2639 | 9728232 |
| 0.1197 | 25.2047 | 30800 | 0.2623 | 9791848 |
| 0.0702 | 25.3684 | 31000 | 0.2609 | 9855400 |
| 0.1219 | 25.5321 | 31200 | 0.2620 | 9918984 |
| 0.0407 | 25.6959 | 31400 | 0.2644 | 9982872 |
| 0.1427 | 25.8596 | 31600 | 0.2624 | 10046056 |
| 0.0861 | 26.0229 | 31800 | 0.2630 | 10109568 |
| 0.1017 | 26.1867 | 32000 | 0.2604 | 10173072 |
| 0.1502 | 26.3504 | 32200 | 0.2605 | 10236512 |
| 0.3748 | 26.5141 | 32400 | 0.2609 | 10299920 |
| 0.1164 | 26.6779 | 32600 | 0.2619 | 10363808 |
| 0.3463 | 26.8416 | 32800 | 0.2628 | 10427744 |
| 0.1913 | 27.0049 | 33000 | 0.2642 | 10491384 |
| 0.2181 | 27.1686 | 33200 | 0.2640 | 10555192 |
| 0.2107 | 27.3324 | 33400 | 0.2654 | 10619080 |
| 0.2662 | 27.4961 | 33600 | 0.2622 | 10682424 |
| 0.2848 | 27.6598 | 33800 | 0.2604 | 10746024 |
| 0.0842 | 27.8236 | 34000 | 0.2624 | 10809736 |
| 0.4161 | 27.9873 | 34200 | 0.2619 | 10873448 |
| 0.1133 | 28.1506 | 34400 | 0.2627 | 10936704 |
| 0.1194 | 28.3144 | 34600 | 0.2616 | 11000112 |
| 0.2269 | 28.4781 | 34800 | 0.2609 | 11063936 |
| 0.0971 | 28.6418 | 35000 | 0.2651 | 11128160 |
| 0.1533 | 28.8056 | 35200 | 0.2629 | 11191600 |
| 0.1651 | 28.9693 | 35400 | 0.2622 | 11255184 |
| 0.0591 | 29.1326 | 35600 | 0.2627 | 11318640 |
| 0.2183 | 29.2964 | 35800 | 0.2638 | 11382352 |
| 0.2147 | 29.4601 | 36000 | 0.2654 | 11446048 |
| 0.0753 | 29.6238 | 36200 | 0.2648 | 11509328 |
| 0.0322 | 29.7876 | 36400 | 0.2641 | 11573312 |
| 0.1039 | 29.9513 | 36600 | 0.2624 | 11636752 |
| 0.2158 | 30.1146 | 36800 | 0.2621 | 11700056 |
| 0.2059 | 30.2783 | 37000 | 0.2637 | 11763352 |
| 0.1896 | 30.4421 | 37200 | 0.2632 | 11826952 |
| 0.2378 | 30.6058 | 37400 | 0.2641 | 11890888 |
| 0.2648 | 30.7695 | 37600 | 0.2634 | 11954296 |
| 0.3572 | 30.9333 | 37800 | 0.2607 | 12017784 |
| 0.3041 | 31.0966 | 38000 | 0.2649 | 12081304 |
| 0.1618 | 31.2603 | 38200 | 0.2624 | 12145240 |
| 0.2205 | 31.4241 | 38400 | 0.2644 | 12208888 |
| 0.2066 | 31.5878 | 38600 | 0.2651 | 12272344 |
| 0.265 | 31.7515 | 38800 | 0.2623 | 12335960 |
| 0.3534 | 31.9153 | 39000 | 0.2628 | 12399064 |
| 0.1435 | 32.0786 | 39200 | 0.2638 | 12462200 |
| 0.2838 | 32.2423 | 39400 | 0.2652 | 12526024 |
| 0.1894 | 32.4061 | 39600 | 0.2652 | 12589496 |
| 0.175 | 32.5698 | 39800 | 0.2652 | 12653080 |
| 0.1656 | 32.7335 | 40000 | 0.2652 | 12716696 |
### Framework versions
- PEFT 0.15.2.dev0
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
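Since the checkpoint was trained with PEFT, it is an adapter to be applied on top of the base model rather than a standalone model. A minimal loading sketch, assuming the checkpoint is a standard PEFT adapter repo (the repo id is taken from this card; downloading the 8B base model requires GPU memory and Hugging Face access to the gated Llama 3 weights):

```python
# Hedged sketch: apply the PEFT adapter on top of the base model.
# Imports are deferred so the function can be defined without the
# heavyweight dependencies installed.

def load_model(adapter_id="rbelanec/train_wic_1745950288",
               base_id="meta-llama/Meta-Llama-3-8B-Instruct"):
    """Load the base model and attach the fine-tuned adapter."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    model = PeftModel.from_pretrained(model, adapter_id)
    return model, tokenizer
```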
## Model tree for rbelanec/train_wic_1745950288

Base model: meta-llama/Meta-Llama-3-8B-Instruct