train_wic_1745950293
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the wic dataset. It achieves the following results on the evaluation set:
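- Loss: 0.3417
- Num Input Tokens Seen: 12845616

The reported loss equals the lowest validation loss in the training results table (reached at steps 4400 and 20800); the validation loss at the final step (40000) is 0.3445. The token count is the cumulative total at the final step.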
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.3
- train_batch_size: 2
- eval_batch_size: 2
- seed: 123
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- training_steps: 40000
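The relationship between the batch settings, and the shape of the cosine schedule, can be sketched in a few lines. This is a minimal illustration, not the training code: it assumes a single device (so the total batch size is the per-device batch size times the accumulation steps) and zero warmup steps, since the list above mentions no warmup. The function name `cosine_lr` is hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 0.3
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 2
TRAINING_STEPS = 40000

# Assumption: one device, so the effective batch is batch size x accumulation steps.
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


def cosine_lr(step, peak_lr=LEARNING_RATE, total_steps=TRAINING_STEPS):
    """Half-cosine decay from peak_lr to 0. Assumes zero warmup steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    for step in (0, 10000, 20000, 30000, 40000):
        print(step, round(cosine_lr(step), 4))
```

Under these assumptions the learning rate starts at 0.3, passes through half its peak at the midpoint of training, and reaches 0 at step 40000.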
Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.3857 | 0.1637 | 200 | 0.4081 | 64080 |
| 0.3452 | 0.3275 | 400 | 0.3471 | 128048 |
| 0.3496 | 0.4912 | 600 | 0.3524 | 192224 |
| 0.3446 | 0.6549 | 800 | 0.3534 | 256832 |
| 0.4153 | 0.8187 | 1000 | 0.3626 | 321264 |
| 0.3167 | 0.9824 | 1200 | 0.3501 | 385728 |
| 0.3378 | 1.1457 | 1400 | 0.3641 | 449768 |
| 0.3471 | 1.3095 | 1600 | 0.3551 | 514072 |
| 0.3574 | 1.4732 | 1800 | 0.3458 | 578408 |
| 0.3519 | 1.6369 | 2000 | 0.3447 | 642248 |
| 0.3772 | 1.8007 | 2200 | 0.3861 | 706488 |
| 0.3803 | 1.9644 | 2400 | 0.3458 | 770888 |
| 0.4218 | 2.1277 | 2600 | 0.3629 | 835216 |
| 0.3449 | 2.2914 | 2800 | 0.3458 | 899312 |
| 0.5685 | 2.4552 | 3000 | 0.3504 | 963696 |
| 0.3183 | 2.6189 | 3200 | 0.3582 | 1027904 |
| 0.3696 | 2.7826 | 3400 | 0.3432 | 1092016 |
| 0.3519 | 2.9464 | 3600 | 0.3418 | 1156240 |
| 0.3502 | 3.1097 | 3800 | 0.3447 | 1220568 |
| 0.3436 | 3.2734 | 4000 | 0.3677 | 1285128 |
| 0.5320 | 3.4372 | 4200 | 0.3532 | 1349032 |
| 0.3328 | 3.6009 | 4400 | 0.3417 | 1413096 |
| 0.3526 | 3.7646 | 4600 | 0.3477 | 1477816 |
| 0.3341 | 3.9284 | 4800 | 0.3422 | 1541800 |
| 0.3339 | 4.0917 | 5000 | 0.3496 | 1605480 |
| 0.5998 | 4.2554 | 5200 | 0.3777 | 1669464 |
| 0.4398 | 4.4192 | 5400 | 0.3473 | 1733528 |
| 0.3754 | 4.5829 | 5600 | 0.3421 | 1797608 |
| 0.3413 | 4.7466 | 5800 | 0.3469 | 1862328 |
| 0.3449 | 4.9104 | 6000 | 0.3605 | 1926824 |
| 0.3519 | 5.0737 | 6200 | 0.3431 | 1990752 |
| 0.3331 | 5.2374 | 6400 | 0.3439 | 2055200 |
| 0.3304 | 5.4011 | 6600 | 0.3433 | 2119232 |
| 0.3353 | 5.5649 | 6800 | 0.3783 | 2183440 |
| 0.3384 | 5.7286 | 7000 | 0.3570 | 2247920 |
| 0.3250 | 5.8923 | 7200 | 0.3433 | 2312032 |
| 0.3435 | 6.0557 | 7400 | 0.3525 | 2376200 |
| 0.3338 | 6.2194 | 7600 | 0.3420 | 2440472 |
| 0.3565 | 6.3831 | 7800 | 0.3429 | 2504760 |
| 0.3573 | 6.5469 | 8000 | 0.3420 | 2568840 |
| 0.3532 | 6.7106 | 8200 | 0.3485 | 2632776 |
| 0.3175 | 6.8743 | 8400 | 0.3471 | 2697176 |
| 0.3888 | 7.0377 | 8600 | 0.3505 | 2761240 |
| 0.3738 | 7.2014 | 8800 | 0.3464 | 2825240 |
| 0.3309 | 7.3651 | 9000 | 0.3441 | 2889368 |
| 0.3753 | 7.5289 | 9200 | 0.3429 | 2953752 |
| 0.3474 | 7.6926 | 9400 | 0.3440 | 3018440 |
| 0.3587 | 7.8563 | 9600 | 0.3438 | 3082552 |
| 0.3241 | 8.0196 | 9800 | 0.3496 | 3146472 |
| 0.3416 | 8.1834 | 10000 | 0.3466 | 3211320 |
| 0.3205 | 8.3471 | 10200 | 0.3460 | 3275192 |
| 0.3409 | 8.5108 | 10400 | 0.3434 | 3339400 |
| 0.3533 | 8.6746 | 10600 | 0.3517 | 3403656 |
| 0.3290 | 8.8383 | 10800 | 0.3490 | 3467848 |
| 0.3572 | 9.0016 | 11000 | 0.3443 | 3531952 |
| 0.3697 | 9.1654 | 11200 | 0.3481 | 3596368 |
| 0.3694 | 9.3291 | 11400 | 0.3481 | 3660496 |
| 0.3392 | 9.4928 | 11600 | 0.3444 | 3724480 |
| 0.3417 | 9.6566 | 11800 | 0.3486 | 3788928 |
| 0.3492 | 9.8203 | 12000 | 0.3457 | 3853296 |
| 0.3626 | 9.9840 | 12200 | 0.3455 | 3917232 |
| 0.3438 | 10.1474 | 12400 | 0.3516 | 3981568 |
| 0.3540 | 10.3111 | 12600 | 0.3442 | 4045600 |
| 0.3716 | 10.4748 | 12800 | 0.3449 | 4110048 |
| 0.3411 | 10.6386 | 13000 | 0.3434 | 4174432 |
| 0.3487 | 10.8023 | 13200 | 0.3455 | 4238512 |
| 0.3629 | 10.9660 | 13400 | 0.3430 | 4302800 |
| 0.3548 | 11.1293 | 13600 | 0.3440 | 4366728 |
| 0.3451 | 11.2931 | 13800 | 0.3429 | 4431112 |
| 0.3442 | 11.4568 | 14000 | 0.3655 | 4495320 |
| 0.3531 | 11.6205 | 14200 | 0.3439 | 4559336 |
| 0.3375 | 11.7843 | 14400 | 0.3418 | 4623464 |
| 0.3508 | 11.9480 | 14600 | 0.3520 | 4687880 |
| 0.3395 | 12.1113 | 14800 | 0.3486 | 4752088 |
| 0.3679 | 12.2751 | 15000 | 0.3448 | 4816376 |
| 0.3634 | 12.4388 | 15200 | 0.3543 | 4881000 |
| 0.3731 | 12.6025 | 15400 | 0.3420 | 4944776 |
| 0.2947 | 12.7663 | 15600 | 0.3900 | 5009528 |
| 0.3405 | 12.9300 | 15800 | 0.3421 | 5073448 |
| 0.3273 | 13.0933 | 16000 | 0.3441 | 5137696 |
| 0.3395 | 13.2571 | 16200 | 0.3445 | 5202256 |
| 0.3425 | 13.4208 | 16400 | 0.3475 | 5266128 |
| 0.3368 | 13.5845 | 16600 | 0.3433 | 5330256 |
| 0.3147 | 13.7483 | 16800 | 0.3483 | 5395072 |
| 0.3465 | 13.9120 | 17000 | 0.3461 | 5458672 |
| 0.3411 | 14.0753 | 17200 | 0.3431 | 5522480 |
| 0.3708 | 14.2391 | 17400 | 0.3454 | 5586480 |
| 0.3329 | 14.4028 | 17600 | 0.3431 | 5650208 |
| 0.3504 | 14.5665 | 17800 | 0.3426 | 5714704 |
| 0.3506 | 14.7302 | 18000 | 0.3453 | 5779488 |
| 0.3732 | 14.8940 | 18200 | 0.3435 | 5843728 |
| 0.3452 | 15.0573 | 18400 | 0.3432 | 5908152 |
| 0.3796 | 15.2210 | 18600 | 0.3527 | 5972168 |
| 0.3255 | 15.3848 | 18800 | 0.3425 | 6037144 |
| 0.3537 | 15.5485 | 19000 | 0.3426 | 6101800 |
| 0.3704 | 15.7122 | 19200 | 0.3440 | 6165416 |
| 0.3256 | 15.8760 | 19400 | 0.3459 | 6229672 |
| 0.3405 | 16.0393 | 19600 | 0.3466 | 6293504 |
| 0.3401 | 16.2030 | 19800 | 0.3487 | 6357840 |
| 0.3278 | 16.3668 | 20000 | 0.3451 | 6422352 |
| 0.3515 | 16.5305 | 20200 | 0.3454 | 6486352 |
| 0.3805 | 16.6942 | 20400 | 0.3426 | 6550928 |
| 0.3247 | 16.8580 | 20600 | 0.3425 | 6615008 |
| 0.3379 | 17.0213 | 20800 | 0.3417 | 6678864 |
| 0.3847 | 17.1850 | 21000 | 0.3546 | 6743040 |
| 0.3499 | 17.3488 | 21200 | 0.3432 | 6807664 |
| 0.3582 | 17.5125 | 21400 | 0.3432 | 6871648 |
| 0.3596 | 17.6762 | 21600 | 0.3431 | 6936048 |
| 0.3505 | 17.8400 | 21800 | 0.3419 | 7000448 |
| 0.3393 | 18.0033 | 22000 | 0.3434 | 7064224 |
| 0.3196 | 18.1670 | 22200 | 0.3443 | 7128848 |
| 0.3472 | 18.3307 | 22400 | 0.3467 | 7192992 |
| 0.3448 | 18.4945 | 22600 | 0.3438 | 7256624 |
| 0.3570 | 18.6582 | 22800 | 0.3473 | 7321520 |
| 0.3523 | 18.8219 | 23000 | 0.3437 | 7385552 |
| 0.3681 | 18.9857 | 23200 | 0.3453 | 7449600 |
| 0.3386 | 19.1490 | 23400 | 0.3422 | 7513504 |
| 0.3369 | 19.3127 | 23600 | 0.3429 | 7577776 |
| 0.3107 | 19.4765 | 23800 | 0.3459 | 7642048 |
| 0.3314 | 19.6402 | 24000 | 0.3469 | 7706720 |
| 0.3386 | 19.8039 | 24200 | 0.3420 | 7770896 |
| 0.3319 | 19.9677 | 24400 | 0.3436 | 7835136 |
| 0.3448 | 20.1310 | 24600 | 0.3467 | 7899176 |
| 0.3233 | 20.2947 | 24800 | 0.3457 | 7963800 |
| 0.3505 | 20.4585 | 25000 | 0.3428 | 8028584 |
| 0.3438 | 20.6222 | 25200 | 0.3431 | 8092616 |
| 0.3706 | 20.7859 | 25400 | 0.3501 | 8157000 |
| 0.3474 | 20.9497 | 25600 | 0.3464 | 8220920 |
| 0.3533 | 21.1130 | 25800 | 0.3508 | 8284832 |
| 0.3672 | 21.2767 | 26000 | 0.3442 | 8348832 |
| 0.3316 | 21.4404 | 26200 | 0.3440 | 8412992 |
| 0.3510 | 21.6042 | 26400 | 0.3427 | 8476944 |
| 0.3447 | 21.7679 | 26600 | 0.3619 | 8541536 |
| 0.3617 | 21.9316 | 26800 | 0.3439 | 8606128 |
| 0.3299 | 22.0950 | 27000 | 0.3428 | 8670264 |
| 0.3481 | 22.2587 | 27200 | 0.3424 | 8734456 |
| 0.3773 | 22.4224 | 27400 | 0.3426 | 8798776 |
| 0.3257 | 22.5862 | 27600 | 0.3454 | 8862888 |
| 0.3275 | 22.7499 | 27800 | 0.3439 | 8927464 |
| 0.3362 | 22.9136 | 28000 | 0.3435 | 8991912 |
| 0.3362 | 23.0770 | 28200 | 0.3440 | 9055920 |
| 0.3551 | 23.2407 | 28400 | 0.3440 | 9120064 |
| 0.3518 | 23.4044 | 28600 | 0.3442 | 9184496 |
| 0.3148 | 23.5682 | 28800 | 0.3449 | 9248672 |
| 0.3106 | 23.7319 | 29000 | 0.3453 | 9312880 |
| 0.3555 | 23.8956 | 29200 | 0.3436 | 9377264 |
| 0.3472 | 24.0589 | 29400 | 0.3430 | 9441584 |
| 0.3418 | 24.2227 | 29600 | 0.3453 | 9505936 |
| 0.3273 | 24.3864 | 29800 | 0.3447 | 9570272 |
| 0.3479 | 24.5501 | 30000 | 0.3440 | 9634480 |
| 0.3238 | 24.7139 | 30200 | 0.3462 | 9698784 |
| 0.3304 | 24.8776 | 30400 | 0.3445 | 9762800 |
| 0.3261 | 25.0409 | 30600 | 0.3445 | 9826744 |
| 0.3406 | 25.2047 | 30800 | 0.3440 | 9890760 |
| 0.3602 | 25.3684 | 31000 | 0.3442 | 9955112 |
| 0.3563 | 25.5321 | 31200 | 0.3441 | 10019448 |
| 0.3357 | 25.6959 | 31400 | 0.3440 | 10083848 |
| 0.3398 | 25.8596 | 31600 | 0.3453 | 10147752 |
| 0.3030 | 26.0229 | 31800 | 0.3437 | 10211912 |
| 0.3469 | 26.1867 | 32000 | 0.3426 | 10275928 |
| 0.3584 | 26.3504 | 32200 | 0.3436 | 10340168 |
| 0.3326 | 26.5141 | 32400 | 0.3441 | 10404376 |
| 0.3313 | 26.6779 | 32600 | 0.3437 | 10469048 |
| 0.3738 | 26.8416 | 32800 | 0.3419 | 10533640 |
| 0.3410 | 27.0049 | 33000 | 0.3444 | 10597888 |
| 0.3373 | 27.1686 | 33200 | 0.3431 | 10662240 |
| 0.3651 | 27.3324 | 33400 | 0.3462 | 10726640 |
| 0.3839 | 27.4961 | 33600 | 0.3430 | 10790608 |
| 0.312 | 27.6598 | 33800 | 0.3445 | 10854688 |
| 0.3114 | 27.8236 | 34000 | 0.3438 | 10919360 |
| 0.3776 | 27.9873 | 34200 | 0.3446 | 10983664 |
| 0.3450 | 28.1506 | 34400 | 0.3449 | 11047464 |
| 0.3414 | 28.3144 | 34600 | 0.3435 | 11111848 |
| 0.3301 | 28.4781 | 34800 | 0.3438 | 11176376 |
| 0.3339 | 28.6418 | 35000 | 0.3433 | 11241256 |
| 0.3370 | 28.8056 | 35200 | 0.3456 | 11305112 |
| 0.3345 | 28.9693 | 35400 | 0.3449 | 11369464 |
| 0.3182 | 29.1326 | 35600 | 0.3431 | 11433608 |
| 0.3427 | 29.2964 | 35800 | 0.3426 | 11497944 |
| 0.3489 | 29.4601 | 36000 | 0.3442 | 11562200 |
| 0.3083 | 29.6238 | 36200 | 0.3434 | 11626152 |
| 0.3411 | 29.7876 | 36400 | 0.3436 | 11690824 |
| 0.3475 | 29.9513 | 36600 | 0.3439 | 11755016 |
| 0.3509 | 30.1146 | 36800 | 0.3440 | 11818880 |
| 0.3278 | 30.2783 | 37000 | 0.3445 | 11882768 |
| 0.3491 | 30.4421 | 37200 | 0.3444 | 11946912 |
| 0.3438 | 30.6058 | 37400 | 0.3438 | 12011696 |
| 0.3426 | 30.7695 | 37600 | 0.3436 | 12075664 |
| 0.3705 | 30.9333 | 37800 | 0.3436 | 12139680 |
| 0.3554 | 31.0966 | 38000 | 0.3443 | 12204000 |
| 0.3341 | 31.2603 | 38200 | 0.3456 | 12268800 |
| 0.3694 | 31.4241 | 38400 | 0.3447 | 12333024 |
| 0.3434 | 31.5878 | 38600 | 0.3445 | 12396976 |
| 0.3462 | 31.7515 | 38800 | 0.3440 | 12461104 |
| 0.3426 | 31.9153 | 39000 | 0.3444 | 12524768 |
| 0.3444 | 32.0786 | 39200 | 0.3436 | 12588496 |
| 0.3395 | 32.2423 | 39400 | 0.3451 | 12653136 |
| 0.3414 | 32.4061 | 39600 | 0.3437 | 12717328 |
| 0.3265 | 32.5698 | 39800 | 0.3433 | 12781536 |
| 0.3342 | 32.7335 | 40000 | 0.3445 | 12845616 |
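With 200 evaluation rows, the best checkpoint is easier to find programmatically than by eye. The sketch below parses rows in the table's pipe-delimited format and picks the one with the lowest validation loss, breaking ties in favour of the earlier step. Only three rows from the table are embedded as sample data, and the helper names `parse_rows` and `best_row` are hypothetical.

```python
# Three rows copied from the table above; paste the full table to analyse every step.
ROWS_TEXT = """\
| 0.3328 | 3.6009 | 4400 | 0.3417 | 1413096 |
| 0.3379 | 17.0213 | 20800 | 0.3417 | 6678864 |
| 0.3342 | 32.7335 | 40000 | 0.3445 | 12845616 |
"""


def parse_rows(text):
    """Turn pipe-delimited table rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        train_loss, epoch, step, val_loss, tokens = cells
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "tokens": int(tokens),
        })
    return rows


def best_row(rows):
    """Lowest validation loss; ties go to the earlier step."""
    return min(rows, key=lambda row: (row["val_loss"], row["step"]))


rows = parse_rows(ROWS_TEXT)
best = best_row(rows)
# Average input tokens consumed per optimizer step over the whole run.
tokens_per_step = rows[-1]["tokens"] / rows[-1]["step"]

if __name__ == "__main__":
    print(best["step"], best["val_loss"])
    print(round(tokens_per_step, 2))
```

On these sample rows the selection lands on step 4400, which ties with step 20800 at a validation loss of 0.3417.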
Framework versions
- PEFT 0.15.2.dev0
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
Model tree for rbelanec/train_wic_1745950293
- Base model: mistralai/Mistral-7B-v0.3
- Finetuned: mistralai/Mistral-7B-Instruct-v0.3
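Since the card lists PEFT among its framework versions, the repository presumably holds a PEFT adapter rather than full model weights; the card does not say so explicitly, so treat that as an assumption. Under it, loading might look like the sketch below, which applies the adapter on top of the instruct model named above. The function name `load_finetuned_model` is hypothetical, and the imports sit inside the function so that defining it does not require the libraries to be installed.

```python
# Repository names taken from this card.
BASE_MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"
ADAPTER_ID = "rbelanec/train_wic_1745950293"


def load_finetuned_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and apply the adapter (assumes a PEFT adapter repo)."""
    # Lazy imports: only needed when the function is actually called.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights from the hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


# Usage (downloads several GB of weights, so it is not run here):
# tokenizer, model = load_finetuned_model()
```

The card does not document the prompt format used during training, so any inference prompt would need to be checked against the training setup.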