train_wic_1745950293

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the wic dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3417
  • Num Input Tokens Seen: 12845616

Model description

More information needed

Intended uses & limitations

More information needed
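
Until the intended-use notes are filled in, here is a minimal inference sketch, assuming the repository hosts a PEFT adapter on top of mistralai/Mistral-7B-Instruct-v0.3 (as stated above) and the framework versions listed at the end of this card. The prompt template is a placeholder: the instruction format actually used during training is not documented here.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the adapter together with its mistralai/Mistral-7B-Instruct-v0.3 base.
model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_wic_1745950293",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

# Hypothetical WiC-style prompt: the template used in training is not documented.
prompt = (
    'Does the word "bank" have the same meaning in "He sat on the bank of the river." '
    'and "She deposited money at the bank."? Answer yes or no.'
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```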

Training and evaluation data

More information needed
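
The card does not link a specific dataset, but "wic" is conventionally the Word-in-Context task from SuperGLUE. A hedged loading sketch, assuming that source:

```python
from datasets import load_dataset

# Assumption: "wic" refers to the SuperGLUE Word-in-Context task.
# NB: with datasets>=3.0 this requires a script-free Hub copy of super_glue;
# use an older datasets version or a parquet mirror if loading fails.
wic = load_dataset("super_glue", "wic")

example = wic["train"][0]
# Each example has a target word, two sentences, and a binary label
# (1 = same sense in both sentences, 0 = different senses).
print(example["word"], example["sentence1"], example["sentence2"], example["label"])
```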

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 0.3
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: cosine
  • training_steps: 40000
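
A hedged sketch of how these settings map onto Hugging Face TrainingArguments. The eval/logging interval of 200 steps is inferred from the results table below; everything else mirrors the list above. The unusually high learning rate (0.3) is typical of soft-prompt PEFT methods, though the card does not state which PEFT method was used.

```python
from transformers import TrainingArguments

# A sketch, not the exact training script: values mirror the hyperparameter
# list above; eval/logging every 200 steps is inferred from the results table.
args = TrainingArguments(
    output_dir="train_wic_1745950293",
    learning_rate=0.3,              # high LR, consistent with soft-prompt tuning
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # effective train batch size: 4
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    max_steps=40000,
    eval_strategy="steps",
    eval_steps=200,
    logging_steps=200,
)
```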

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.3857 | 0.1637 | 200 | 0.4081 | 64080 |
| 0.3452 | 0.3275 | 400 | 0.3471 | 128048 |
| 0.3496 | 0.4912 | 600 | 0.3524 | 192224 |
| 0.3446 | 0.6549 | 800 | 0.3534 | 256832 |
| 0.4153 | 0.8187 | 1000 | 0.3626 | 321264 |
| 0.3167 | 0.9824 | 1200 | 0.3501 | 385728 |
| 0.3378 | 1.1457 | 1400 | 0.3641 | 449768 |
| 0.3471 | 1.3095 | 1600 | 0.3551 | 514072 |
| 0.3574 | 1.4732 | 1800 | 0.3458 | 578408 |
| 0.3519 | 1.6369 | 2000 | 0.3447 | 642248 |
| 0.3772 | 1.8007 | 2200 | 0.3861 | 706488 |
| 0.3803 | 1.9644 | 2400 | 0.3458 | 770888 |
| 0.4218 | 2.1277 | 2600 | 0.3629 | 835216 |
| 0.3449 | 2.2914 | 2800 | 0.3458 | 899312 |
| 0.5685 | 2.4552 | 3000 | 0.3504 | 963696 |
| 0.3183 | 2.6189 | 3200 | 0.3582 | 1027904 |
| 0.3696 | 2.7826 | 3400 | 0.3432 | 1092016 |
| 0.3519 | 2.9464 | 3600 | 0.3418 | 1156240 |
| 0.3502 | 3.1097 | 3800 | 0.3447 | 1220568 |
| 0.3436 | 3.2734 | 4000 | 0.3677 | 1285128 |
| 0.532 | 3.4372 | 4200 | 0.3532 | 1349032 |
| 0.3328 | 3.6009 | 4400 | 0.3417 | 1413096 |
| 0.3526 | 3.7646 | 4600 | 0.3477 | 1477816 |
| 0.3341 | 3.9284 | 4800 | 0.3422 | 1541800 |
| 0.3339 | 4.0917 | 5000 | 0.3496 | 1605480 |
| 0.5998 | 4.2554 | 5200 | 0.3777 | 1669464 |
| 0.4398 | 4.4192 | 5400 | 0.3473 | 1733528 |
| 0.3754 | 4.5829 | 5600 | 0.3421 | 1797608 |
| 0.3413 | 4.7466 | 5800 | 0.3469 | 1862328 |
| 0.3449 | 4.9104 | 6000 | 0.3605 | 1926824 |
| 0.3519 | 5.0737 | 6200 | 0.3431 | 1990752 |
| 0.3331 | 5.2374 | 6400 | 0.3439 | 2055200 |
| 0.3304 | 5.4011 | 6600 | 0.3433 | 2119232 |
| 0.3353 | 5.5649 | 6800 | 0.3783 | 2183440 |
| 0.3384 | 5.7286 | 7000 | 0.3570 | 2247920 |
| 0.325 | 5.8923 | 7200 | 0.3433 | 2312032 |
| 0.3435 | 6.0557 | 7400 | 0.3525 | 2376200 |
| 0.3338 | 6.2194 | 7600 | 0.3420 | 2440472 |
| 0.3565 | 6.3831 | 7800 | 0.3429 | 2504760 |
| 0.3573 | 6.5469 | 8000 | 0.3420 | 2568840 |
| 0.3532 | 6.7106 | 8200 | 0.3485 | 2632776 |
| 0.3175 | 6.8743 | 8400 | 0.3471 | 2697176 |
| 0.3888 | 7.0377 | 8600 | 0.3505 | 2761240 |
| 0.3738 | 7.2014 | 8800 | 0.3464 | 2825240 |
| 0.3309 | 7.3651 | 9000 | 0.3441 | 2889368 |
| 0.3753 | 7.5289 | 9200 | 0.3429 | 2953752 |
| 0.3474 | 7.6926 | 9400 | 0.3440 | 3018440 |
| 0.3587 | 7.8563 | 9600 | 0.3438 | 3082552 |
| 0.3241 | 8.0196 | 9800 | 0.3496 | 3146472 |
| 0.3416 | 8.1834 | 10000 | 0.3466 | 3211320 |
| 0.3205 | 8.3471 | 10200 | 0.3460 | 3275192 |
| 0.3409 | 8.5108 | 10400 | 0.3434 | 3339400 |
| 0.3533 | 8.6746 | 10600 | 0.3517 | 3403656 |
| 0.329 | 8.8383 | 10800 | 0.3490 | 3467848 |
| 0.3572 | 9.0016 | 11000 | 0.3443 | 3531952 |
| 0.3697 | 9.1654 | 11200 | 0.3481 | 3596368 |
| 0.3694 | 9.3291 | 11400 | 0.3481 | 3660496 |
| 0.3392 | 9.4928 | 11600 | 0.3444 | 3724480 |
| 0.3417 | 9.6566 | 11800 | 0.3486 | 3788928 |
| 0.3492 | 9.8203 | 12000 | 0.3457 | 3853296 |
| 0.3626 | 9.9840 | 12200 | 0.3455 | 3917232 |
| 0.3438 | 10.1474 | 12400 | 0.3516 | 3981568 |
| 0.354 | 10.3111 | 12600 | 0.3442 | 4045600 |
| 0.3716 | 10.4748 | 12800 | 0.3449 | 4110048 |
| 0.3411 | 10.6386 | 13000 | 0.3434 | 4174432 |
| 0.3487 | 10.8023 | 13200 | 0.3455 | 4238512 |
| 0.3629 | 10.9660 | 13400 | 0.3430 | 4302800 |
| 0.3548 | 11.1293 | 13600 | 0.3440 | 4366728 |
| 0.3451 | 11.2931 | 13800 | 0.3429 | 4431112 |
| 0.3442 | 11.4568 | 14000 | 0.3655 | 4495320 |
| 0.3531 | 11.6205 | 14200 | 0.3439 | 4559336 |
| 0.3375 | 11.7843 | 14400 | 0.3418 | 4623464 |
| 0.3508 | 11.9480 | 14600 | 0.3520 | 4687880 |
| 0.3395 | 12.1113 | 14800 | 0.3486 | 4752088 |
| 0.3679 | 12.2751 | 15000 | 0.3448 | 4816376 |
| 0.3634 | 12.4388 | 15200 | 0.3543 | 4881000 |
| 0.3731 | 12.6025 | 15400 | 0.3420 | 4944776 |
| 0.2947 | 12.7663 | 15600 | 0.3900 | 5009528 |
| 0.3405 | 12.9300 | 15800 | 0.3421 | 5073448 |
| 0.3273 | 13.0933 | 16000 | 0.3441 | 5137696 |
| 0.3395 | 13.2571 | 16200 | 0.3445 | 5202256 |
| 0.3425 | 13.4208 | 16400 | 0.3475 | 5266128 |
| 0.3368 | 13.5845 | 16600 | 0.3433 | 5330256 |
| 0.3147 | 13.7483 | 16800 | 0.3483 | 5395072 |
| 0.3465 | 13.9120 | 17000 | 0.3461 | 5458672 |
| 0.3411 | 14.0753 | 17200 | 0.3431 | 5522480 |
| 0.3708 | 14.2391 | 17400 | 0.3454 | 5586480 |
| 0.3329 | 14.4028 | 17600 | 0.3431 | 5650208 |
| 0.3504 | 14.5665 | 17800 | 0.3426 | 5714704 |
| 0.3506 | 14.7302 | 18000 | 0.3453 | 5779488 |
| 0.3732 | 14.8940 | 18200 | 0.3435 | 5843728 |
| 0.3452 | 15.0573 | 18400 | 0.3432 | 5908152 |
| 0.3796 | 15.2210 | 18600 | 0.3527 | 5972168 |
| 0.3255 | 15.3848 | 18800 | 0.3425 | 6037144 |
| 0.3537 | 15.5485 | 19000 | 0.3426 | 6101800 |
| 0.3704 | 15.7122 | 19200 | 0.3440 | 6165416 |
| 0.3256 | 15.8760 | 19400 | 0.3459 | 6229672 |
| 0.3405 | 16.0393 | 19600 | 0.3466 | 6293504 |
| 0.3401 | 16.2030 | 19800 | 0.3487 | 6357840 |
| 0.3278 | 16.3668 | 20000 | 0.3451 | 6422352 |
| 0.3515 | 16.5305 | 20200 | 0.3454 | 6486352 |
| 0.3805 | 16.6942 | 20400 | 0.3426 | 6550928 |
| 0.3247 | 16.8580 | 20600 | 0.3425 | 6615008 |
| 0.3379 | 17.0213 | 20800 | 0.3417 | 6678864 |
| 0.3847 | 17.1850 | 21000 | 0.3546 | 6743040 |
| 0.3499 | 17.3488 | 21200 | 0.3432 | 6807664 |
| 0.3582 | 17.5125 | 21400 | 0.3432 | 6871648 |
| 0.3596 | 17.6762 | 21600 | 0.3431 | 6936048 |
| 0.3505 | 17.8400 | 21800 | 0.3419 | 7000448 |
| 0.3393 | 18.0033 | 22000 | 0.3434 | 7064224 |
| 0.3196 | 18.1670 | 22200 | 0.3443 | 7128848 |
| 0.3472 | 18.3307 | 22400 | 0.3467 | 7192992 |
| 0.3448 | 18.4945 | 22600 | 0.3438 | 7256624 |
| 0.357 | 18.6582 | 22800 | 0.3473 | 7321520 |
| 0.3523 | 18.8219 | 23000 | 0.3437 | 7385552 |
| 0.3681 | 18.9857 | 23200 | 0.3453 | 7449600 |
| 0.3386 | 19.1490 | 23400 | 0.3422 | 7513504 |
| 0.3369 | 19.3127 | 23600 | 0.3429 | 7577776 |
| 0.3107 | 19.4765 | 23800 | 0.3459 | 7642048 |
| 0.3314 | 19.6402 | 24000 | 0.3469 | 7706720 |
| 0.3386 | 19.8039 | 24200 | 0.3420 | 7770896 |
| 0.3319 | 19.9677 | 24400 | 0.3436 | 7835136 |
| 0.3448 | 20.1310 | 24600 | 0.3467 | 7899176 |
| 0.3233 | 20.2947 | 24800 | 0.3457 | 7963800 |
| 0.3505 | 20.4585 | 25000 | 0.3428 | 8028584 |
| 0.3438 | 20.6222 | 25200 | 0.3431 | 8092616 |
| 0.3706 | 20.7859 | 25400 | 0.3501 | 8157000 |
| 0.3474 | 20.9497 | 25600 | 0.3464 | 8220920 |
| 0.3533 | 21.1130 | 25800 | 0.3508 | 8284832 |
| 0.3672 | 21.2767 | 26000 | 0.3442 | 8348832 |
| 0.3316 | 21.4404 | 26200 | 0.3440 | 8412992 |
| 0.351 | 21.6042 | 26400 | 0.3427 | 8476944 |
| 0.3447 | 21.7679 | 26600 | 0.3619 | 8541536 |
| 0.3617 | 21.9316 | 26800 | 0.3439 | 8606128 |
| 0.3299 | 22.0950 | 27000 | 0.3428 | 8670264 |
| 0.3481 | 22.2587 | 27200 | 0.3424 | 8734456 |
| 0.3773 | 22.4224 | 27400 | 0.3426 | 8798776 |
| 0.3257 | 22.5862 | 27600 | 0.3454 | 8862888 |
| 0.3275 | 22.7499 | 27800 | 0.3439 | 8927464 |
| 0.3362 | 22.9136 | 28000 | 0.3435 | 8991912 |
| 0.3362 | 23.0770 | 28200 | 0.3440 | 9055920 |
| 0.3551 | 23.2407 | 28400 | 0.3440 | 9120064 |
| 0.3518 | 23.4044 | 28600 | 0.3442 | 9184496 |
| 0.3148 | 23.5682 | 28800 | 0.3449 | 9248672 |
| 0.3106 | 23.7319 | 29000 | 0.3453 | 9312880 |
| 0.3555 | 23.8956 | 29200 | 0.3436 | 9377264 |
| 0.3472 | 24.0589 | 29400 | 0.3430 | 9441584 |
| 0.3418 | 24.2227 | 29600 | 0.3453 | 9505936 |
| 0.3273 | 24.3864 | 29800 | 0.3447 | 9570272 |
| 0.3479 | 24.5501 | 30000 | 0.3440 | 9634480 |
| 0.3238 | 24.7139 | 30200 | 0.3462 | 9698784 |
| 0.3304 | 24.8776 | 30400 | 0.3445 | 9762800 |
| 0.3261 | 25.0409 | 30600 | 0.3445 | 9826744 |
| 0.3406 | 25.2047 | 30800 | 0.3440 | 9890760 |
| 0.3602 | 25.3684 | 31000 | 0.3442 | 9955112 |
| 0.3563 | 25.5321 | 31200 | 0.3441 | 10019448 |
| 0.3357 | 25.6959 | 31400 | 0.3440 | 10083848 |
| 0.3398 | 25.8596 | 31600 | 0.3453 | 10147752 |
| 0.303 | 26.0229 | 31800 | 0.3437 | 10211912 |
| 0.3469 | 26.1867 | 32000 | 0.3426 | 10275928 |
| 0.3584 | 26.3504 | 32200 | 0.3436 | 10340168 |
| 0.3326 | 26.5141 | 32400 | 0.3441 | 10404376 |
| 0.3313 | 26.6779 | 32600 | 0.3437 | 10469048 |
| 0.3738 | 26.8416 | 32800 | 0.3419 | 10533640 |
| 0.341 | 27.0049 | 33000 | 0.3444 | 10597888 |
| 0.3373 | 27.1686 | 33200 | 0.3431 | 10662240 |
| 0.3651 | 27.3324 | 33400 | 0.3462 | 10726640 |
| 0.3839 | 27.4961 | 33600 | 0.3430 | 10790608 |
| 0.312 | 27.6598 | 33800 | 0.3445 | 10854688 |
| 0.3114 | 27.8236 | 34000 | 0.3438 | 10919360 |
| 0.3776 | 27.9873 | 34200 | 0.3446 | 10983664 |
| 0.345 | 28.1506 | 34400 | 0.3449 | 11047464 |
| 0.3414 | 28.3144 | 34600 | 0.3435 | 11111848 |
| 0.3301 | 28.4781 | 34800 | 0.3438 | 11176376 |
| 0.3339 | 28.6418 | 35000 | 0.3433 | 11241256 |
| 0.337 | 28.8056 | 35200 | 0.3456 | 11305112 |
| 0.3345 | 28.9693 | 35400 | 0.3449 | 11369464 |
| 0.3182 | 29.1326 | 35600 | 0.3431 | 11433608 |
| 0.3427 | 29.2964 | 35800 | 0.3426 | 11497944 |
| 0.3489 | 29.4601 | 36000 | 0.3442 | 11562200 |
| 0.3083 | 29.6238 | 36200 | 0.3434 | 11626152 |
| 0.3411 | 29.7876 | 36400 | 0.3436 | 11690824 |
| 0.3475 | 29.9513 | 36600 | 0.3439 | 11755016 |
| 0.3509 | 30.1146 | 36800 | 0.3440 | 11818880 |
| 0.3278 | 30.2783 | 37000 | 0.3445 | 11882768 |
| 0.3491 | 30.4421 | 37200 | 0.3444 | 11946912 |
| 0.3438 | 30.6058 | 37400 | 0.3438 | 12011696 |
| 0.3426 | 30.7695 | 37600 | 0.3436 | 12075664 |
| 0.3705 | 30.9333 | 37800 | 0.3436 | 12139680 |
| 0.3554 | 31.0966 | 38000 | 0.3443 | 12204000 |
| 0.3341 | 31.2603 | 38200 | 0.3456 | 12268800 |
| 0.3694 | 31.4241 | 38400 | 0.3447 | 12333024 |
| 0.3434 | 31.5878 | 38600 | 0.3445 | 12396976 |
| 0.3462 | 31.7515 | 38800 | 0.3440 | 12461104 |
| 0.3426 | 31.9153 | 39000 | 0.3444 | 12524768 |
| 0.3444 | 32.0786 | 39200 | 0.3436 | 12588496 |
| 0.3395 | 32.2423 | 39400 | 0.3451 | 12653136 |
| 0.3414 | 32.4061 | 39600 | 0.3437 | 12717328 |
| 0.3265 | 32.5698 | 39800 | 0.3433 | 12781536 |
| 0.3342 | 32.7335 | 40000 | 0.3445 | 12845616 |
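
Validation loss settles quickly: it first reaches its best value of 0.3417 around step 4,400 and stays in a narrow 0.342-0.345 band for the remaining ~35,000 steps. The card reports only eval loss; if a task metric is wanted, the sketch below scores WiC validation accuracy by generating yes/no answers. Both the prompt template and the yes/no answer format are assumptions, so treat any resulting number as a rough probe rather than an official score.

```python
import torch
from datasets import load_dataset
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_wic_1745950293", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

# Assumption: "wic" is the SuperGLUE Word-in-Context validation split.
wic = load_dataset("super_glue", "wic", split="validation")

correct = 0
for ex in wic:
    # Hypothetical prompt template: the one used in training is not documented.
    prompt = (
        f'Does the word "{ex["word"]}" have the same meaning in '
        f'"{ex["sentence1"]}" and "{ex["sentence2"]}"? Answer yes or no.'
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=3, do_sample=False)
    answer = tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    predicted = 1 if "yes" in answer.lower() else 0
    correct += int(predicted == ex["label"])

print(f"accuracy: {correct / len(wic):.4f}")
```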

Framework versions

  • PEFT 0.15.2.dev0
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1