train_wic_1745950289

This model is a parameter-efficient fine-tune (PEFT adapter) of meta-llama/Meta-Llama-3-8B-Instruct on the wic (Word-in-Context) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3401
  • Num Input Tokens Seen: 12716696
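Since the framework versions below include PEFT, this checkpoint is an adapter applied on top of the base model rather than a full set of weights. The following is a minimal loading sketch, assuming the adapter is published under the repository id rbelanec/train_wic_1745950289 and that you have access to the gated Llama 3 base model:

```python
# Minimal loading sketch; assumes the adapter repo id below and access
# to the gated meta-llama base checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wic_1745950289"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()
```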

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.3
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • training_steps: 40000
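For reference, these settings map onto transformers.TrainingArguments roughly as follows. This is a hedged sketch rather than the exact training script: the output_dir is hypothetical, the 200-step evaluation and logging cadence is inferred from the results table below, and anything not listed above is left at library defaults.

```python
# Hedged reconstruction of the reported hyperparameters; not the author's
# actual script. output_dir is hypothetical; unlisted settings use defaults.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wic_1745950289",  # hypothetical
    learning_rate=0.3,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,      # total train batch size: 2 * 2 = 4
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    max_steps=40000,
    eval_strategy="steps",              # inferred from the 200-step eval cadence below
    eval_steps=200,
    logging_steps=200,
)
```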

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:--------------|:------|:-----|:----------------|:------------------|
| 0.5065 | 0.1637 | 200 | 0.5396 | 63344 |
| 0.3591 | 0.3275 | 400 | 0.3541 | 126720 |
| 0.4417 | 0.4912 | 600 | 0.3968 | 190304 |
| 0.4891 | 0.6549 | 800 | 0.3590 | 254384 |
| 0.3967 | 0.8187 | 1000 | 0.3818 | 318128 |
| 0.3858 | 0.9824 | 1200 | 0.3527 | 381920 |
| 0.3513 | 1.1457 | 1400 | 0.3502 | 445096 |
| 0.3405 | 1.3095 | 1600 | 0.3826 | 508744 |
| 0.4021 | 1.4732 | 1800 | 0.3483 | 572408 |
| 0.3557 | 1.6369 | 2000 | 0.3458 | 635736 |
| 0.3648 | 1.8007 | 2200 | 0.3572 | 699464 |
| 0.3087 | 1.9644 | 2400 | 0.4780 | 763192 |
| 0.4053 | 2.1277 | 2600 | 0.3547 | 826784 |
| 0.4281 | 2.2914 | 2800 | 0.3490 | 890336 |
| 0.3645 | 2.4552 | 3000 | 0.3593 | 953840 |
| 0.3349 | 2.6189 | 3200 | 0.3629 | 1017600 |
| 0.3706 | 2.7826 | 3400 | 0.3511 | 1081104 |
| 0.3528 | 2.9464 | 3600 | 0.3451 | 1144576 |
| 0.3656 | 3.1097 | 3800 | 0.3496 | 1208440 |
| 0.3473 | 3.2734 | 4000 | 0.3893 | 1272216 |
| 0.3305 | 3.4372 | 4200 | 0.3602 | 1335496 |
| 0.3573 | 3.6009 | 4400 | 0.3460 | 1398984 |
| 0.3896 | 3.7646 | 4600 | 0.3575 | 1462856 |
| 0.3397 | 3.9284 | 4800 | 0.3458 | 1526280 |
| 0.3514 | 4.0917 | 5000 | 0.3485 | 1589584 |
| 0.6668 | 4.2554 | 5200 | 0.3508 | 1653024 |
| 0.3849 | 4.4192 | 5400 | 0.3482 | 1716432 |
| 0.379 | 4.5829 | 5600 | 0.3448 | 1779984 |
| 0.3405 | 4.7466 | 5800 | 0.3458 | 1843936 |
| 0.4002 | 4.9104 | 6000 | 0.3867 | 1907808 |
| 0.3535 | 5.0737 | 6200 | 0.3517 | 1971048 |
| 0.3731 | 5.2374 | 6400 | 0.3444 | 2034808 |
| 0.3293 | 5.4011 | 6600 | 0.3439 | 2098088 |
| 0.3836 | 5.5649 | 6800 | 0.4214 | 2161640 |
| 0.3358 | 5.7286 | 7000 | 0.3921 | 2225432 |
| 0.3696 | 5.8923 | 7200 | 0.3488 | 2289032 |
| 0.3513 | 6.0557 | 7400 | 0.3530 | 2352656 |
| 0.3305 | 6.2194 | 7600 | 0.3605 | 2416160 |
| 0.3563 | 6.3831 | 7800 | 0.3427 | 2479728 |
| 0.3611 | 6.5469 | 8000 | 0.3434 | 2543168 |
| 0.347 | 6.7106 | 8200 | 0.3525 | 2606560 |
| 0.3083 | 6.8743 | 8400 | 0.3547 | 2670208 |
| 0.3976 | 7.0377 | 8600 | 0.3833 | 2733584 |
| 0.3761 | 7.2014 | 8800 | 0.3490 | 2797008 |
| 0.3151 | 7.3651 | 9000 | 0.3430 | 2860576 |
| 0.365 | 7.5289 | 9200 | 0.3438 | 2924256 |
| 0.3556 | 7.6926 | 9400 | 0.3516 | 2988272 |
| 0.3605 | 7.8563 | 9600 | 0.3564 | 3051776 |
| 0.3351 | 8.0196 | 9800 | 0.3440 | 3114992 |
| 0.3529 | 8.1834 | 10000 | 0.3442 | 3179200 |
| 0.3084 | 8.3471 | 10200 | 0.3620 | 3242496 |
| 0.3466 | 8.5108 | 10400 | 0.3426 | 3306112 |
| 0.3848 | 8.6746 | 10600 | 0.3642 | 3369760 |
| 0.3336 | 8.8383 | 10800 | 0.3417 | 3433360 |
| 0.3275 | 9.0016 | 11000 | 0.3656 | 3496680 |
| 0.3595 | 9.1654 | 11200 | 0.3539 | 3560648 |
| 0.481 | 9.3291 | 11400 | 0.3790 | 3624200 |
| 0.358 | 9.4928 | 11600 | 0.3583 | 3687560 |
| 0.3582 | 9.6566 | 11800 | 0.3685 | 3751288 |
| 0.3476 | 9.8203 | 12000 | 0.3542 | 3814952 |
| 0.3758 | 9.9840 | 12200 | 0.3419 | 3878120 |
| 0.3407 | 10.1474 | 12400 | 0.3421 | 3941616 |
| 0.359 | 10.3111 | 12600 | 0.3778 | 4005216 |
| 0.4143 | 10.4748 | 12800 | 0.3517 | 4068912 |
| 0.3404 | 10.6386 | 13000 | 0.3437 | 4132608 |
| 0.3326 | 10.8023 | 13200 | 0.3473 | 4196096 |
| 0.3752 | 10.9660 | 13400 | 0.3415 | 4259680 |
| 0.3604 | 11.1293 | 13600 | 0.3417 | 4323128 |
| 0.3652 | 11.2931 | 13800 | 0.3412 | 4386856 |
| 0.3631 | 11.4568 | 14000 | 0.4083 | 4450296 |
| 0.3529 | 11.6205 | 14200 | 0.3433 | 4513544 |
| 0.3592 | 11.7843 | 14400 | 0.3439 | 4576984 |
| 0.3624 | 11.9480 | 14600 | 0.3481 | 4640904 |
| 0.3325 | 12.1113 | 14800 | 0.3525 | 4704360 |
| 0.3417 | 12.2751 | 15000 | 0.3641 | 4768152 |
| 0.3616 | 12.4388 | 15200 | 0.3509 | 4832152 |
| 0.3618 | 12.6025 | 15400 | 0.3435 | 4895192 |
| 0.2959 | 12.7663 | 15600 | 0.3713 | 4959112 |
| 0.3387 | 12.9300 | 15800 | 0.3452 | 5022408 |
| 0.3556 | 13.0933 | 16000 | 0.3429 | 5086016 |
| 0.3536 | 13.2571 | 16200 | 0.3471 | 5149920 |
| 0.3314 | 13.4208 | 16400 | 0.3433 | 5213296 |
| 0.3272 | 13.5845 | 16600 | 0.3430 | 5276672 |
| 0.3096 | 13.7483 | 16800 | 0.3461 | 5340624 |
| 0.3368 | 13.9120 | 17000 | 0.3429 | 5403792 |
| 0.3331 | 14.0753 | 17200 | 0.3419 | 5466936 |
| 0.3603 | 14.2391 | 17400 | 0.3429 | 5530392 |
| 0.343 | 14.4028 | 17600 | 0.3444 | 5593576 |
| 0.3551 | 14.5665 | 17800 | 0.3428 | 5657288 |
| 0.3524 | 14.7302 | 18000 | 0.3417 | 5721496 |
| 0.3649 | 14.8940 | 18200 | 0.3420 | 5785096 |
| 0.3429 | 15.0573 | 18400 | 0.3449 | 5848736 |
| 0.3931 | 15.2210 | 18600 | 0.3472 | 5912176 |
| 0.3289 | 15.3848 | 18800 | 0.3452 | 5976400 |
| 0.3598 | 15.5485 | 19000 | 0.3416 | 6040272 |
| 0.3597 | 15.7122 | 19200 | 0.3496 | 6103424 |
| 0.3246 | 15.8760 | 19400 | 0.3464 | 6166912 |
| 0.3315 | 16.0393 | 19600 | 0.3467 | 6230320 |
| 0.3437 | 16.2030 | 19800 | 0.3515 | 6294224 |
| 0.3234 | 16.3668 | 20000 | 0.3443 | 6357984 |
| 0.3441 | 16.5305 | 20200 | 0.3408 | 6421344 |
| 0.3771 | 16.6942 | 20400 | 0.3424 | 6485152 |
| 0.3228 | 16.8580 | 20600 | 0.3413 | 6548768 |
| 0.3452 | 17.0213 | 20800 | 0.3402 | 6611792 |
| 0.3946 | 17.1850 | 21000 | 0.3696 | 6675216 |
| 0.3497 | 17.3488 | 21200 | 0.3429 | 6739088 |
| 0.3684 | 17.5125 | 21400 | 0.3428 | 6802352 |
| 0.3571 | 17.6762 | 21600 | 0.3407 | 6866160 |
| 0.3559 | 17.8400 | 21800 | 0.3422 | 6929936 |
| 0.3334 | 18.0033 | 22000 | 0.3469 | 6993168 |
| 0.326 | 18.1670 | 22200 | 0.3428 | 7057008 |
| 0.3536 | 18.3307 | 22400 | 0.3474 | 7120624 |
| 0.3444 | 18.4945 | 22600 | 0.3433 | 7183872 |
| 0.3523 | 18.6582 | 22800 | 0.3550 | 7247952 |
| 0.3489 | 18.8219 | 23000 | 0.3424 | 7311488 |
| 0.3721 | 18.9857 | 23200 | 0.3442 | 7374848 |
| 0.3305 | 19.1490 | 23400 | 0.3444 | 7438160 |
| 0.3571 | 19.3127 | 23600 | 0.3422 | 7501872 |
| 0.3298 | 19.4765 | 23800 | 0.3449 | 7565520 |
| 0.3438 | 19.6402 | 24000 | 0.3472 | 7629488 |
| 0.3458 | 19.8039 | 24200 | 0.3406 | 7692992 |
| 0.3318 | 19.9677 | 24400 | 0.3416 | 7756512 |
| 0.3622 | 20.1310 | 24600 | 0.3504 | 7819816 |
| 0.3295 | 20.2947 | 24800 | 0.3480 | 7883800 |
| 0.3473 | 20.4585 | 25000 | 0.3407 | 7947944 |
| 0.3418 | 20.6222 | 25200 | 0.3414 | 8011336 |
| 0.3751 | 20.7859 | 25400 | 0.3460 | 8075000 |
| 0.3266 | 20.9497 | 25600 | 0.3427 | 8138568 |
| 0.3622 | 21.1130 | 25800 | 0.3528 | 8201872 |
| 0.3774 | 21.2767 | 26000 | 0.3425 | 8265168 |
| 0.3339 | 21.4404 | 26200 | 0.3426 | 8328704 |
| 0.3408 | 21.6042 | 26400 | 0.3419 | 8392144 |
| 0.3361 | 21.7679 | 26600 | 0.3685 | 8456096 |
| 0.3613 | 21.9316 | 26800 | 0.3409 | 8519872 |
| 0.3437 | 22.0950 | 27000 | 0.3427 | 8583464 |
| 0.343 | 22.2587 | 27200 | 0.3421 | 8646840 |
| 0.3847 | 22.4224 | 27400 | 0.3404 | 8710600 |
| 0.3366 | 22.5862 | 27600 | 0.3436 | 8774344 |
| 0.3391 | 22.7499 | 27800 | 0.3416 | 8838024 |
| 0.3389 | 22.9136 | 28000 | 0.3412 | 8901832 |
| 0.3344 | 23.0770 | 28200 | 0.3423 | 8965184 |
| 0.3528 | 23.2407 | 28400 | 0.3417 | 9028576 |
| 0.3488 | 23.4044 | 28600 | 0.3414 | 9092256 |
| 0.3186 | 23.5682 | 28800 | 0.3416 | 9155872 |
| 0.323 | 23.7319 | 29000 | 0.3437 | 9219312 |
| 0.3526 | 23.8956 | 29200 | 0.3435 | 9283264 |
| 0.3631 | 24.0589 | 29400 | 0.3422 | 9346992 |
| 0.341 | 24.2227 | 29600 | 0.3443 | 9410880 |
| 0.3369 | 24.3864 | 29800 | 0.3431 | 9474704 |
| 0.3443 | 24.5501 | 30000 | 0.3413 | 9538160 |
| 0.3313 | 24.7139 | 30200 | 0.3428 | 9601792 |
| 0.3288 | 24.8776 | 30400 | 0.3433 | 9664976 |
| 0.3273 | 25.0409 | 30600 | 0.3405 | 9728232 |
| 0.3402 | 25.2047 | 30800 | 0.3426 | 9791848 |
| 0.3501 | 25.3684 | 31000 | 0.3421 | 9855400 |
| 0.3665 | 25.5321 | 31200 | 0.3435 | 9918984 |
| 0.3395 | 25.6959 | 31400 | 0.3409 | 9982872 |
| 0.3486 | 25.8596 | 31600 | 0.3427 | 10046056 |
| 0.3176 | 26.0229 | 31800 | 0.3437 | 10109568 |
| 0.3398 | 26.1867 | 32000 | 0.3404 | 10173072 |
| 0.3515 | 26.3504 | 32200 | 0.3432 | 10236512 |
| 0.3292 | 26.5141 | 32400 | 0.3431 | 10299920 |
| 0.3336 | 26.6779 | 32600 | 0.3428 | 10363808 |
| 0.3551 | 26.8416 | 32800 | 0.3417 | 10427744 |
| 0.3327 | 27.0049 | 33000 | 0.3425 | 10491384 |
| 0.347 | 27.1686 | 33200 | 0.3419 | 10555192 |
| 0.3613 | 27.3324 | 33400 | 0.3444 | 10619080 |
| 0.3946 | 27.4961 | 33600 | 0.3408 | 10682424 |
| 0.325 | 27.6598 | 33800 | 0.3415 | 10746024 |
| 0.3064 | 27.8236 | 34000 | 0.3413 | 10809736 |
| 0.3768 | 27.9873 | 34200 | 0.3420 | 10873448 |
| 0.3476 | 28.1506 | 34400 | 0.3434 | 10936704 |
| 0.3491 | 28.3144 | 34600 | 0.3401 | 11000112 |
| 0.3311 | 28.4781 | 34800 | 0.3417 | 11063936 |
| 0.3356 | 28.6418 | 35000 | 0.3414 | 11128160 |
| 0.3316 | 28.8056 | 35200 | 0.3424 | 11191600 |
| 0.3294 | 28.9693 | 35400 | 0.3426 | 11255184 |
| 0.3253 | 29.1326 | 35600 | 0.3421 | 11318640 |
| 0.3424 | 29.2964 | 35800 | 0.3420 | 11382352 |
| 0.3419 | 29.4601 | 36000 | 0.3410 | 11446048 |
| 0.3129 | 29.6238 | 36200 | 0.3411 | 11509328 |
| 0.3309 | 29.7876 | 36400 | 0.3408 | 11573312 |
| 0.3477 | 29.9513 | 36600 | 0.3426 | 11636752 |
| 0.3555 | 30.1146 | 36800 | 0.3434 | 11700056 |
| 0.3449 | 30.2783 | 37000 | 0.3430 | 11763352 |
| 0.3533 | 30.4421 | 37200 | 0.3415 | 11826952 |
| 0.3442 | 30.6058 | 37400 | 0.3421 | 11890888 |
| 0.3441 | 30.7695 | 37600 | 0.3419 | 11954296 |
| 0.3564 | 30.9333 | 37800 | 0.3424 | 12017784 |
| 0.3582 | 31.0966 | 38000 | 0.3422 | 12081304 |
| 0.3418 | 31.2603 | 38200 | 0.3430 | 12145240 |
| 0.3733 | 31.4241 | 38400 | 0.3442 | 12208888 |
| 0.342 | 31.5878 | 38600 | 0.3433 | 12272344 |
| 0.3461 | 31.7515 | 38800 | 0.3431 | 12335960 |
| 0.3463 | 31.9153 | 39000 | 0.3428 | 12399064 |
| 0.3469 | 32.0786 | 39200 | 0.3425 | 12462200 |
| 0.3511 | 32.2423 | 39400 | 0.3425 | 12526024 |
| 0.3319 | 32.4061 | 39600 | 0.3424 | 12589496 |
| 0.3255 | 32.5698 | 39800 | 0.3426 | 12653080 |
| 0.3419 | 32.7335 | 40000 | 0.3423 | 12716696 |

Framework versions

  • PEFT 0.15.2.dev0
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1