train_wic_1745950291

This model is a parameter-efficient (PEFT) fine-tune of meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset (presumably WiC, the Word-in-Context task); a usage sketch follows the results list below. It achieves the following results on the evaluation set:

  • Loss: 0.4934
  • Num Input Tokens Seen: 12716696
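
This repository ships as a PEFT adapter on top of the base model (see the Framework versions below), so it can presumably be loaded roughly as follows. This is a minimal sketch, not a documented inference setup; in particular, the prompt format used during training is not recorded in this card, so the WiC-style query shown is hypothetical.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER = "rbelanec/train_wic_1745950291"

# Load the frozen base model, then attach the fine-tuned adapter weights.
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

# Hypothetical WiC-style query; the actual template the adapter was
# trained on may differ.
messages = [{
    "role": "user",
    "content": 'Does "run" mean the same thing in "run a race" and '
               '"run a company"? Answer yes or no.',
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```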

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • training_steps: 40000
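
For reference, these settings map onto transformers.TrainingArguments roughly as sketched below, assuming the run used the Hugging Face Trainer. The output_dir and the 200-step logging/eval cadence are inferred from the results table, not documented explicitly.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wic_1745950291",  # hypothetical
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    gradient_accumulation_steps=2,      # total train batch size: 2 * 2 = 4
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    max_steps=40000,
    eval_strategy="steps",              # the table below evaluates every 200 steps
    eval_steps=200,
    logging_steps=200,
)
```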

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
0.6257 0.1637 200 0.5225 63344
0.4145 0.3275 400 0.5219 126720
0.6067 0.4912 600 0.5143 190304
0.5509 0.6549 800 0.5104 254384
0.4087 0.8187 1000 0.5072 318128
0.5062 0.9824 1200 0.5076 381920
0.4057 1.1457 1400 0.5047 445096
0.6519 1.3095 1600 0.5058 508744
0.4278 1.4732 1800 0.5040 572408
0.516 1.6369 2000 0.5023 635736
0.4402 1.8007 2200 0.4989 699464
0.3649 1.9644 2400 0.5017 763192
0.722 2.1277 2600 0.5006 826784
0.6072 2.2914 2800 0.5021 890336
0.4783 2.4552 3000 0.4986 953840
0.3192 2.6189 3200 0.4998 1017600
0.6125 2.7826 3400 0.4971 1081104
0.3693 2.9464 3600 0.5000 1144576
0.5569 3.1097 3800 0.5020 1208440
0.6581 3.2734 4000 0.4973 1272216
0.3633 3.4372 4200 0.4999 1335496
0.5302 3.6009 4400 0.5050 1398984
0.3837 3.7646 4600 0.4959 1462856
0.5727 3.9284 4800 0.4986 1526280
0.417 4.0917 5000 0.4981 1589584
0.381 4.2554 5200 0.4988 1653024
0.3998 4.4192 5400 0.4994 1716432
0.3977 4.5829 5600 0.5029 1779984
0.364 4.7466 5800 0.5024 1843936
0.6055 4.9104 6000 0.5001 1907808
0.4597 5.0737 6200 0.5003 1971048
0.4152 5.2374 6400 0.5005 2034808
0.4998 5.4011 6600 0.5010 2098088
0.5148 5.5649 6800 0.5005 2161640
0.4574 5.7286 7000 0.4973 2225432
0.884 5.8923 7200 0.4995 2289032
0.5194 6.0557 7400 0.4955 2352656
0.6431 6.2194 7600 0.4975 2416160
0.3991 6.3831 7800 0.4986 2479728
0.532 6.5469 8000 0.4968 2543168
0.4574 6.7106 8200 0.4997 2606560
0.4313 6.8743 8400 0.4990 2670208
0.5079 7.0377 8600 0.4967 2733584
0.4926 7.2014 8800 0.4963 2797008
0.6941 7.3651 9000 0.5011 2860576
0.4878 7.5289 9200 0.4988 2924256
0.4491 7.6926 9400 0.4975 2988272
0.5816 7.8563 9600 0.4988 3051776
0.3643 8.0196 9800 0.4955 3114992
0.5292 8.1834 10000 0.4965 3179200
0.3784 8.3471 10200 0.4981 3242496
0.5082 8.5108 10400 0.4971 3306112
0.5478 8.6746 10600 0.4993 3369760
0.6724 8.8383 10800 0.4998 3433360
0.5947 9.0016 11000 0.4980 3496680
0.5989 9.1654 11200 0.5002 3560648
0.5554 9.3291 11400 0.4983 3624200
0.3369 9.4928 11600 0.5003 3687560
0.5688 9.6566 11800 0.5014 3751288
0.4692 9.8203 12000 0.4971 3814952
0.6744 9.9840 12200 0.5008 3878120
0.4068 10.1474 12400 0.4992 3941616
0.4359 10.3111 12600 0.4981 4005216
0.5724 10.4748 12800 0.4960 4068912
0.5359 10.6386 13000 0.4971 4132608
0.4707 10.8023 13200 0.4980 4196096
0.5272 10.9660 13400 0.4969 4259680
0.6006 11.1293 13600 0.4966 4323128
0.4663 11.2931 13800 0.4977 4386856
0.3614 11.4568 14000 0.4935 4450296
0.6643 11.6205 14200 0.4980 4513544
0.5071 11.7843 14400 0.5001 4576984
0.3758 11.9480 14600 0.4987 4640904
0.3884 12.1113 14800 0.4975 4704360
0.304 12.2751 15000 0.4966 4768152
0.4518 12.4388 15200 0.4974 4832152
0.3722 12.6025 15400 0.4999 4895192
0.3803 12.7663 15600 0.4989 4959112
0.4056 12.9300 15800 0.4952 5022408
0.7264 13.0933 16000 0.4986 5086016
0.6845 13.2571 16200 0.4999 5149920
0.3888 13.4208 16400 0.4991 5213296
0.6898 13.5845 16600 0.4985 5276672
0.4119 13.7483 16800 0.5017 5340624
0.4066 13.9120 17000 0.4966 5403792
0.6487 14.0753 17200 0.4955 5466936
0.6244 14.2391 17400 0.4985 5530392
0.6813 14.4028 17600 0.4988 5593576
0.55 14.5665 17800 0.4999 5657288
0.4325 14.7302 18000 0.4973 5721496
0.541 14.8940 18200 0.4976 5785096
0.6722 15.0573 18400 0.4993 5848736
0.5625 15.2210 18600 0.4954 5912176
0.4723 15.3848 18800 0.4965 5976400
0.31 15.5485 19000 0.4957 6040272
0.4716 15.7122 19200 0.4957 6103424
0.5429 15.8760 19400 0.4934 6166912
0.3732 16.0393 19600 0.4961 6230320
0.4673 16.2030 19800 0.4972 6294224
0.4359 16.3668 20000 0.4974 6357984
0.3628 16.5305 20200 0.5007 6421344
0.3717 16.6942 20400 0.4999 6485152
0.3153 16.8580 20600 0.4961 6548768
0.6308 17.0213 20800 0.4971 6611792
0.6157 17.1850 21000 0.4995 6675216
0.4635 17.3488 21200 0.4987 6739088
0.6582 17.5125 21400 0.4991 6802352
0.2988 17.6762 21600 0.4997 6866160
0.3709 17.8400 21800 0.5029 6929936
0.3607 18.0033 22000 0.4944 6993168
0.7202 18.1670 22200 0.5041 7057008
0.3716 18.3307 22400 0.5014 7120624
0.4817 18.4945 22600 0.4980 7183872
0.5667 18.6582 22800 0.4962 7247952
0.3868 18.8219 23000 0.4981 7311488
0.4314 18.9857 23200 0.4989 7374848
0.5291 19.1490 23400 0.4971 7438160
0.5263 19.3127 23600 0.4991 7501872
0.5666 19.4765 23800 0.4970 7565520
0.6424 19.6402 24000 0.4947 7629488
0.5894 19.8039 24200 0.4982 7692992
0.303 19.9677 24400 0.4980 7756512
0.5242 20.1310 24600 0.4970 7819816
0.331 20.2947 24800 0.4987 7883800
0.4012 20.4585 25000 0.4947 7947944
0.5083 20.6222 25200 0.4989 8011336
0.4885 20.7859 25400 0.4996 8075000
0.5333 20.9497 25600 0.4989 8138568
0.5209 21.1130 25800 0.5002 8201872
0.7051 21.2767 26000 0.4995 8265168
0.5638 21.4404 26200 0.5024 8328704
0.6135 21.6042 26400 0.4948 8392144
0.8321 21.7679 26600 0.4984 8456096
0.6106 21.9316 26800 0.5017 8519872
0.5066 22.0950 27000 0.5002 8583464
0.5766 22.2587 27200 0.4949 8646840
0.5146 22.4224 27400 0.4984 8710600
0.6664 22.5862 27600 0.4979 8774344
0.5827 22.7499 27800 0.4989 8838024
0.5015 22.9136 28000 0.4998 8901832
0.3741 23.0770 28200 0.4952 8965184
0.4112 23.2407 28400 0.4975 9028576
0.3413 23.4044 28600 0.5026 9092256
0.3816 23.5682 28800 0.4968 9155872
0.5038 23.7319 29000 0.4988 9219312
0.509 23.8956 29200 0.5012 9283264
0.4391 24.0589 29400 0.4994 9346992
0.3301 24.2227 29600 0.5016 9410880
0.6701 24.3864 29800 0.4956 9474704
0.3837 24.5501 30000 0.4996 9538160
0.6954 24.7139 30200 0.5018 9601792
0.6162 24.8776 30400 0.4981 9664976
0.5058 25.0409 30600 0.4952 9728232
0.6277 25.2047 30800 0.5002 9791848
0.3653 25.3684 31000 0.4973 9855400
0.4652 25.5321 31200 0.5014 9918984
0.2707 25.6959 31400 0.4962 9982872
0.5098 25.8596 31600 0.5003 10046056
0.4843 26.0229 31800 0.5000 10109568
0.5279 26.1867 32000 0.4986 10173072
0.4396 26.3504 32200 0.5003 10236512
0.7524 26.5141 32400 0.4994 10299920
0.5412 26.6779 32600 0.4996 10363808
0.6239 26.8416 32800 0.5021 10427744
0.4925 27.0049 33000 0.4980 10491384
0.4674 27.1686 33200 0.5011 10555192
0.4568 27.3324 33400 0.4977 10619080
0.4934 27.4961 33600 0.4955 10682424
0.8816 27.6598 33800 0.4993 10746024
0.3269 27.8236 34000 0.4972 10809736
0.4768 27.9873 34200 0.4941 10873448
0.6487 28.1506 34400 0.4946 10936704
0.5115 28.3144 34600 0.4938 11000112
0.5026 28.4781 34800 0.4966 11063936
0.4725 28.6418 35000 0.4996 11128160
0.3988 28.8056 35200 0.4996 11191600
0.7055 28.9693 35400 0.4961 11255184
0.2657 29.1326 35600 0.4985 11318640
0.3977 29.2964 35800 0.4985 11382352
0.5586 29.4601 36000 0.4985 11446048
0.4327 29.6238 36200 0.4985 11509328
0.3437 29.7876 36400 0.4985 11573312
0.5439 29.9513 36600 0.4985 11636752
0.5447 30.1146 36800 0.4985 11700056
0.4514 30.2783 37000 0.4985 11763352
0.7178 30.4421 37200 0.4985 11826952
0.7133 30.6058 37400 0.4985 11890888
0.5499 30.7695 37600 0.4985 11954296
0.8377 30.9333 37800 0.4985 12017784
0.6521 31.0966 38000 0.4985 12081304
0.6123 31.2603 38200 0.4985 12145240
0.4538 31.4241 38400 0.4985 12208888
0.689 31.5878 38600 0.4985 12272344
0.4428 31.7515 38800 0.4985 12335960
0.5346 31.9153 39000 0.4985 12399064
0.4668 32.0786 39200 0.4985 12462200
0.4803 32.2423 39400 0.4985 12526024
0.607 32.4061 39600 0.4985 12589496
0.4888 32.5698 39800 0.4985 12653080
0.429 32.7335 40000 0.4985 12716696

Framework versions

  • PEFT 0.15.2.dev0
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
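
A quick way to check a local environment against the versions above (a convenience sketch; note that PEFT 0.15.2.dev0 is a development build and may require installing from source):

```python
import importlib.metadata as md

expected = {
    "peft": "0.15.2.dev0",
    "transformers": "4.51.3",
    "torch": "2.6.0+cu124",
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}
for pkg, want in expected.items():
    have = md.version(pkg)
    note = "" if have == want else f" (card lists {want})"
    print(f"{pkg}: {have}{note}")
```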