train_wsc_1745950301

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3479
  • Num Input Tokens Seen: 14002704
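
To try the adapter, here is a minimal, hedged usage sketch (not from the original card). It assumes the repository id rbelanec/train_wsc_1745950301 shown on this page, access to the gated base model, and an undocumented prompt format, so the example prompt is purely illustrative:

```python
# Minimal usage sketch: load the PEFT adapter on top of the base model.
# The prompt format used during training is not documented in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_1745950301"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"  # requires accelerate
)
model = PeftModel.from_pretrained(base, adapter_id)

# Illustrative Winograd-style prompt (an assumption, not the training format).
prompt = (
    "The trophy doesn't fit into the brown suitcase because it is too large. "
    "What does 'it' refer to?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```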

Model description

This is a PEFT adapter trained on top of meta-llama/Meta-Llama-3-8B-Instruct for the WSC (Winograd Schema Challenge) coreference task. Beyond what is recorded elsewhere in this card, no further details have been provided.

Intended uses & limitations

More information needed

Training and evaluation data

The model was fine-tuned and evaluated on the wsc dataset (Winograd Schema Challenge). Details of the train/evaluation splits have not been provided.

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a hedged TrainingArguments sketch follows the list:

  • learning_rate: 0.3
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • training_steps: 40000
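
For reference, this is a minimal sketch (not the authors' script) of how the reported values map onto transformers' TrainingArguments. The output_dir and the 200-step logging/eval cadence (inferred from the results table below) are assumptions:

```python
# Hedged configuration sketch, assuming the standard transformers Trainer.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wsc_1745950301",  # assumed
    learning_rate=0.3,                  # high, but typical for prompt-tuning-style PEFT
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,      # total train batch size = 4
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    max_steps=40_000,
    eval_strategy="steps",              # evaluation every 200 steps per the table
    eval_steps=200,
    logging_steps=200,
)
```

Note that a learning rate of 0.3 would be far too high for full fine-tuning; it is, however, in the usual range for prompt-tuning-style PEFT methods, consistent with the PEFT adapter listed under Framework versions.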

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.3481 | 1.6024 | 200 | 0.3937 | 70144 |
| 0.3618 | 3.2008 | 400 | 0.3625 | 140304 |
| 0.3966 | 4.8032 | 600 | 0.3609 | 210240 |
| 0.3759 | 6.4016 | 800 | 0.4168 | 279952 |
| 0.5142 | 8.0 | 1000 | 0.3932 | 350224 |
| 0.3172 | 9.6024 | 1200 | 0.4967 | 420256 |
| 0.3539 | 11.2008 | 1400 | 0.6324 | 490496 |
| 0.3909 | 12.8032 | 1600 | 0.3521 | 560224 |
| 0.3751 | 14.4016 | 1800 | 0.3479 | 630560 |
| 0.357 | 16.0 | 2000 | 0.3643 | 699648 |
| 0.3893 | 17.6024 | 2200 | 0.3549 | 769232 |
| 0.3175 | 19.2008 | 2400 | 0.4833 | 839344 |
| 0.3652 | 20.8032 | 2600 | 0.3520 | 909744 |
| 0.365 | 22.4016 | 2800 | 0.3521 | 979312 |
| 0.3945 | 24.0 | 3000 | 0.3519 | 1049184 |
| 0.3726 | 25.6024 | 3200 | 0.3594 | 1119552 |
| 0.3951 | 27.2008 | 3400 | 0.3498 | 1189008 |
| 0.3497 | 28.8032 | 3600 | 0.3815 | 1259168 |
| 0.3087 | 30.4016 | 3800 | 0.3790 | 1329056 |
| 0.3478 | 32.0 | 4000 | 0.3681 | 1399280 |
| 0.3321 | 33.6024 | 4200 | 0.4623 | 1469920 |
| 0.3297 | 35.2008 | 4400 | 0.3859 | 1539184 |
| 0.3218 | 36.8032 | 4600 | 0.4085 | 1609648 |
| 0.2996 | 38.4016 | 4800 | 0.4424 | 1679792 |
| 0.4013 | 40.0 | 5000 | 0.3618 | 1749008 |
| 0.368 | 41.6024 | 5200 | 0.3772 | 1818832 |
| 0.3804 | 43.2008 | 5400 | 0.3532 | 1889136 |
| 0.3447 | 44.8032 | 5600 | 0.3504 | 1959008 |
| 0.4024 | 46.4016 | 5800 | 0.3740 | 2028320 |
| 0.3575 | 48.0 | 6000 | 0.3546 | 2098928 |
| 0.3726 | 49.6024 | 6200 | 0.3559 | 2168688 |
| 0.3459 | 51.2008 | 6400 | 0.3536 | 2238752 |
| 0.3578 | 52.8032 | 6600 | 0.3571 | 2308816 |
| 0.3395 | 54.4016 | 6800 | 0.3686 | 2379328 |
| 0.3692 | 56.0 | 7000 | 0.3688 | 2448704 |
| 0.5154 | 57.6024 | 7200 | 0.3540 | 2519008 |
| 0.3707 | 59.2008 | 7400 | 0.3510 | 2588608 |
| 0.3494 | 60.8032 | 7600 | 0.3638 | 2659072 |
| 0.3521 | 62.4016 | 7800 | 0.3524 | 2728480 |
| 0.4449 | 64.0 | 8000 | 0.3593 | 2798720 |
| 0.3794 | 65.6024 | 8200 | 0.3858 | 2868672 |
| 0.3643 | 67.2008 | 8400 | 0.3597 | 2939312 |
| 0.3434 | 68.8032 | 8600 | 0.3513 | 3009568 |
| 0.3494 | 70.4016 | 8800 | 0.3696 | 3079584 |
| 0.3478 | 72.0 | 9000 | 0.3524 | 3149680 |
| 0.3234 | 73.6024 | 9200 | 0.4030 | 3219680 |
| 0.3491 | 75.2008 | 9400 | 0.3532 | 3289472 |
| 0.3474 | 76.8032 | 9600 | 0.3538 | 3359520 |
| 0.3429 | 78.4016 | 9800 | 0.3582 | 3429568 |
| 0.3524 | 80.0 | 10000 | 0.3500 | 3499648 |
| 0.3272 | 81.6024 | 10200 | 0.3656 | 3569504 |
| 0.3907 | 83.2008 | 10400 | 0.3989 | 3639920 |
| 0.2551 | 84.8032 | 10600 | 0.4358 | 3709520 |
| 0.372 | 86.4016 | 10800 | 0.3547 | 3779456 |
| 0.3645 | 88.0 | 11000 | 0.3545 | 3849744 |
| 0.384 | 89.6024 | 11200 | 0.3532 | 3919984 |
| 0.3421 | 91.2008 | 11400 | 0.3520 | 3989872 |
| 0.3697 | 92.8032 | 11600 | 0.3584 | 4059568 |
| 0.3618 | 94.4016 | 11800 | 0.3497 | 4129664 |
| 0.3462 | 96.0 | 12000 | 0.3715 | 4199936 |
| 0.3189 | 97.6024 | 12200 | 0.3875 | 4269952 |
| 0.3483 | 99.2008 | 12400 | 0.3619 | 4339040 |
| 0.3477 | 100.8032 | 12600 | 0.3564 | 4409680 |
| 0.3459 | 102.4016 | 12800 | 0.3587 | 4479120 |
| 0.3518 | 104.0 | 13000 | 0.4024 | 4548896 |
| 0.3558 | 105.6024 | 13200 | 0.3599 | 4619216 |
| 0.3899 | 107.2008 | 13400 | 0.3608 | 4689424 |
| 0.375 | 108.8032 | 13600 | 0.3554 | 4759232 |
| 0.3441 | 110.4016 | 13800 | 0.3636 | 4829120 |
| 0.3495 | 112.0 | 14000 | 0.3556 | 4899024 |
| 0.3535 | 113.6024 | 14200 | 0.3591 | 4968944 |
| 0.3393 | 115.2008 | 14400 | 0.3589 | 5039152 |
| 0.3857 | 116.8032 | 14600 | 0.3566 | 5109312 |
| 0.345 | 118.4016 | 14800 | 0.3546 | 5179296 |
| 0.351 | 120.0 | 15000 | 0.3538 | 5249504 |
| 0.3259 | 121.6024 | 15200 | 0.3612 | 5319424 |
| 0.3209 | 123.2008 | 15400 | 0.3808 | 5389488 |
| 0.3565 | 124.8032 | 15600 | 0.3535 | 5459776 |
| 0.3271 | 126.4016 | 15800 | 0.3515 | 5529760 |
| 0.3092 | 128.0 | 16000 | 0.3808 | 5599968 |
| 0.3434 | 129.6024 | 16200 | 0.3500 | 5671056 |
| 0.3532 | 131.2008 | 16400 | 0.3604 | 5740000 |
| 0.3681 | 132.8032 | 16600 | 0.3572 | 5810288 |
| 0.353 | 134.4016 | 16800 | 0.3594 | 5880176 |
| 0.3471 | 136.0 | 17000 | 0.3579 | 5950048 |
| 0.3562 | 137.6024 | 17200 | 0.3644 | 6020016 |
| 0.3892 | 139.2008 | 17400 | 0.3583 | 6090672 |
| 0.3545 | 140.8032 | 17600 | 0.3681 | 6160288 |
| 0.4053 | 142.4016 | 17800 | 0.3721 | 6230656 |
| 0.3224 | 144.0 | 18000 | 0.3567 | 6299968 |
| 0.3377 | 145.6024 | 18200 | 0.3646 | 6370512 |
| 0.3491 | 147.2008 | 18400 | 0.3558 | 6440784 |
| 0.3411 | 148.8032 | 18600 | 0.3606 | 6510560 |
| 0.3344 | 150.4016 | 18800 | 0.3552 | 6579872 |
| 0.3227 | 152.0 | 19000 | 0.3651 | 6650112 |
| 0.3469 | 153.6024 | 19200 | 0.3702 | 6720368 |
| 0.3872 | 155.2008 | 19400 | 0.3737 | 6790512 |
| 0.3488 | 156.8032 | 19600 | 0.3525 | 6860880 |
| 0.3635 | 158.4016 | 19800 | 0.3770 | 6930576 |
| 0.34 | 160.0 | 20000 | 0.3582 | 7000640 |
| 0.3565 | 161.6024 | 20200 | 0.3523 | 7070272 |
| 0.3411 | 163.2008 | 20400 | 0.3561 | 7140336 |
| 0.3373 | 164.8032 | 20600 | 0.3497 | 7210816 |
| 0.3482 | 166.4016 | 20800 | 0.3670 | 7281392 |
| 0.339 | 168.0 | 21000 | 0.3549 | 7350960 |
| 0.3145 | 169.6024 | 21200 | 0.3669 | 7421312 |
| 0.3461 | 171.2008 | 21400 | 0.3559 | 7491200 |
| 0.3472 | 172.8032 | 21600 | 0.3576 | 7560976 |
| 0.3532 | 174.4016 | 21800 | 0.3503 | 7631024 |
| 0.3441 | 176.0 | 22000 | 0.3551 | 7700784 |
| 0.3545 | 177.6024 | 22200 | 0.3680 | 7770752 |
| 0.4 | 179.2008 | 22400 | 0.3657 | 7840832 |
| 0.3275 | 180.8032 | 22600 | 0.3675 | 7911072 |
| 0.3382 | 182.4016 | 22800 | 0.3553 | 7981312 |
| 0.3682 | 184.0 | 23000 | 0.3611 | 8050976 |
| 0.2797 | 185.6024 | 23200 | 0.3805 | 8121312 |
| 0.3475 | 187.2008 | 23400 | 0.3546 | 8191520 |
| 0.3506 | 188.8032 | 23600 | 0.3532 | 8261456 |
| 0.3341 | 190.4016 | 23800 | 0.3702 | 8331664 |
| 0.328 | 192.0 | 24000 | 0.3560 | 8401328 |
| 0.3563 | 193.6024 | 24200 | 0.3561 | 8471232 |
| 0.3585 | 195.2008 | 24400 | 0.3580 | 8540976 |
| 0.3998 | 196.8032 | 24600 | 0.3776 | 8611296 |
| 0.3351 | 198.4016 | 24800 | 0.3581 | 8681264 |
| 0.3714 | 200.0 | 25000 | 0.3618 | 8751280 |
| 0.35 | 201.6024 | 25200 | 0.3553 | 8822192 |
| 0.3299 | 203.2008 | 25400 | 0.3635 | 8891648 |
| 0.3368 | 204.8032 | 25600 | 0.3604 | 8961760 |
| 0.3453 | 206.4016 | 25800 | 0.3571 | 9031568 |
| 0.3574 | 208.0 | 26000 | 0.3588 | 9101088 |
| 0.3359 | 209.6024 | 26200 | 0.3531 | 9171168 |
| 0.3649 | 211.2008 | 26400 | 0.3597 | 9240752 |
| 0.3464 | 212.8032 | 26600 | 0.3524 | 9310960 |
| 0.3582 | 214.4016 | 26800 | 0.3685 | 9380560 |
| 0.3518 | 216.0 | 27000 | 0.3577 | 9450912 |
| 0.3405 | 217.6024 | 27200 | 0.3542 | 9520832 |
| 0.3337 | 219.2008 | 27400 | 0.3536 | 9590800 |
| 0.3373 | 220.8032 | 27600 | 0.3539 | 9661456 |
| 0.3101 | 222.4016 | 27800 | 0.3652 | 9731376 |
| 0.3749 | 224.0 | 28000 | 0.3654 | 9801040 |
| 0.3415 | 225.6024 | 28200 | 0.3558 | 9870784 |
| 0.3449 | 227.2008 | 28400 | 0.3590 | 9941408 |
| 0.328 | 228.8032 | 28600 | 0.3614 | 10011264 |
| 0.3322 | 230.4016 | 28800 | 0.3608 | 10080704 |
| 0.3209 | 232.0 | 29000 | 0.3612 | 10150880 |
| 0.3315 | 233.6024 | 29200 | 0.3677 | 10221616 |
| 0.3314 | 235.2008 | 29400 | 0.3679 | 10291664 |
| 0.3386 | 236.8032 | 29600 | 0.3543 | 10361728 |
| 0.347 | 238.4016 | 29800 | 0.3540 | 10431088 |
| 0.3694 | 240.0 | 30000 | 0.3702 | 10501088 |
| 0.3238 | 241.6024 | 30200 | 0.3639 | 10571488 |
| 0.3311 | 243.2008 | 30400 | 0.3622 | 10640848 |
| 0.3445 | 244.8032 | 30600 | 0.3631 | 10711136 |
| 0.3558 | 246.4016 | 30800 | 0.3615 | 10781136 |
| 0.3495 | 248.0 | 31000 | 0.3610 | 10851312 |
| 0.361 | 249.6024 | 31200 | 0.3544 | 10921664 |
| 0.3543 | 251.2008 | 31400 | 0.3628 | 10991936 |
| 0.351 | 252.8032 | 31600 | 0.3619 | 11061680 |
| 0.3288 | 254.4016 | 31800 | 0.3700 | 11131872 |
| 0.3503 | 256.0 | 32000 | 0.3581 | 11201520 |
| 0.3545 | 257.6024 | 32200 | 0.3688 | 11271952 |
| 0.3452 | 259.2008 | 32400 | 0.3665 | 11340976 |
| 0.3451 | 260.8032 | 32600 | 0.3572 | 11411056 |
| 0.3492 | 262.4016 | 32800 | 0.3594 | 11481152 |
| 0.37 | 264.0 | 33000 | 0.3602 | 11550752 |
| 0.3444 | 265.6024 | 33200 | 0.3605 | 11620752 |
| 0.3474 | 267.2008 | 33400 | 0.3590 | 11690464 |
| 0.3421 | 268.8032 | 33600 | 0.3647 | 11761360 |
| 0.3466 | 270.4016 | 33800 | 0.3618 | 11831152 |
| 0.3418 | 272.0 | 34000 | 0.3609 | 11900768 |
| 0.3394 | 273.6024 | 34200 | 0.3612 | 11971616 |
| 0.3319 | 275.2008 | 34400 | 0.3632 | 12041104 |
| 0.3679 | 276.8032 | 34600 | 0.3596 | 12111712 |
| 0.3522 | 278.4016 | 34800 | 0.3598 | 12181328 |
| 0.3434 | 280.0 | 35000 | 0.3597 | 12251088 |
| 0.3281 | 281.6024 | 35200 | 0.3560 | 12321616 |
| 0.3377 | 283.2008 | 35400 | 0.3551 | 12391184 |
| 0.3346 | 284.8032 | 35600 | 0.3605 | 12461088 |
| 0.3374 | 286.4016 | 35800 | 0.3595 | 12531520 |
| 0.3407 | 288.0 | 36000 | 0.3593 | 12600944 |
| 0.362 | 289.6024 | 36200 | 0.3630 | 12670544 |
| 0.3365 | 291.2008 | 36400 | 0.3603 | 12741216 |
| 0.3319 | 292.8032 | 36600 | 0.3668 | 12811584 |
| 0.3266 | 294.4016 | 36800 | 0.3617 | 12881104 |
| 0.3582 | 296.0 | 37000 | 0.3609 | 12951648 |
| 0.3432 | 297.6024 | 37200 | 0.3629 | 13021600 |
| 0.342 | 299.2008 | 37400 | 0.3624 | 13091888 |
| 0.3658 | 300.8032 | 37600 | 0.3633 | 13162128 |
| 0.3142 | 302.4016 | 37800 | 0.3627 | 13231552 |
| 0.331 | 304.0 | 38000 | 0.3613 | 13302080 |
| 0.3507 | 305.6024 | 38200 | 0.3595 | 13371808 |
| 0.3403 | 307.2008 | 38400 | 0.3596 | 13441936 |
| 0.3275 | 308.8032 | 38600 | 0.3583 | 13512304 |
| 0.3553 | 310.4016 | 38800 | 0.3591 | 13582192 |
| 0.3348 | 312.0 | 39000 | 0.3615 | 13652384 |
| 0.3715 | 313.6024 | 39200 | 0.3620 | 13722224 |
| 0.3552 | 315.2008 | 39400 | 0.3578 | 13791728 |
| 0.3445 | 316.8032 | 39600 | 0.3609 | 13862560 |
| 0.3485 | 318.4016 | 39800 | 0.3606 | 13933264 |
| 0.3448 | 320.0 | 40000 | 0.3591 | 14002704 |
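
The lowest validation loss in the table (0.3479, at step 1800) matches the loss reported at the top of this card, which suggests the summary reflects the best checkpoint rather than the final one. If the training output directory is available, the Trainer's trainer_state.json records the same log history; a hedged sketch for recovering the best step (paths are assumptions):

```python
# Hedged sketch: find the best eval checkpoint from trainer_state.json,
# which the transformers Trainer writes alongside checkpoints.
import json

# The output directory and final checkpoint name are assumptions.
with open("train_wsc_1745950301/checkpoint-40000/trainer_state.json") as f:
    state = json.load(f)

evals = [e for e in state["log_history"] if "eval_loss" in e]
best = min(evals, key=lambda e: e["eval_loss"])
print(best["step"], best["eval_loss"])  # expected: 1800 0.3479
```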

Framework versions

  • PEFT 0.15.2.dev0
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
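
To compare a local environment against these pins (note that PEFT 0.15.2.dev0 was a development build, so an exact match may not be installable from PyPI), a quick sanity check:

```python
# Print installed versions to compare against the pins above.
import datasets, peft, tokenizers, torch, transformers

for mod in (peft, transformers, torch, datasets, tokenizers):
    print(f"{mod.__name__}: {mod.__version__}")
```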