exceptions_exp2_swap_require_to_drop_40817

Training metrics for this run can be visualized in Weights & Biases.

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set (a perplexity conversion is sketched after the list):

  • Loss: 3.5574
  • Accuracy: 0.3696
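
If the reported loss is a mean per-token cross-entropy (an assumption; the card does not state the task or how accuracy is computed), it corresponds to a perplexity of exp(3.5574) ≈ 35.1:

```python
# Hedged sketch: convert the reported evaluation loss to perplexity.
# Assumes the loss is a mean per-token cross-entropy in nats, which the
# card does not confirm.
import math

eval_loss = 3.5574
print(f"perplexity = {math.exp(eval_loss):.2f}")  # ~ 35.07
```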

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative TrainingArguments sketch follows the list):

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 40817
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: AdamW (ADAMW_TORCH_FUSED) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
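
A minimal TrainingArguments sketch that mirrors the list above. The output_dir is a placeholder taken from the run name, fp16=True is an interpretation of "Native AMP", and report_to="wandb" is inferred from the W&B link at the top of the card; none of these are confirmed beyond what the card shows:

```python
# Sketch of a Trainer configuration matching the hyperparameters above.
# output_dir is a placeholder; fp16=True stands in for "Native AMP", and
# report_to="wandb" reflects the W&B link on this card (both hedged).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="exceptions_exp2_swap_require_to_drop_40817",
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=40817,
    gradient_accumulation_steps=5,   # 16 x 5 = 80 total train batch size
    optim="adamw_torch_fused",       # OptimizerNames.ADAMW_TORCH_FUSED
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,
    report_to="wandb",
)
```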

Training results

| Training Loss | Epoch   | Step   | Validation Loss | Accuracy |
|:-------------:|:-------:|:------:|:---------------:|:--------:|
| 4.8466        | 0.2911  | 1000   | 4.7718          | 0.2530   |
| 4.3445        | 0.5822  | 2000   | 4.2865          | 0.2989   |
| 4.1487        | 0.8733  | 3000   | 4.0988          | 0.3150   |
| 4.0036        | 1.1642  | 4000   | 3.9907          | 0.3252   |
| 3.9375        | 1.4553  | 5000   | 3.9139          | 0.3321   |
| 3.8856        | 1.7464  | 6000   | 3.8561          | 0.3370   |
| 3.7575        | 2.0373  | 7000   | 3.8129          | 0.3415   |
| 3.7454        | 2.3284  | 8000   | 3.7831          | 0.3444   |
| 3.7444        | 2.6195  | 9000   | 3.7538          | 0.3472   |
| 3.7281        | 2.9106  | 10000  | 3.7293          | 0.3496   |
| 3.6378        | 3.2014  | 11000  | 3.7164          | 0.3514   |
| 3.6481        | 3.4925  | 12000  | 3.6967          | 0.3532   |
| 3.6473        | 3.7837  | 13000  | 3.6785          | 0.3549   |
| 3.5417        | 4.0745  | 14000  | 3.6731          | 0.3561   |
| 3.5539        | 4.3656  | 15000  | 3.6625          | 0.3571   |
| 3.5818        | 4.6567  | 16000  | 3.6471          | 0.3586   |
| 3.5794        | 4.9478  | 17000  | 3.6342          | 0.3596   |
| 3.4993        | 5.2387  | 18000  | 3.6365          | 0.3597   |
| 3.5089        | 5.5298  | 19000  | 3.6259          | 0.3608   |
| 3.5332        | 5.8209  | 20000  | 3.6142          | 0.3620   |
| 3.4391        | 6.1118  | 21000  | 3.6161          | 0.3625   |
| 3.4754        | 6.4029  | 22000  | 3.6096          | 0.3632   |
| 3.4919        | 6.6940  | 23000  | 3.6009          | 0.3639   |
| 3.4905        | 6.9851  | 24000  | 3.5917          | 0.3647   |
| 3.4381        | 7.2760  | 25000  | 3.5992          | 0.3647   |
| 3.4452        | 7.5671  | 26000  | 3.5925          | 0.3653   |
| 3.4527        | 7.8582  | 27000  | 3.5826          | 0.3658   |
| 3.3971        | 8.1490  | 28000  | 3.5878          | 0.3659   |
| 3.3992        | 8.4401  | 29000  | 3.5827          | 0.3663   |
| 3.4274        | 8.7313  | 30000  | 3.5765          | 0.3669   |
| 3.3172        | 9.0221  | 31000  | 3.5817          | 0.3670   |
| 3.3913        | 9.3132  | 32000  | 3.5789          | 0.3674   |
| 3.3959        | 9.6043  | 33000  | 3.5726          | 0.3679   |
| 3.4142        | 9.8954  | 34000  | 3.5616          | 0.3683   |
| 3.3329        | 10.1863 | 35000  | 3.5742          | 0.3682   |
| 3.3651        | 10.4774 | 36000  | 3.5704          | 0.3683   |
| 3.3795        | 10.7685 | 37000  | 3.5627          | 0.3691   |
| 3.298         | 11.0594 | 38000  | 3.5693          | 0.3690   |
| 3.3304        | 11.3505 | 39000  | 3.5656          | 0.3694   |
| 3.3744        | 11.6416 | 40000  | 3.5574          | 0.3696   |
| 3.3672        | 11.9327 | 41000  | 3.5496          | 0.3704   |
| 3.3059        | 12.2236 | 42000  | 3.5653          | 0.3698   |
| 3.3392        | 12.5147 | 43000  | 3.5553          | 0.3704   |
| 3.3481        | 12.8058 | 44000  | 3.5504          | 0.3707   |
| 3.2857        | 13.0966 | 45000  | 3.5624          | 0.3702   |
| 3.2993        | 13.3878 | 46000  | 3.5573          | 0.3706   |
| 3.3103        | 13.6789 | 47000  | 3.5536          | 0.3709   |
| 3.3482        | 13.9700 | 48000  | 3.5429          | 0.3714   |
| 3.2756        | 14.2608 | 49000  | 3.5565          | 0.3706   |
| 3.3011        | 14.5519 | 50000  | 3.5488          | 0.3714   |
| 3.3204        | 14.8430 | 51000  | 3.5413          | 0.3720   |
| 3.2399        | 15.1339 | 52000  | 3.5585          | 0.3711   |
| 3.2816        | 15.4250 | 53000  | 3.5553          | 0.3712   |
| 3.302         | 15.7161 | 54000  | 3.5453          | 0.3720   |
| 3.2544        | 16.0070 | 55000  | 3.5562          | 0.3715   |
| 3.2429        | 16.2981 | 56000  | 3.5524          | 0.3719   |
| 3.2782        | 16.5892 | 57000  | 3.5473          | 0.3720   |
| 3.2818        | 16.8803 | 58000  | 3.5381          | 0.3729   |
| 3.2286        | 17.1712 | 59000  | 3.5554          | 0.3720   |
| 3.2575        | 17.4623 | 60000  | 3.5501          | 0.3723   |
| 3.2729        | 17.7534 | 61000  | 3.5366          | 0.3732   |
| 3.185         | 18.0442 | 62000  | 3.5547          | 0.3722   |
| 3.2309        | 18.3354 | 63000  | 3.5518          | 0.3726   |
| 3.2619        | 18.6265 | 64000  | 3.5419          | 0.3730   |
| 3.2775        | 18.9176 | 65000  | 3.5380          | 0.3732   |
| 3.2032        | 19.2084 | 66000  | 3.5546          | 0.3724   |
| 3.2309        | 19.4995 | 67000  | 3.5462          | 0.3729   |
| 3.2505        | 19.7906 | 68000  | 3.5360          | 0.3735   |
| 3.1711        | 20.0815 | 69000  | 3.5537          | 0.3727   |
| 3.2195        | 20.3726 | 70000  | 3.5469          | 0.3729   |
| 3.223         | 20.6637 | 71000  | 3.5413          | 0.3735   |
| 3.2548        | 20.9548 | 72000  | 3.5315          | 0.3741   |
| 3.1821        | 21.2457 | 73000  | 3.5512          | 0.3730   |
| 3.2097        | 21.5368 | 74000  | 3.5420          | 0.3737   |
| 3.228         | 21.8279 | 75000  | 3.5325          | 0.3740   |
| 3.1735        | 22.1188 | 76000  | 3.5555          | 0.3731   |
| 3.1979        | 22.4099 | 77000  | 3.5454          | 0.3735   |
| 3.2138        | 22.7010 | 78000  | 3.5391          | 0.3736   |
| 3.2247        | 22.9921 | 79000  | 3.5318          | 0.3743   |
| 3.1887        | 23.2830 | 80000  | 3.5504          | 0.3737   |
| 3.2066        | 23.5741 | 81000  | 3.5404          | 0.3741   |
| 3.2304        | 23.8652 | 82000  | 3.5321          | 0.3742   |
| 3.1437        | 24.1560 | 83000  | 3.5520          | 0.3736   |
| 3.182         | 24.4471 | 84000  | 3.5453          | 0.3741   |
| 3.2048        | 24.7382 | 85000  | 3.5380          | 0.3743   |
| 3.1086        | 25.0291 | 86000  | 3.5513          | 0.3736   |
| 3.1688        | 25.3202 | 87000  | 3.5474          | 0.3741   |
| 3.1861        | 25.6113 | 88000  | 3.5413          | 0.3747   |
| 3.2058        | 25.9024 | 89000  | 3.5308          | 0.3748   |
| 3.1451        | 26.1933 | 90000  | 3.5570          | 0.3735   |
| 3.1654        | 26.4844 | 91000  | 3.5434          | 0.3743   |
| 3.1848        | 26.7755 | 92000  | 3.5385          | 0.3747   |
| 3.1256        | 27.0664 | 93000  | 3.5511          | 0.3741   |
| 3.1486        | 27.3575 | 94000  | 3.5487          | 0.3742   |
| 3.1817        | 27.6486 | 95000  | 3.5418          | 0.3748   |
| 3.1771        | 27.9397 | 96000  | 3.5343          | 0.3749   |
| 3.1297        | 28.2306 | 97000  | 3.5509          | 0.3741   |
| 3.1628        | 28.5217 | 98000  | 3.5443          | 0.3748   |
| 3.1719        | 28.8128 | 99000  | 3.5371          | 0.3751   |
| 3.0902        | 29.1036 | 100000 | 3.5485          | 0.3745   |
| 3.1338        | 29.3947 | 101000 | 3.5492          | 0.3743   |
| 3.1602        | 29.6858 | 102000 | 3.5455          | 0.3749   |
| 3.1786        | 29.9769 | 103000 | 3.5311          | 0.3756   |
| 3.1097        | 30.2678 | 104000 | 3.5509          | 0.3746   |
| 3.1368        | 30.5589 | 105000 | 3.5421          | 0.3751   |
| 3.1502        | 30.8500 | 106000 | 3.5364          | 0.3753   |
| 3.0948        | 31.1409 | 107000 | 3.5537          | 0.3745   |
| 3.1172        | 31.4320 | 108000 | 3.5426          | 0.3750   |
| 3.145         | 31.7231 | 109000 | 3.5424          | 0.3752   |
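
Note that logging stops at epoch ≈ 31.7 even though num_epochs was set to 50.0; the card does not say why (early stopping or a manual interruption are possibilities). To see the convergence trend at a glance, the table above can be plotted; a minimal sketch, assuming the rows have been exported to a CSV file named training_results.csv (hypothetical) with the same column headers:

```python
# Plot validation loss and accuracy versus training step from the table
# above. The CSV path and column names mirror the table header; the file
# itself is a hypothetical export.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("training_results.csv")

fig, ax_loss = plt.subplots()
ax_loss.plot(df["Step"], df["Validation Loss"], color="tab:blue")
ax_loss.set_xlabel("Step")
ax_loss.set_ylabel("Validation Loss", color="tab:blue")

ax_acc = ax_loss.twinx()  # second y-axis sharing the step axis
ax_acc.plot(df["Step"], df["Accuracy"], color="tab:orange")
ax_acc.set_ylabel("Accuracy", color="tab:orange")

fig.tight_layout()
plt.show()
```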

Framework versions

  • Transformers 4.55.2
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model size

  • Parameters: 0.1B
  • Tensor type: F32
  • Weights format: Safetensors
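
Given the Safetensors weights, F32 tensors, and roughly 0.1B parameters, the model should load comfortably on CPU or a small GPU. A minimal sketch, assuming the repository id matches the run name above and that this is a causal language model (consistent with the loss and token-accuracy metrics, but not stated in the card):

```python
# Hypothetical loading sketch: both the repo id and the AutoModel class
# are assumptions; the card names neither the Hub repository nor the task.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "exceptions_exp2_swap_require_to_drop_40817"  # placeholder id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float32)
```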