exceptions_exp2_swap_take_to_hit_5039

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set (a rough perplexity conversion follows the list):

  • Loss: 3.5570
  • Accuracy: 0.3698
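
For scale, if the evaluation loss is a mean token-level cross-entropy (as for a causal language model; the card does not state the task, so this is an assumption), it converts to perplexity as exp(loss):

```python
import math

# Assumption: the evaluation loss is a mean token-level cross-entropy,
# in which case perplexity is its exponential. The card does not state
# the task, so treat this as a rough reading aid only.
eval_loss = 3.5570
print(math.exp(eval_loss))  # ≈ 35.06
```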

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto transformers.TrainingArguments follows this list):

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 5039
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: AdamW (ADAMW_TORCH_FUSED) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
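
As a reading aid, here is a minimal sketch of how these settings map onto transformers.TrainingArguments. The output_dir is a placeholder and the model, dataset, and Trainer wiring are omitted; all other values come from the list above.

```python
from transformers import TrainingArguments

# Sketch only: the hyperparameters from this card expressed as
# TrainingArguments. output_dir is a placeholder path.
training_args = TrainingArguments(
    output_dir="exceptions_exp2_swap_take_to_hit_5039",  # placeholder path
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=5039,
    gradient_accumulation_steps=5,  # 16 x 5 = 80 effective train batch size
    optim="adamw_torch_fused",      # OptimizerNames.ADAMW_TORCH_FUSED
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,                      # "Native AMP"; assumed fp16 rather than bf16
)
```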

Training results

| Training Loss | Epoch   | Step   | Accuracy | Validation Loss |
|:-------------:|:-------:|:------:|:--------:|:---------------:|
| 4.8267        | 0.2911  | 1000   | 0.2556   | 4.7471          |
| 4.335         | 0.5822  | 2000   | 0.2992   | 4.2812          |
| 4.1422        | 0.8733  | 3000   | 0.3159   | 4.0929          |
| 3.9853        | 1.1642  | 4000   | 0.3253   | 3.9873          |
| 3.9251        | 1.4553  | 5000   | 0.3322   | 3.9105          |
| 3.8682        | 1.7464  | 6000   | 0.3375   | 3.8539          |
| 3.7454        | 2.0373  | 7000   | 0.3416   | 3.8124          |
| 3.7501        | 2.3284  | 8000   | 0.3450   | 3.7811          |
| 3.7337        | 2.6195  | 9000   | 0.3473   | 3.7509          |
| 3.7241        | 2.9106  | 10000  | 0.3500   | 3.7274          |
| 3.6349        | 3.2014  | 11000  | 0.3520   | 3.7130          |
| 3.6473        | 3.4925  | 12000  | 0.3537   | 3.6923          |
| 3.641         | 3.7837  | 13000  | 0.3550   | 3.6779          |
| 3.5416        | 4.0745  | 14000  | 0.3564   | 3.6689          |
| 3.5641        | 4.3656  | 15000  | 0.3573   | 3.6569          |
| 3.5772        | 4.6567  | 16000  | 0.3588   | 3.6424          |
| 3.5842        | 4.9478  | 17000  | 0.3601   | 3.6284          |
| 3.5059        | 5.2387  | 18000  | 0.3607   | 3.6320          |
| 3.5227        | 5.5298  | 19000  | 0.3614   | 3.6231          |
| 3.5259        | 5.8209  | 20000  | 0.3625   | 3.6127          |
| 3.4557        | 6.1121  | 21000  | 0.3619   | 3.6238          |
| 3.4692        | 6.4032  | 22000  | 0.3632   | 3.6104          |
| 3.4815        | 6.6943  | 23000  | 0.3640   | 3.6007          |
| 3.4941        | 6.9854  | 24000  | 0.3648   | 3.5885          |
| 3.4138        | 7.2763  | 25000  | 0.3649   | 3.5998          |
| 3.4522        | 7.5674  | 26000  | 0.3654   | 3.5897          |
| 3.4632        | 7.8585  | 27000  | 0.3666   | 3.5792          |
| 3.3757        | 8.1493  | 28000  | 0.3663   | 3.5883          |
| 3.4123        | 8.4404  | 29000  | 0.3670   | 3.5789          |
| 3.4219        | 8.7315  | 30000  | 0.3674   | 3.5708          |
| 3.3216        | 9.0224  | 31000  | 0.3672   | 3.5786          |
| 3.3661        | 9.3135  | 32000  | 0.3678   | 3.5772          |
| 3.3982        | 9.6046  | 33000  | 0.3684   | 3.5668          |
| 3.4211        | 9.8957  | 34000  | 0.3687   | 3.5583          |
| 3.3213        | 10.1866 | 35000  | 0.3681   | 3.5751          |
| 3.3611        | 10.4777 | 36000  | 0.3690   | 3.5656          |
| 3.3795        | 10.7688 | 37000  | 0.3696   | 3.5571          |
| 3.2787        | 11.0597 | 38000  | 0.3695   | 3.5660          |
| 3.3384        | 11.3508 | 39000  | 0.3694   | 3.5648          |
| 3.358         | 11.6419 | 40000  | 0.3698   | 3.5570          |
| 3.3667        | 11.9330 | 41000  | 0.3708   | 3.5489          |
| 3.3017        | 12.2239 | 42000  | 0.3699   | 3.5628          |
| 3.3324        | 12.5150 | 43000  | 0.3708   | 3.5560          |
| 3.3493        | 12.8061 | 44000  | 0.3711   | 3.5466          |
| 3.2535        | 13.0969 | 45000  | 0.3708   | 3.5593          |
| 3.2969        | 13.3880 | 46000  | 0.3711   | 3.5569          |
| 3.3297        | 13.6791 | 47000  | 0.3715   | 3.5479          |
| 3.3497        | 13.9702 | 48000  | 0.3720   | 3.5400          |
| 3.2808        | 14.2611 | 49000  | 0.3713   | 3.5549          |
| 3.313         | 14.5522 | 50000  | 0.3719   | 3.5444          |
| 3.3258        | 14.8433 | 51000  | 0.3718   | 3.5429          |
| 3.2322        | 15.1342 | 52000  | 0.3715   | 3.5568          |
| 3.2845        | 15.4253 | 53000  | 0.3721   | 3.5448          |
| 3.2968        | 15.7164 | 54000  | 0.3725   | 3.5418          |
| 3.2486        | 16.0073 | 55000  | 0.3721   | 3.5474          |
| 3.2576        | 16.2984 | 56000  | 0.3724   | 3.5514          |
| 3.2762        | 16.5895 | 57000  | 0.3725   | 3.5426          |
| 3.2808        | 16.8806 | 58000  | 0.3732   | 3.5349          |
| 3.2286        | 17.1715 | 59000  | 0.3726   | 3.5512          |
| 3.249         | 17.4626 | 60000  | 0.3728   | 3.5443          |
| 3.2756        | 17.7537 | 61000  | 0.3734   | 3.5346          |
| 3.1878        | 18.0445 | 62000  | 0.3729   | 3.5485          |
| 3.2435        | 18.3356 | 63000  | 0.3728   | 3.5461          |
| 3.2444        | 18.6267 | 64000  | 0.3735   | 3.5416          |
| 3.2737        | 18.9179 | 65000  | 0.3739   | 3.5312          |
| 3.2172        | 19.2087 | 66000  | 0.3733   | 3.5472          |
| 3.2437        | 19.4998 | 67000  | 0.3736   | 3.5403          |
| 3.257         | 19.7909 | 68000  | 0.3741   | 3.5346          |
| 3.166         | 20.0818 | 69000  | 0.3734   | 3.5492          |
| 3.2217        | 20.3729 | 70000  | 0.3737   | 3.5441          |
| 3.2449        | 20.6640 | 71000  | 0.3738   | 3.5375          |
| 3.2546        | 20.9551 | 72000  | 0.3744   | 3.5299          |
| 3.1902        | 21.2460 | 73000  | 0.3735   | 3.5467          |
| 3.2182        | 21.5371 | 74000  | 0.3742   | 3.5405          |
| 3.2294        | 21.8282 | 75000  | 0.3745   | 3.5300          |
| 3.1699        | 22.1191 | 76000  | 0.3738   | 3.5485          |
| 3.1891        | 22.4102 | 77000  | 0.3742   | 3.5421          |
| 3.2212        | 22.7013 | 78000  | 0.3742   | 3.5400          |
| 3.2507        | 22.9924 | 79000  | 0.3750   | 3.5283          |
| 3.1879        | 23.2832 | 80000  | 0.3740   | 3.5454          |
| 3.2108        | 23.5743 | 81000  | 0.3744   | 3.5346          |
| 3.2218        | 23.8655 | 82000  | 0.3748   | 3.5333          |
| 3.1481        | 24.1563 | 83000  | 0.3741   | 3.5494          |
| 3.1827        | 24.4474 | 84000  | 0.3746   | 3.5413          |
| 3.2085        | 24.7385 | 85000  | 0.3751   | 3.5319          |
| 3.1125        | 25.0294 | 86000  | 0.3744   | 3.5471          |
| 3.1476        | 25.3205 | 87000  | 0.3741   | 3.5468          |
| 3.1828        | 25.6116 | 88000  | 0.3748   | 3.5356          |
| 3.1948        | 25.9027 | 89000  | 0.3752   | 3.5324          |
| 3.1446        | 26.1936 | 90000  | 0.3743   | 3.5476          |
| 3.1514        | 26.4844 | 91000  | 0.3744   | 3.5471          |
| 3.1652        | 26.7755 | 92000  | 0.3748   | 3.5409          |
| 3.0971        | 27.0667 | 93000  | 0.3742   | 3.5543          |
| 3.1488        | 27.3578 | 94000  | 0.3746   | 3.5456          |
| 3.1608        | 27.6489 | 95000  | 0.3749   | 3.5400          |
| 3.1884        | 27.9400 | 96000  | 0.3755   | 3.5322          |
| 3.1219        | 28.2308 | 97000  | 0.3748   | 3.5495          |
| 3.139         | 28.5219 | 98000  | 0.3752   | 3.5403          |
| 3.1504        | 28.8131 | 99000  | 0.3753   | 3.5349          |
| 3.0924        | 29.1039 | 100000 | 0.3747   | 3.5472          |
| 3.1364        | 29.3950 | 101000 | 0.3748   | 3.5444          |
| 3.1578        | 29.6861 | 102000 | 0.3755   | 3.5361          |
| 3.1669        | 29.9772 | 103000 | 0.3759   | 3.5269          |
| 3.1157        | 30.2681 | 104000 | 0.3747   | 3.5489          |
| 3.1444        | 30.5592 | 105000 | 0.3758   | 3.5353          |
| 3.1564        | 30.8503 | 106000 | 0.3757   | 3.5342          |
| 3.0836        | 31.1412 | 107000 | 0.3752   | 3.5490          |
| 3.1177        | 31.4323 | 108000 | 0.3751   | 3.5441          |
| 3.13          | 31.7234 | 109000 | 0.3758   | 3.5346          |
| 3.0624        | 32.0143 | 110000 | 0.3754   | 3.5475          |
| 3.0914        | 32.3054 | 111000 | 0.3752   | 3.5468          |
| 3.1166        | 32.5965 | 112000 | 0.3756   | 3.5399          |
| 3.1436        | 32.8876 | 113000 | 0.3763   | 3.5346          |
| 3.0634        | 33.1784 | 114000 | 0.3756   | 3.5488          |
| 3.0987        | 33.4696 | 115000 | 0.3757   | 3.5428          |
| 3.1201        | 33.7607 | 116000 | 0.3758   | 3.5402          |
| 3.0471        | 34.0515 | 117000 | 0.3756   | 3.5495          |
| 3.082         | 34.3426 | 118000 | 0.3756   | 3.5463          |
| 3.0968        | 34.6337 | 119000 | 0.3760   | 3.5403          |
| 3.1116        | 34.9248 | 120000 | 0.3764   | 3.5352          |
| 3.0695        | 35.2157 | 121000 | 0.3756   | 3.5509          |
| 3.0986        | 35.5068 | 122000 | 0.3761   | 3.5445          |
| 3.121         | 35.7979 | 123000 | 0.3761   | 3.5374          |

Framework versions

  • Transformers 4.55.2
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Safetensors

  • Model size: 0.1B params
  • Tensor type: F32
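
Since the card does not state the architecture or a published repo id, the following is only a hypothetical loading sketch: the causal-LM head and the checkpoint path (here the run name) are assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical sketch: the checkpoint path and the causal-LM head are
# assumptions; the card does not state the task or a published repo id.
checkpoint = "exceptions_exp2_swap_take_to_hit_5039"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)  # F32 safetensors weights
```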