
exceptions_exp2_swap_require_to_push_5039

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.5557
  • Accuracy: 0.3699
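If the reported loss is a mean token-level cross-entropy in nats (the usual convention for language-model evaluation, though this card does not confirm it), it corresponds to a perplexity of roughly 35. A minimal sketch of that conversion:

```python
import math

eval_loss = 3.5557  # evaluation loss reported above

# Perplexity is exp(cross-entropy) when the loss is an average over tokens in nats.
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 35.0
```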

Model description

More information needed

Intended uses & limitations

More information needed
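Although the intended use is unspecified, the evaluation metrics (token accuracy alongside cross-entropy loss) are consistent with a causal language model. A minimal loading sketch under that assumption; the repo id below is hypothetical, and this card does not state where the checkpoint is published:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/exceptions_exp2_swap_require_to_push_5039"  # hypothetical

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation from a prompt.
inputs = tokenizer("Hello, world", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```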

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a hedged TrainingArguments sketch reproducing them follows the list:

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 5039
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
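The configuration above maps onto transformers.TrainingArguments as sketched below. This is a reconstruction, not the original training script: the output directory is hypothetical, and "Native AMP" is assumed to mean fp16 autocast.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="exceptions_exp2_swap_require_to_push_5039",  # hypothetical
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=5039,
    gradient_accumulation_steps=5,  # 16 x 5 = 80 total train batch size (single device)
    optim="adamw_torch_fused",      # PyTorch fused AdamW
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,  # "Native AMP" mixed-precision training (fp16 assumed)
)
```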

Training results

Training Loss Epoch Step Accuracy Validation Loss
4.8239 0.2911 1000 0.2558 4.7466
4.3346 0.5822 2000 0.2990 4.2863
4.1431 0.8733 3000 0.3154 4.0945
3.9889 1.1642 4000 0.3249 3.9909
3.9256 1.4553 5000 0.3323 3.9127
3.8679 1.7464 6000 0.3373 3.8544
3.7456 2.0373 7000 0.3416 3.8116
3.7495 2.3284 8000 0.3446 3.7812
3.7333 2.6195 9000 0.3476 3.7508
3.7235 2.9106 10000 0.3500 3.7253
3.6348 3.2014 11000 0.3522 3.7121
3.6474 3.4925 12000 0.3536 3.6934
3.6408 3.7837 13000 0.3552 3.6762
3.5411 4.0745 14000 0.3562 3.6695
3.5657 4.3656 15000 0.3575 3.6554
3.5775 4.6567 16000 0.3589 3.6422
3.5844 4.9478 17000 0.3601 3.6308
3.5074 5.2387 18000 0.3605 3.6327
3.5232 5.5298 19000 0.3613 3.6238
3.5269 5.8209 20000 0.3623 3.6123
3.45 6.1118 21000 0.3623 3.6170
3.4654 6.4029 22000 0.3633 3.6082
3.4808 6.6940 23000 0.3642 3.5998
3.494 6.9851 24000 0.3648 3.5889
3.4155 7.2760 25000 0.3649 3.5996
3.4493 7.5671 26000 0.3653 3.5899
3.463 7.8582 27000 0.3662 3.5802
3.3767 8.1490 28000 0.3661 3.5874
3.4147 8.4401 29000 0.3670 3.5803
3.4224 8.7313 30000 0.3672 3.5707
3.3221 9.0221 31000 0.3673 3.5753
3.3657 9.3132 32000 0.3678 3.5763
3.4001 9.6043 33000 0.3685 3.5693
3.4211 9.8954 34000 0.3686 3.5597
3.322 10.1863 35000 0.3684 3.5719
3.3627 10.4774 36000 0.3691 3.5639
3.3818 10.7685 37000 0.3694 3.5583
3.2802 11.0594 38000 0.3692 3.5664
3.3399 11.3505 39000 0.3694 3.5642
3.3617 11.6416 40000 0.3699 3.5557
3.3705 11.9327 41000 0.3707 3.5489
3.3017 12.2236 42000 0.3699 3.5640
3.3312 12.5147 43000 0.3705 3.5569
3.3499 12.8058 44000 0.3707 3.5486
3.2567 13.0966 45000 0.3704 3.5622
3.2984 13.3878 46000 0.3709 3.5586
3.3315 13.6789 47000 0.3713 3.5500
3.3493 13.9700 48000 0.3720 3.5378
3.2806 14.2608 49000 0.3709 3.5578
3.3178 14.5519 50000 0.3717 3.5472
3.3285 14.8430 51000 0.3721 3.5417
3.2349 15.1339 52000 0.3716 3.5551
3.2865 15.4250 53000 0.3719 3.5477
3.2972 15.7161 54000 0.3723 3.5454
3.2481 16.0070 55000 0.3720 3.5482
3.2594 16.2981 56000 0.3720 3.5524
3.2754 16.5892 57000 0.3726 3.5434
3.2861 16.8803 58000 0.3731 3.5368
3.2287 17.1712 59000 0.3723 3.5501
3.2512 17.4623 60000 0.3728 3.5421
3.2781 17.7534 61000 0.3728 3.5358
3.1885 18.0442 62000 0.3726 3.5486
3.2445 18.3354 63000 0.3725 3.5455
3.2465 18.6265 64000 0.3732 3.5405
3.2764 18.9176 65000 0.3736 3.5328
3.219 19.2084 66000 0.3730 3.5486
3.2462 19.4995 67000 0.3734 3.5420
3.2577 19.7906 68000 0.3738 3.5331
3.167 20.0815 69000 0.3731 3.5499
3.222 20.3726 70000 0.3734 3.5452
3.2491 20.6637 71000 0.3735 3.5381
3.2553 20.9548 72000 0.3741 3.5309
3.1908 21.2457 73000 0.3733 3.5486
3.2235 21.5368 74000 0.3741 3.5418
3.2307 21.8279 75000 0.3741 3.5354
3.1703 22.1188 76000 0.3736 3.5511
3.1936 22.4099 77000 0.3740 3.5436
3.2232 22.7010 78000 0.3741 3.5401
3.2529 22.9921 79000 0.3744 3.5317
3.1891 23.2830 80000 0.3738 3.5474
3.1885 23.5741 81000 0.3737 3.5490
3.1981 23.8652 82000 0.3743 3.5399
3.1553 24.1563 83000 0.3737 3.5522
3.1871 24.4474 84000 0.3742 3.5480
3.2118 24.7385 85000 0.3746 3.5372
3.1186 25.0294 86000 0.3743 3.5472
3.1519 25.3205 87000 0.3739 3.5476
3.1872 25.6116 88000 0.3747 3.5368
3.1996 25.9027 89000 0.3752 3.5331
3.1469 26.1936 90000 0.3741 3.5512
3.1706 26.4847 91000 0.3746 3.5404
3.1898 26.7758 92000 0.3751 3.5351
3.0979 27.0667 93000 0.3745 3.5498
3.1498 27.3578 94000 0.3745 3.5463
3.1628 27.6489 95000 0.3747 3.5412
3.1913 27.9400 96000 0.3755 3.5291
3.1256 28.2308 97000 0.3747 3.5540
3.1429 28.5219 98000 0.3749 3.5409
3.153 28.8131 99000 0.3753 3.5362
3.096 29.1039 100000 0.3745 3.5486
3.1396 29.3950 101000 0.3748 3.5484
3.1598 29.6861 102000 0.3753 3.5395
3.1707 29.9772 103000 0.3756 3.5295
3.121 30.2681 104000 0.3745 3.5518
3.1484 30.5592 105000 0.3756 3.5386
3.1598 30.8503 106000 0.3757 3.5361
3.0879 31.1412 107000 0.3749 3.5529
3.1218 31.4323 108000 0.3748 3.5470
3.1351 31.7234 109000 0.3758 3.5377
3.069 32.0143 110000 0.3752 3.5497
3.0951 32.3054 111000 0.3753 3.5464
3.1199 32.5965 112000 0.3752 3.5463
3.1481 32.8876 113000 0.3759 3.5357
3.0671 33.1784 114000 0.3751 3.5544
3.1046 33.4696 115000 0.3754 3.5469
3.124 33.7607 116000 0.3757 3.5398

Framework versions

  • Transformers 4.55.2
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
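For reproducibility, the pinned versions above can be verified at runtime. A minimal sketch:

```python
# Checks the environment against the versions listed in this card.
import datasets
import tokenizers
import torch
import transformers

assert transformers.__version__ == "4.55.2"
assert torch.__version__.startswith("2.8.0")  # "+cu128" denotes the CUDA 12.8 build
assert datasets.__version__ == "4.0.0"
assert tokenizers.__version__ == "0.21.4"
```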
Model details

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32