
exceptions_exp2_swap_require_to_drop_2128

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.5560
  • Accuracy: 0.3697
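
The card does not state the training objective; if the reported loss is mean token-level cross-entropy (a common setup for cards like this, but an assumption here), it corresponds to a perplexity of roughly 35:

```python
# Hedged sketch: converts the reported eval loss to perplexity, assuming the
# loss is mean token-level cross-entropy (the card does not confirm this).
import math

eval_loss = 3.5560
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 35.0
```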

Model description

More information needed

Intended uses & limitations

More information needed
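
Since no usage details are given, the following is only a minimal loading sketch. The repo id and the causal-LM architecture are assumptions, not taken from this card; adjust both to the real model before use.

```python
# Minimal loading sketch. "your-namespace/..." is a hypothetical repo id and
# AutoModelForCausalLM is an assumed architecture, neither confirmed by the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-namespace/exceptions_exp2_swap_require_to_drop_2128"  # hypothetical
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```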

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 2128
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: adamw_torch_fused (betas=(0.9, 0.98), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
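
For reproducibility, here is a minimal sketch of the equivalent transformers.TrainingArguments. The output_dir is a placeholder; everything else mirrors the list above:

```python
# Sketch of TrainingArguments matching the reported hyperparameters.
# output_dir is a placeholder, not taken from this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="exceptions_exp2_swap_require_to_drop_2128",  # hypothetical path
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=2128,
    gradient_accumulation_steps=5,   # 16 x 5 = 80 total train batch size
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,                       # native AMP mixed-precision training
)
```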

Training results

Training Loss Epoch Step Accuracy Validation Loss
4.8313 0.2911 1000 0.2540 4.7551
4.327 0.5822 2000 0.2997 4.2808
4.143 0.8733 3000 0.3158 4.0936
3.9875 1.1642 4000 0.3258 3.9878
3.9281 1.4553 5000 0.3324 3.9116
3.8772 1.7464 6000 0.3377 3.8521
3.7465 2.0373 7000 0.3420 3.8094
3.7597 2.3284 8000 0.3449 3.7793
3.7427 2.6195 9000 0.3474 3.7533
3.7251 2.9106 10000 0.3499 3.7231
3.629 3.2017 11000 0.3521 3.7163
3.6484 3.4928 12000 0.3533 3.6963
3.6332 3.7839 13000 0.3548 3.6787
3.5315 4.0748 14000 0.3562 3.6728
3.5681 4.3659 15000 0.3573 3.6608
3.5781 4.6570 16000 0.3587 3.6463
3.5613 4.9481 17000 0.3599 3.6338
3.4987 5.2390 18000 0.3604 3.6328
3.5154 5.5301 19000 0.3614 3.6225
3.528 5.8212 20000 0.3622 3.6133
3.442 6.1121 21000 0.3626 3.6160
3.4671 6.4032 22000 0.3636 3.6099
3.4893 6.6943 23000 0.3643 3.5987
3.4985 6.9854 24000 0.3649 3.5882
3.4264 7.2763 25000 0.3644 3.5983
3.4448 7.5674 26000 0.3653 3.5921
3.4691 7.8585 27000 0.3662 3.5819
3.3775 8.1493 28000 0.3663 3.5903
3.4149 8.4404 29000 0.3668 3.5849
3.4252 8.7315 30000 0.3674 3.5762
3.3288 9.0224 31000 0.3674 3.5799
3.3776 9.3135 32000 0.3673 3.5804
3.3921 9.6046 33000 0.3678 3.5719
3.4212 9.8957 34000 0.3686 3.5636
3.3329 10.1866 35000 0.3683 3.5705
3.367 10.4777 36000 0.3687 3.5656
3.3889 10.7688 37000 0.3696 3.5565
3.2762 11.0597 38000 0.3693 3.5687
3.3424 11.3508 39000 0.3693 3.5665
3.3651 11.6419 40000 0.3697 3.5560
3.3767 11.9330 41000 0.3705 3.5478
3.3124 12.2239 42000 0.3699 3.5633
3.3323 12.5150 43000 0.3704 3.5532
3.3498 12.8061 44000 0.3707 3.5505
3.2658 13.0969 45000 0.3704 3.5606
3.3054 13.3880 46000 0.3709 3.5550
3.3295 13.6791 47000 0.3710 3.5496
3.3345 13.9702 48000 0.3717 3.5407
3.2725 14.2611 49000 0.3708 3.5595
3.3044 14.5522 50000 0.3714 3.5515
3.3212 14.8433 51000 0.3720 3.5415
3.2308 15.1342 52000 0.3712 3.5574
3.288 15.4253 53000 0.3717 3.5501
3.2921 15.7164 54000 0.3723 3.5417
3.2591 16.0073 55000 0.3717 3.5503
3.2485 16.2984 56000 0.3718 3.5517
3.2739 16.5895 57000 0.3727 3.5459
3.3028 16.8806 58000 0.3731 3.5335
3.2237 17.1715 59000 0.3720 3.5547
3.2656 17.4626 60000 0.3724 3.5480
3.2953 17.7537 61000 0.3731 3.5424
3.1951 18.0445 62000 0.3726 3.5493
3.2331 18.3356 63000 0.3729 3.5486
3.2593 18.6267 64000 0.3727 3.5409
3.2761 18.9179 65000 0.3735 3.5339
3.2115 19.2087 66000 0.3726 3.5528
3.253 19.4998 67000 0.3732 3.5418
3.2466 19.7909 68000 0.3734 3.5372
3.1752 20.0818 69000 0.3730 3.5488
3.2125 20.3729 70000 0.3733 3.5475
3.2385 20.6640 71000 0.3739 3.5358
3.251 20.9551 72000 0.3739 3.5311
3.1939 21.2460 73000 0.3729 3.5486
3.2151 21.5371 74000 0.3736 3.5441
3.2295 21.8282 75000 0.3741 3.5372
3.1566 22.1191 76000 0.3733 3.5541
3.2004 22.4102 77000 0.3737 3.5459
3.2214 22.7013 78000 0.3741 3.5404
3.2437 22.9924 79000 0.3744 3.5299
3.1845 23.2832 80000 0.3739 3.5493
3.1994 23.5743 81000 0.3744 3.5441
3.2281 23.8655 82000 0.3746 3.5324
3.1537 24.1563 83000 0.3739 3.5519
3.1843 24.4474 84000 0.3741 3.5443
3.2124 24.7385 85000 0.3745 3.5374
3.1189 25.0294 86000 0.3743 3.5461
3.1644 25.3205 87000 0.3739 3.5504
3.1798 25.6116 88000 0.3746 3.5400
3.2071 25.9027 89000 0.3752 3.5329
3.1338 26.1936 90000 0.3739 3.5499
3.1636 26.4844 91000 0.3741 3.5494
3.1776 26.7755 92000 0.3746 3.5446
3.1153 27.0667 93000 0.3739 3.5517
3.1558 27.3578 94000 0.3742 3.5487
3.1684 27.6489 95000 0.3746 3.5416
3.1868 27.9400 96000 0.3754 3.5338
3.1243 28.2308 97000 0.3741 3.5556
3.149 28.5219 98000 0.3750 3.5435
3.1667 28.8131 99000 0.3752 3.5367
3.1054 29.1039 100000 0.3743 3.5510
3.1242 29.3950 101000 0.3747 3.5459
3.168 29.6861 102000 0.3751 3.5423
3.1654 29.9772 103000 0.3755 3.5371
3.1044 30.2681 104000 0.3749 3.5516
3.135 30.5592 105000 0.3747 3.5479
3.1584 30.8503 106000 0.3756 3.5359
3.0948 31.1412 107000 0.3746 3.5543
3.1236 31.4323 108000 0.3751 3.5485
3.1344 31.7234 109000 0.3755 3.5397
3.0595 32.0143 110000 0.3749 3.5503

Framework versions

  • Transformers 4.55.2
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
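
A quick sanity check that a local environment matches these pins (assumes all four packages are importable):

```python
# Prints installed versions to compare against the pins listed above.
import transformers, torch, datasets, tokenizers

for mod in (transformers, torch, datasets, tokenizers):
    print(f"{mod.__name__}=={mod.__version__}")
# Expected: transformers==4.55.2, torch==2.8.0+cu128,
#           datasets==4.0.0, tokenizers==0.21.4
```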

Model size: 0.1B params (Safetensors, F32)