exceptions_exp2_swap_require_to_drop_3591

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.5548
  • Accuracy: 0.3700
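
If the reported loss is a mean per-token cross-entropy in nats (the standard language-modeling objective in `Trainer`), it corresponds to a perplexity of exp(3.5548) ≈ 35.0.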

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hypothetical `TrainingArguments` reconstruction follows the list):

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 3591
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80 (train_batch_size 16 × gradient_accumulation_steps 5)
  • optimizer: AdamW (`adamw_torch_fused`) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
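
The following is a minimal sketch of an equivalent `TrainingArguments` configuration, reconstructed from the list above. It is not the original training script: the `output_dir` is a placeholder, and the card does not say whether Native AMP ran in fp16 or bf16, so `fp16=True` is an assumption.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration listed above.
training_args = TrainingArguments(
    output_dir="exceptions_exp2_swap_require_to_drop_3591",  # placeholder
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=5,  # effective train batch size: 16 * 5 = 80
    seed=3591,
    optim="adamw_torch_fused",      # OptimizerNames.ADAMW_TORCH_FUSED
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,                      # "Native AMP"; fp16 vs. bf16 is an assumption
)
```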

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 4.8325        | 0.2911  | 1000  | 4.7510          | 0.2553   |
| 4.3359        | 0.5822  | 2000  | 4.2762          | 0.2999   |
| 4.1416        | 0.8733  | 3000  | 4.0896          | 0.3161   |
| 3.9903        | 1.1642  | 4000  | 3.9898          | 0.3256   |
| 3.9312        | 1.4553  | 5000  | 3.9140          | 0.3322   |
| 3.8662        | 1.7464  | 6000  | 3.8591          | 0.3373   |
| 3.7311        | 2.0373  | 7000  | 3.8123          | 0.3418   |
| 3.7411        | 2.3284  | 8000  | 3.7799          | 0.3448   |
| 3.7298        | 2.6195  | 9000  | 3.7527          | 0.3478   |
| 3.7148        | 2.9106  | 10000 | 3.7256          | 0.3498   |
| 3.6305        | 3.2014  | 11000 | 3.7126          | 0.3518   |
| 3.6272        | 3.4925  | 12000 | 3.6953          | 0.3534   |
| 3.6364        | 3.7837  | 13000 | 3.6790          | 0.3550   |
| 3.5361        | 4.0745  | 14000 | 3.6696          | 0.3565   |
| 3.5551        | 4.3656  | 15000 | 3.6585          | 0.3575   |
| 3.5684        | 4.6567  | 16000 | 3.6431          | 0.3586   |
| 3.5845        | 4.9478  | 17000 | 3.6317          | 0.3600   |
| 3.4981        | 5.2387  | 18000 | 3.6357          | 0.3604   |
| 3.522         | 5.5298  | 19000 | 3.6237          | 0.3617   |
| 3.5315        | 5.8209  | 20000 | 3.6114          | 0.3622   |
| 3.4396        | 6.1118  | 21000 | 3.6167          | 0.3625   |
| 3.4737        | 6.4029  | 22000 | 3.6094          | 0.3631   |
| 3.4844        | 6.6940  | 23000 | 3.5994          | 0.3638   |
| 3.4886        | 6.9851  | 24000 | 3.5904          | 0.3646   |
| 3.4154        | 7.2760  | 25000 | 3.5981          | 0.3647   |
| 3.4436        | 7.5671  | 26000 | 3.5884          | 0.3656   |
| 3.4684        | 7.8582  | 27000 | 3.5802          | 0.3663   |
| 3.3719        | 8.1490  | 28000 | 3.5917          | 0.3662   |
| 3.4039        | 8.4401  | 29000 | 3.5833          | 0.3666   |
| 3.4337        | 8.7313  | 30000 | 3.5726          | 0.3674   |
| 3.3195        | 9.0221  | 31000 | 3.5799          | 0.3671   |
| 3.3716        | 9.3132  | 32000 | 3.5785          | 0.3676   |
| 3.3949        | 9.6043  | 33000 | 3.5694          | 0.3683   |
| 3.4079        | 9.8954  | 34000 | 3.5603          | 0.3687   |
| 3.3343        | 10.1863 | 35000 | 3.5758          | 0.3681   |
| 3.3587        | 10.4774 | 36000 | 3.5666          | 0.3685   |
| 3.3837        | 10.7685 | 37000 | 3.5559          | 0.3696   |
| 3.2852        | 11.0594 | 38000 | 3.5653          | 0.3693   |
| 3.3218        | 11.3505 | 39000 | 3.5630          | 0.3692   |
| 3.3601        | 11.6416 | 40000 | 3.5548          | 0.3700   |
| 3.3635        | 11.9327 | 41000 | 3.5502          | 0.3705   |
| 3.3012        | 12.2236 | 42000 | 3.5638          | 0.3697   |
| 3.3241        | 12.5147 | 43000 | 3.5579          | 0.3706   |
| 3.3556        | 12.8058 | 44000 | 3.5453          | 0.3709   |
| 3.2655        | 13.0966 | 45000 | 3.5650          | 0.3698   |
| 3.2968        | 13.3878 | 46000 | 3.5577          | 0.3704   |
| 3.3187        | 13.6789 | 47000 | 3.5519          | 0.3713   |
| 3.3375        | 13.9700 | 48000 | 3.5398          | 0.3719   |
| 3.2698        | 14.2608 | 49000 | 3.5554          | 0.3713   |
| 3.2972        | 14.5519 | 50000 | 3.5488          | 0.3713   |
| 3.3336        | 14.8430 | 51000 | 3.5402          | 0.3720   |
| 3.2331        | 15.1339 | 52000 | 3.5536          | 0.3716   |
| 3.2771        | 15.4250 | 53000 | 3.5498          | 0.3719   |
| 3.2915        | 15.7161 | 54000 | 3.5442          | 0.3724   |
| 3.2661        | 16.0070 | 55000 | 3.5528          | 0.3719   |
| 3.2368        | 16.2981 | 56000 | 3.5519          | 0.3720   |
| 3.2882        | 16.5892 | 57000 | 3.5435          | 0.3724   |
| 3.2965        | 16.8803 | 58000 | 3.5393          | 0.3725   |
| 3.2206        | 17.1712 | 59000 | 3.5540          | 0.3721   |
| 3.2628        | 17.4623 | 60000 | 3.5476          | 0.3724   |
| 3.2436        | 17.7534 | 61000 | 3.5562          | 0.3723   |
| 3.1811        | 18.0445 | 62000 | 3.5583          | 0.3718   |
| 3.2419        | 18.3356 | 63000 | 3.5560          | 0.3722   |
| 3.2653        | 18.6267 | 64000 | 3.5440          | 0.3728   |
| 3.28          | 18.9179 | 65000 | 3.5330          | 0.3737   |
| 3.1933        | 19.2087 | 66000 | 3.5538          | 0.3723   |
| 3.234         | 19.4998 | 67000 | 3.5471          | 0.3727   |
| 3.2667        | 19.7909 | 68000 | 3.5383          | 0.3738   |
| 3.1628        | 20.0818 | 69000 | 3.5508          | 0.3731   |
| 3.2074        | 20.3729 | 70000 | 3.5517          | 0.3731   |
| 3.243         | 20.6640 | 71000 | 3.5414          | 0.3734   |
| 3.2376        | 20.9551 | 72000 | 3.5328          | 0.3738   |
| 3.1937        | 21.2460 | 73000 | 3.5500          | 0.3731   |
| 3.2227        | 21.5371 | 74000 | 3.5424          | 0.3735   |
| 3.2392        | 21.8282 | 75000 | 3.5389          | 0.3740   |
| 3.1701        | 22.1191 | 76000 | 3.5552          | 0.3731   |
| 3.1958        | 22.4102 | 77000 | 3.5492          | 0.3738   |
| 3.231         | 22.7013 | 78000 | 3.5374          | 0.3742   |
| 3.2364        | 22.9924 | 79000 | 3.5313          | 0.3747   |
| 3.1744        | 23.2832 | 80000 | 3.5479          | 0.3737   |
| 3.2056        | 23.5743 | 81000 | 3.5429          | 0.3742   |
| 3.2089        | 23.8655 | 82000 | 3.5328          | 0.3747   |
| 3.149         | 24.1563 | 83000 | 3.5508          | 0.3736   |
| 3.1794        | 24.4474 | 84000 | 3.5449          | 0.3740   |
| 3.2038        | 24.7385 | 85000 | 3.5382          | 0.3745   |
| 3.1107        | 25.0294 | 86000 | 3.5536          | 0.3740   |
| 3.1545        | 25.3205 | 87000 | 3.5545          | 0.3736   |
| 3.1766        | 25.6116 | 88000 | 3.5426          | 0.3743   |
| 3.2136        | 25.9027 | 89000 | 3.5395          | 0.3747   |
| 3.1363        | 26.1936 | 90000 | 3.5511          | 0.3740   |
| 3.1658        | 26.4847 | 91000 | 3.5480          | 0.3744   |
| 3.2013        | 26.7758 | 92000 | 3.5393          | 0.3747   |
| 3.1076        | 27.0667 | 93000 | 3.5563          | 0.3737   |
| 3.1504        | 27.3578 | 94000 | 3.5444          | 0.3743   |
| 3.1651        | 27.6489 | 95000 | 3.5419          | 0.3747   |
| 3.189         | 27.9400 | 96000 | 3.5363          | 0.3752   |
| 3.116         | 28.2308 | 97000 | 3.5534          | 0.3742   |
| 3.1598        | 28.5219 | 98000 | 3.5438          | 0.3745   |
| 3.1622        | 28.8131 | 99000 | 3.5417          | 0.3750   |
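
The headline evaluation numbers at the top of this card (loss 3.5548, accuracy 0.3700) match the step-40000 row rather than the final one, which suggests the reported checkpoint is not the last; the card does not say how it was selected.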

Framework versions

  • Transformers 4.55.2
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
Model size

  • 0.1B params, stored as safetensors in F32
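
A minimal loading sketch, assuming the checkpoint is a causal language model hosted on the Hugging Face Hub; the repository id below is hypothetical (the card does not state one), and `AutoModelForCausalLM` is an assumption about the architecture.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/exceptions_exp2_swap_require_to_drop_3591"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # assumes a causal LM head
```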