exceptions_exp2_last_to_drop_frequency_1032

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.5565
  • Accuracy: 0.3699
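
If the reported loss is the standard token-level cross-entropy (in nats), it corresponds to a perplexity of roughly exp(3.5565) ≈ 35.0. These figures match the step-40000 row of the training results table below, which suggests the reported numbers come from a selected best checkpoint rather than the final one.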

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 1032
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: adamw_torch_fused (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
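
For reference, here is a minimal sketch (not the original training script) of how these settings could be expressed with transformers' TrainingArguments. The output_dir value is a placeholder taken from the model name, and fp16=True is an assumption based on the "Native AMP" note:

```python
from transformers import TrainingArguments

# Hedged sketch of the listed hyperparameters; names/paths are placeholders.
training_args = TrainingArguments(
    output_dir="exceptions_exp2_last_to_drop_frequency_1032",  # placeholder
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=1032,
    gradient_accumulation_steps=5,  # 16 * 5 = total train batch size of 80
    optim="adamw_torch_fused",      # OptimizerNames.ADAMW_TORCH_FUSED
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,  # assumption: "Native AMP" mixed-precision training
)
```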

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:--------------|:--------|:------|:----------------|:---------|
| 4.8164        | 0.2912  | 1000  | 4.7452          | 0.2562   |
| 4.339         | 0.5824  | 2000  | 4.2831          | 0.3002   |
| 4.1293        | 0.8736  | 3000  | 4.0976          | 0.3151   |
| 3.9912        | 1.1645  | 4000  | 3.9954          | 0.3248   |
| 3.9401        | 1.4557  | 5000  | 3.9215          | 0.3314   |
| 3.8958        | 1.7469  | 6000  | 3.8627          | 0.3368   |
| 3.7519        | 2.0379  | 7000  | 3.8213          | 0.3405   |
| 3.756         | 2.3290  | 8000  | 3.7892          | 0.3437   |
| 3.7507        | 2.6202  | 9000  | 3.7573          | 0.3468   |
| 3.731         | 2.9114  | 10000 | 3.7307          | 0.3490   |
| 3.6279        | 3.2024  | 11000 | 3.7205          | 0.3507   |
| 3.6457        | 3.4936  | 12000 | 3.7006          | 0.3528   |
| 3.6444        | 3.7848  | 13000 | 3.6810          | 0.3545   |
| 3.5475        | 4.0757  | 14000 | 3.6758          | 0.3558   |
| 3.5642        | 4.3669  | 15000 | 3.6626          | 0.3569   |
| 3.5793        | 4.6581  | 16000 | 3.6514          | 0.3577   |
| 3.5751        | 4.9493  | 17000 | 3.6344          | 0.3596   |
| 3.5111        | 5.2402  | 18000 | 3.6383          | 0.3600   |
| 3.5112        | 5.5314  | 19000 | 3.6262          | 0.3604   |
| 3.5203        | 5.8226  | 20000 | 3.6176          | 0.3618   |
| 3.4341        | 6.1136  | 21000 | 3.6186          | 0.3627   |
| 3.478         | 6.4048  | 22000 | 3.6096          | 0.3629   |
| 3.4906        | 6.6959  | 23000 | 3.6027          | 0.3635   |
| 3.4966        | 6.9871  | 24000 | 3.5950          | 0.3647   |
| 3.4272        | 7.2781  | 25000 | 3.5991          | 0.3646   |
| 3.4526        | 7.5693  | 26000 | 3.5938          | 0.3650   |
| 3.4506        | 7.8605  | 27000 | 3.5811          | 0.3662   |
| 3.3616        | 8.1514  | 28000 | 3.5915          | 0.3656   |
| 3.4033        | 8.4426  | 29000 | 3.5843          | 0.3666   |
| 3.4251        | 8.7338  | 30000 | 3.5736          | 0.3672   |
| 3.3199        | 9.0248  | 31000 | 3.5795          | 0.3674   |
| 3.3801        | 9.3159  | 32000 | 3.5797          | 0.3671   |
| 3.3891        | 9.6071  | 33000 | 3.5694          | 0.3680   |
| 3.4101        | 9.8983  | 34000 | 3.5642          | 0.3685   |
| 3.3363        | 10.1893 | 35000 | 3.5748          | 0.3681   |
| 3.3778        | 10.4805 | 36000 | 3.5685          | 0.3686   |
| 3.3797        | 10.7716 | 37000 | 3.5581          | 0.3693   |
| 3.2924        | 11.0626 | 38000 | 3.5690          | 0.3691   |
| 3.3359        | 11.3538 | 39000 | 3.5671          | 0.3693   |
| 3.3503        | 11.6450 | 40000 | 3.5565          | 0.3699   |
| 3.3774        | 11.9362 | 41000 | 3.5490          | 0.3706   |
| 3.3027        | 12.2271 | 42000 | 3.5610          | 0.3696   |
| 3.3386        | 12.5183 | 43000 | 3.5549          | 0.3702   |
| 3.3472        | 12.8095 | 44000 | 3.5485          | 0.3710   |
| 3.2543        | 13.1005 | 45000 | 3.5629          | 0.3704   |
| 3.2975        | 13.3916 | 46000 | 3.5622          | 0.3705   |
| 3.3194        | 13.6828 | 47000 | 3.5483          | 0.3712   |
| 3.3425        | 13.9740 | 48000 | 3.5404          | 0.3717   |
| 3.2744        | 14.2650 | 49000 | 3.5580          | 0.3713   |
| 3.3138        | 14.5562 | 50000 | 3.5478          | 0.3717   |
| 3.32          | 14.8474 | 51000 | 3.5402          | 0.3720   |
| 3.2496        | 15.1383 | 52000 | 3.5571          | 0.3714   |
| 3.2803        | 15.4295 | 53000 | 3.5499          | 0.3717   |
| 3.2919        | 15.7207 | 54000 | 3.5405          | 0.3721   |
| 3.2175        | 16.0116 | 55000 | 3.5516          | 0.3721   |
| 3.2547        | 16.3028 | 56000 | 3.5515          | 0.3722   |
| 3.2692        | 16.5940 | 57000 | 3.5416          | 0.3726   |
| 3.2916        | 16.8852 | 58000 | 3.5351          | 0.3730   |
| 3.2282        | 17.1762 | 59000 | 3.5533          | 0.3721   |
| 3.2584        | 17.4674 | 60000 | 3.5436          | 0.3727   |
| 3.2724        | 17.7585 | 61000 | 3.5372          | 0.3732   |
| 3.1868        | 18.0495 | 62000 | 3.5490          | 0.3726   |
| 3.2329        | 18.3407 | 63000 | 3.5462          | 0.3726   |
| 3.2426        | 18.6319 | 64000 | 3.5440          | 0.3727   |
| 3.2577        | 18.9231 | 65000 | 3.5335          | 0.3736   |
| 3.1949        | 19.2140 | 66000 | 3.5489          | 0.3729   |
| 3.2359        | 19.5052 | 67000 | 3.5423          | 0.3731   |
| 3.2688        | 19.7964 | 68000 | 3.5396          | 0.3735   |
| 3.1636        | 20.0874 | 69000 | 3.5508          | 0.3729   |
| 3.2153        | 20.3785 | 70000 | 3.5451          | 0.3733   |
| 3.2345        | 20.6697 | 71000 | 3.5408          | 0.3734   |
| 3.2576        | 20.9609 | 72000 | 3.5285          | 0.3742   |
| 3.1837        | 21.2519 | 73000 | 3.5471          | 0.3736   |
| 3.2214        | 21.5431 | 74000 | 3.5406          | 0.3739   |
| 3.2441        | 21.8343 | 75000 | 3.5327          | 0.3744   |
| 3.162         | 22.1252 | 76000 | 3.5475          | 0.3737   |
| 3.1943        | 22.4164 | 77000 | 3.5435          | 0.3737   |
| 3.2206        | 22.7076 | 78000 | 3.5353          | 0.3743   |
| 3.2293        | 22.9988 | 79000 | 3.5264          | 0.3749   |
| 3.1645        | 23.2897 | 80000 | 3.5482          | 0.3736   |
| 3.201         | 23.5809 | 81000 | 3.5388          | 0.3742   |
| 3.2211        | 23.8721 | 82000 | 3.5299          | 0.3747   |
| 3.1515        | 24.1631 | 83000 | 3.5483          | 0.3742   |
| 3.1878        | 24.4543 | 84000 | 3.5453          | 0.3740   |
| 3.2018        | 24.7454 | 85000 | 3.5341          | 0.3745   |
| 3.1168        | 25.0364 | 86000 | 3.5502          | 0.3742   |
| 3.158         | 25.3276 | 87000 | 3.5451          | 0.3743   |
| 3.1849        | 25.6188 | 88000 | 3.5404          | 0.3747   |
| 3.1933        | 25.9100 | 89000 | 3.5328          | 0.3751   |
| 3.1394        | 26.2009 | 90000 | 3.5515          | 0.3742   |
| 3.1698        | 26.4921 | 91000 | 3.5400          | 0.3749   |
| 3.1911        | 26.7833 | 92000 | 3.5383          | 0.3749   |
| 3.099         | 27.0743 | 93000 | 3.5501          | 0.3746   |
| 3.1493        | 27.3654 | 94000 | 3.5436          | 0.3746   |
| 3.1648        | 27.6566 | 95000 | 3.5364          | 0.3751   |
| 3.1793        | 27.9478 | 96000 | 3.5336          | 0.3751   |
| 3.1201        | 28.2388 | 97000 | 3.5506          | 0.3747   |
| 3.1451        | 28.5300 | 98000 | 3.5380          | 0.3752   |
| 3.1765        | 28.8212 | 99000 | 3.5373          | 0.3750   |

Framework versions

  • Transformers 4.55.2
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
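
Example usage

The card does not state the model architecture; given the per-token accuracy metric, the snippet below assumes a causal language model, and the repository id is a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual Hub path of this checkpoint.
repo_id = "<your-namespace>/exceptions_exp2_last_to_drop_frequency_1032"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # assumes a causal LM

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```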