exceptions_exp2_swap_0.3_cost_to_drop_2128

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set (a perplexity conversion is sketched after the list):

  • Loss: 3.5645
  • Accuracy: 0.3686
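
If the reported loss is a mean token-level cross-entropy (typical for language models, though the card does not say), perplexity follows directly from it. A minimal sketch:

```python
import math

# Assuming the evaluation loss is mean token-level cross-entropy
# (an assumption; the card does not state the loss type),
# perplexity is simply its exponential.
eval_loss = 3.5645
perplexity = math.exp(eval_loss)  # ≈ 35.3
print(f"perplexity: {perplexity:.1f}")
```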

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 2128
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
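
These settings map directly onto Hugging Face TrainingArguments. A minimal reconstruction sketch follows; the output directory is a placeholder, and fp16=True assumes "Native AMP" means float16 (the card does not specify fp16 vs. bf16):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="exceptions_exp2_swap_0.3_cost_to_drop_2128",  # placeholder, not from the card
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=2128,
    gradient_accumulation_steps=5,  # 16 x 5 = 80 effective train batch size
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,  # "Native AMP" mixed precision; bf16 is also possible
)
```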

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 4.8412 | 0.2916 | 1000 | 4.7675 | 0.2525 |
| 4.3629 | 0.5831 | 2000 | 4.2992 | 0.2974 |
| 4.1467 | 0.8747 | 3000 | 4.1125 | 0.3142 |
| 4.0091 | 1.1662 | 4000 | 3.9979 | 0.3240 |
| 3.9325 | 1.4578 | 5000 | 3.9250 | 0.3302 |
| 3.8863 | 1.7493 | 6000 | 3.8661 | 0.3356 |
| 3.7587 | 2.0408 | 7000 | 3.8232 | 0.3400 |
| 3.7568 | 2.3324 | 8000 | 3.7916 | 0.3431 |
| 3.7504 | 2.6239 | 9000 | 3.7629 | 0.3457 |
| 3.7423 | 2.9155 | 10000 | 3.7371 | 0.3482 |
| 3.6398 | 3.2070 | 11000 | 3.7222 | 0.3501 |
| 3.6404 | 3.4986 | 12000 | 3.7053 | 0.3518 |
| 3.6436 | 3.7901 | 13000 | 3.6884 | 0.3535 |
| 3.5458 | 4.0816 | 14000 | 3.6811 | 0.3546 |
| 3.5789 | 4.3732 | 15000 | 3.6694 | 0.3560 |
| 3.5978 | 4.6648 | 16000 | 3.6576 | 0.3570 |
| 3.5815 | 4.9563 | 17000 | 3.6433 | 0.3581 |
| 3.5227 | 5.2478 | 18000 | 3.6438 | 0.3590 |
| 3.5329 | 5.5394 | 19000 | 3.6334 | 0.3598 |
| 3.5403 | 5.8310 | 20000 | 3.6214 | 0.3607 |
| 3.4624 | 6.1225 | 21000 | 3.6298 | 0.3607 |
| 3.4801 | 6.4140 | 22000 | 3.6188 | 0.3618 |
| 3.4990 | 6.7056 | 23000 | 3.6087 | 0.3624 |
| 3.5005 | 6.9971 | 24000 | 3.5999 | 0.3635 |
| 3.4326 | 7.2886 | 25000 | 3.6095 | 0.3631 |
| 3.4619 | 7.5802 | 26000 | 3.5995 | 0.3637 |
| 3.4767 | 7.8718 | 27000 | 3.5887 | 0.3647 |
| 3.3875 | 8.1633 | 28000 | 3.5997 | 0.3644 |
| 3.4172 | 8.4548 | 29000 | 3.5920 | 0.3650 |
| 3.4420 | 8.7464 | 30000 | 3.5813 | 0.3657 |
| 3.3494 | 9.0379 | 31000 | 3.5922 | 0.3659 |
| 3.3838 | 9.3295 | 32000 | 3.5890 | 0.3661 |
| 3.4142 | 9.6210 | 33000 | 3.5799 | 0.3665 |
| 3.4202 | 9.9126 | 34000 | 3.5699 | 0.3671 |
| 3.3572 | 10.2041 | 35000 | 3.5847 | 0.3668 |
| 3.3677 | 10.4957 | 36000 | 3.5785 | 0.3669 |
| 3.3858 | 10.7872 | 37000 | 3.5684 | 0.3675 |
| 3.3138 | 11.0787 | 38000 | 3.5779 | 0.3678 |
| 3.3435 | 11.3703 | 39000 | 3.5729 | 0.3681 |
| 3.3632 | 11.6618 | 40000 | 3.5645 | 0.3686 |
| 3.3812 | 11.9534 | 41000 | 3.5576 | 0.3690 |
| 3.2993 | 12.2449 | 42000 | 3.5710 | 0.3682 |
| 3.3399 | 12.5365 | 43000 | 3.5659 | 0.3689 |
| 3.3545 | 12.8280 | 44000 | 3.5561 | 0.3695 |
| 3.2804 | 13.1195 | 45000 | 3.5699 | 0.3687 |
| 3.3232 | 13.4111 | 46000 | 3.5670 | 0.3689 |
| 3.3411 | 13.7027 | 47000 | 3.5578 | 0.3697 |
| 3.3427 | 13.9942 | 48000 | 3.5520 | 0.3701 |
| 3.2652 | 14.2857 | 49000 | 3.5674 | 0.3695 |
| 3.3215 | 14.5773 | 50000 | 3.5607 | 0.3700 |
| 3.3388 | 14.8689 | 51000 | 3.5499 | 0.3706 |
| 3.2630 | 15.1604 | 52000 | 3.5668 | 0.3698 |
| 3.2958 | 15.4519 | 53000 | 3.5581 | 0.3703 |
| 3.3097 | 15.7435 | 54000 | 3.5523 | 0.3708 |
| 3.2196 | 16.0350 | 55000 | 3.5626 | 0.3702 |
| 3.2662 | 16.3265 | 56000 | 3.5624 | 0.3704 |
| 3.2873 | 16.6181 | 57000 | 3.5548 | 0.3705 |
| 3.3024 | 16.9097 | 58000 | 3.5470 | 0.3713 |
| 3.2379 | 17.2012 | 59000 | 3.5632 | 0.3707 |
| 3.2753 | 17.4927 | 60000 | 3.5568 | 0.3709 |
| 3.2848 | 17.7843 | 61000 | 3.5484 | 0.3714 |
| 3.2023 | 18.0758 | 62000 | 3.5607 | 0.3713 |
| 3.2515 | 18.3674 | 63000 | 3.5598 | 0.3710 |
| 3.2652 | 18.6589 | 64000 | 3.5506 | 0.3717 |
| 3.2867 | 18.9505 | 65000 | 3.5431 | 0.3719 |
| 3.2242 | 19.2420 | 66000 | 3.5621 | 0.3713 |
| 3.2556 | 19.5336 | 67000 | 3.5534 | 0.3716 |
| 3.2738 | 19.8251 | 68000 | 3.5475 | 0.3721 |
| 3.2021 | 20.1166 | 69000 | 3.5616 | 0.3713 |
| 3.2299 | 20.4082 | 70000 | 3.5592 | 0.3716 |
| 3.2573 | 20.6997 | 71000 | 3.5481 | 0.3723 |
| 3.2701 | 20.9913 | 72000 | 3.5409 | 0.3725 |
| 3.2204 | 21.2828 | 73000 | 3.5582 | 0.3716 |
| 3.2229 | 21.5744 | 74000 | 3.5522 | 0.3720 |
| 3.2339 | 21.8659 | 75000 | 3.5419 | 0.3727 |
| 3.1707 | 22.1574 | 76000 | 3.5588 | 0.3722 |
| 3.2119 | 22.4490 | 77000 | 3.5525 | 0.3723 |
| 3.2174 | 22.7406 | 78000 | 3.5479 | 0.3725 |
| 3.1437 | 23.0321 | 79000 | 3.5616 | 0.3721 |
| 3.1970 | 23.3236 | 80000 | 3.5559 | 0.3722 |
| 3.2125 | 23.6152 | 81000 | 3.5491 | 0.3728 |
| 3.2511 | 23.9068 | 82000 | 3.5405 | 0.3731 |
| 3.1620 | 24.1983 | 83000 | 3.5605 | 0.3720 |
| 3.1887 | 24.4898 | 84000 | 3.5551 | 0.3724 |
| 3.2115 | 24.7814 | 85000 | 3.5455 | 0.3729 |
| 3.1402 | 25.0729 | 86000 | 3.5610 | 0.3727 |
| 3.1789 | 25.3645 | 87000 | 3.5575 | 0.3722 |
| 3.1993 | 25.6560 | 88000 | 3.5498 | 0.3731 |
| 3.2180 | 25.9476 | 89000 | 3.5447 | 0.3734 |
| 3.1469 | 26.2391 | 90000 | 3.5599 | 0.3725 |
| 3.1894 | 26.5306 | 91000 | 3.5501 | 0.3733 |
| 3.1931 | 26.8222 | 92000 | 3.5464 | 0.3733 |
| 3.1294 | 27.1137 | 93000 | 3.5635 | 0.3724 |
| 3.1452 | 27.4053 | 94000 | 3.5562 | 0.3731 |
| 3.1861 | 27.6968 | 95000 | 3.5468 | 0.3732 |
| 3.1921 | 27.9884 | 96000 | 3.5444 | 0.3735 |
| 3.1493 | 28.2799 | 97000 | 3.5572 | 0.3730 |
| 3.1537 | 28.5715 | 98000 | 3.5511 | 0.3730 |
| 3.1804 | 28.8630 | 99000 | 3.5434 | 0.3739 |
| 3.1242 | 29.1545 | 100000 | 3.5606 | 0.3727 |
| 3.1423 | 29.4461 | 101000 | 3.5562 | 0.3731 |
| 3.1702 | 29.7377 | 102000 | 3.5504 | 0.3735 |
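
Validation loss falls steeply over the first ~12 epochs and then plateaus around 3.54-3.56; note the log ends near epoch 29.7 although num_epochs was set to 50, which may indicate early stopping or a truncated run. A minimal sketch for eyeballing the trend, with a handful of rows transcribed from the table (matplotlib assumed available):

```python
import matplotlib.pyplot as plt

# A few (step, validation loss) pairs transcribed from the table above.
steps = [1000, 10000, 20000, 40000, 60000, 80000, 102000]
val_loss = [4.7675, 3.7371, 3.6214, 3.5645, 3.5568, 3.5559, 3.5504]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Training step")
plt.ylabel("Validation loss")
plt.title("Validation loss vs. step")
plt.tight_layout()
plt.show()
```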

Framework versions

  • Transformers 4.55.2
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model details

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32
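
The card does not state the model architecture or where the checkpoint is hosted. Assuming a causal language model under a placeholder repository id, loading might look like the sketch below:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id; the card does not say where the checkpoint
# lives. A causal LM head is an assumption, not confirmed by the card.
repo_id = "your-username/exceptions_exp2_swap_0.3_cost_to_drop_2128"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
```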