exceptions_exp2_cost_to_hit_frequency_1032

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.5595
  • Accuracy: 0.3699
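
Assuming the evaluation loss is the mean per-token cross-entropy in nats (the usual convention for Transformers language-model evaluations; this card does not state it explicitly), it corresponds to a perplexity of roughly 35:

```python
import math

# Assumption: eval loss is mean per-token cross-entropy in nats.
eval_loss = 3.5595
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 35.15
```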

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a minimal TrainingArguments sketch follows the list):

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 1032
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
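
For reference, a minimal sketch of how these values map onto transformers.TrainingArguments. The output_dir is a hypothetical placeholder, unlisted arguments keep their defaults, and the actual training script for this run is not published:

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="exceptions_exp2_cost_to_hit_frequency_1032",  # placeholder
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=1032,
    gradient_accumulation_steps=5,   # effective batch: 16 * 5 = 80
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,  # "Native AMP" mixed precision; fp16 assumed (could be bf16)
)
```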

Training results

| Training Loss | Epoch   | Step   | Validation Loss | Accuracy |
|:-------------:|:-------:|:------:|:---------------:|:--------:|
| 4.8199        | 0.2913  | 1000   | 4.7286          | 0.2583   |
| 4.3303        | 0.5826  | 2000   | 4.2863          | 0.2996   |
| 4.1458        | 0.8739  | 3000   | 4.0950          | 0.3157   |
| 3.9913        | 1.1652  | 4000   | 3.9922          | 0.3252   |
| 3.9403        | 1.4565  | 5000   | 3.9162          | 0.3316   |
| 3.8738        | 1.7478  | 6000   | 3.8581          | 0.3372   |
| 3.7607        | 2.0390  | 7000   | 3.8170          | 0.3415   |
| 3.7538        | 2.3303  | 8000   | 3.7856          | 0.3444   |
| 3.7389        | 2.6216  | 9000   | 3.7552          | 0.3469   |
| 3.7227        | 2.9130  | 10000  | 3.7284          | 0.3496   |
| 3.6345        | 3.2042  | 11000  | 3.7158          | 0.3513   |
| 3.651         | 3.4955  | 12000  | 3.6981          | 0.3531   |
| 3.6375        | 3.7868  | 13000  | 3.6834          | 0.3543   |
| 3.5478        | 4.0781  | 14000  | 3.6710          | 0.3561   |
| 3.5756        | 4.3694  | 15000  | 3.6604          | 0.3571   |
| 3.5855        | 4.6607  | 16000  | 3.6468          | 0.3586   |
| 3.5811        | 4.9520  | 17000  | 3.6346          | 0.3596   |
| 3.5085        | 5.2432  | 18000  | 3.6379          | 0.3602   |
| 3.5343        | 5.5345  | 19000  | 3.6267          | 0.3608   |
| 3.5223        | 5.8259  | 20000  | 3.6158          | 0.3618   |
| 3.4387        | 6.1171  | 21000  | 3.6203          | 0.3623   |
| 3.4844        | 6.4084  | 22000  | 3.6126          | 0.3628   |
| 3.4801        | 6.6997  | 23000  | 3.6029          | 0.3639   |
| 3.4928        | 6.9910  | 24000  | 3.5933          | 0.3648   |
| 3.434         | 7.2823  | 25000  | 3.6025          | 0.3643   |
| 3.4633        | 7.5736  | 26000  | 3.5915          | 0.3652   |
| 3.4652        | 7.8649  | 27000  | 3.5824          | 0.3659   |
| 3.3938        | 8.1561  | 28000  | 3.5901          | 0.3657   |
| 3.4171        | 8.4474  | 29000  | 3.5859          | 0.3659   |
| 3.4328        | 8.7388  | 30000  | 3.5773          | 0.3669   |
| 3.3374        | 9.0300  | 31000  | 3.5828          | 0.3671   |
| 3.3887        | 9.3213  | 32000  | 3.5805          | 0.3673   |
| 3.4039        | 9.6126  | 33000  | 3.5734          | 0.3677   |
| 3.4186        | 9.9039  | 34000  | 3.5653          | 0.3685   |
| 3.3404        | 10.1952 | 35000  | 3.5763          | 0.3682   |
| 3.3729        | 10.4865 | 36000  | 3.5691          | 0.3682   |
| 3.3936        | 10.7778 | 37000  | 3.5624          | 0.3686   |
| 3.2871        | 11.0690 | 38000  | 3.5698          | 0.3692   |
| 3.3399        | 11.3603 | 39000  | 3.5649          | 0.3693   |
| 3.3684        | 11.6517 | 40000  | 3.5595          | 0.3699   |
| 3.3729        | 11.9430 | 41000  | 3.5530          | 0.3702   |
| 3.3031        | 12.2342 | 42000  | 3.5656          | 0.3692   |
| 3.3536        | 12.5255 | 43000  | 3.5577          | 0.3701   |
| 3.3609        | 12.8168 | 44000  | 3.5498          | 0.3705   |
| 3.2677        | 13.1081 | 45000  | 3.5631          | 0.3702   |
| 3.3209        | 13.3994 | 46000  | 3.5595          | 0.3704   |
| 3.3413        | 13.6907 | 47000  | 3.5529          | 0.3709   |
| 3.348         | 13.9820 | 48000  | 3.5454          | 0.3715   |
| 3.2852        | 14.2732 | 49000  | 3.5590          | 0.3707   |
| 3.3133        | 14.5646 | 50000  | 3.5517          | 0.3713   |
| 3.3249        | 14.8559 | 51000  | 3.5443          | 0.3717   |
| 3.2419        | 15.1471 | 52000  | 3.5618          | 0.3710   |
| 3.2753        | 15.4384 | 53000  | 3.5520          | 0.3713   |
| 3.3033        | 15.7297 | 54000  | 3.5458          | 0.3720   |
| 3.2057        | 16.0210 | 55000  | 3.5570          | 0.3718   |
| 3.2579        | 16.3123 | 56000  | 3.5514          | 0.3718   |
| 3.282         | 16.6036 | 57000  | 3.5490          | 0.3721   |
| 3.3136        | 16.8949 | 58000  | 3.5383          | 0.3727   |
| 3.2328        | 17.1861 | 59000  | 3.5542          | 0.3720   |
| 3.2686        | 17.4775 | 60000  | 3.5456          | 0.3724   |
| 3.2832        | 17.7688 | 61000  | 3.5407          | 0.3726   |
| 3.1855        | 18.0600 | 62000  | 3.5547          | 0.3723   |
| 3.2353        | 18.3513 | 63000  | 3.5536          | 0.3721   |
| 3.265         | 18.6426 | 64000  | 3.5431          | 0.3731   |
| 3.2902        | 18.9339 | 65000  | 3.5362          | 0.3731   |
| 3.2075        | 19.2252 | 66000  | 3.5558          | 0.3725   |
| 3.2463        | 19.5165 | 67000  | 3.5461          | 0.3726   |
| 3.2592        | 19.8078 | 68000  | 3.5384          | 0.3731   |
| 3.1898        | 20.0990 | 69000  | 3.5559          | 0.3727   |
| 3.2148        | 20.3904 | 70000  | 3.5483          | 0.3730   |
| 3.2513        | 20.6817 | 71000  | 3.5408          | 0.3737   |
| 3.2646        | 20.9730 | 72000  | 3.5328          | 0.3740   |
| 3.1964        | 21.2642 | 73000  | 3.5523          | 0.3730   |
| 3.2209        | 21.5555 | 74000  | 3.5462          | 0.3733   |
| 3.2431        | 21.8468 | 75000  | 3.5349          | 0.3740   |
| 3.1708        | 22.1381 | 76000  | 3.5538          | 0.3733   |
| 3.2133        | 22.4294 | 77000  | 3.5428          | 0.3736   |
| 3.2188        | 22.7207 | 78000  | 3.5397          | 0.3740   |
| 3.1532        | 23.0119 | 79000  | 3.5490          | 0.3735   |
| 3.1781        | 23.3033 | 80000  | 3.5497          | 0.3736   |
| 3.2226        | 23.5946 | 81000  | 3.5435          | 0.3740   |
| 3.2186        | 23.8859 | 82000  | 3.5341          | 0.3744   |
| 3.15          | 24.1771 | 83000  | 3.5553          | 0.3732   |
| 3.1901        | 24.4684 | 84000  | 3.5481          | 0.3741   |
| 3.2069        | 24.7597 | 85000  | 3.5377          | 0.3743   |
| 3.125         | 25.0510 | 86000  | 3.5503          | 0.3738   |
| 3.1808        | 25.3423 | 87000  | 3.5512          | 0.3739   |
| 3.1951        | 25.6336 | 88000  | 3.5415          | 0.3742   |
| 3.2001        | 25.9249 | 89000  | 3.5327          | 0.3747   |
| 3.14          | 26.2162 | 90000  | 3.5526          | 0.3738   |
| 3.1702        | 26.5075 | 91000  | 3.5467          | 0.3742   |
| 3.1881        | 26.7988 | 92000  | 3.5360          | 0.3748   |
| 3.118         | 27.0900 | 93000  | 3.5548          | 0.3738   |
| 3.1445        | 27.3813 | 94000  | 3.5487          | 0.3741   |
| 3.1757        | 27.6726 | 95000  | 3.5402          | 0.3746   |
| 3.1963        | 27.9639 | 96000  | 3.5341          | 0.3751   |
| 3.1336        | 28.2552 | 97000  | 3.5514          | 0.3741   |
| 3.1754        | 28.5465 | 98000  | 3.5453          | 0.3746   |
| 3.1774        | 28.8378 | 99000  | 3.5393          | 0.3749   |
| 3.1134        | 29.1290 | 100000 | 3.5526          | 0.3743   |
| 3.1463        | 29.4204 | 101000 | 3.5483          | 0.3746   |
| 3.1666        | 29.7117 | 102000 | 3.5405          | 0.3749   |
| 3.1574        | 30.0029 | 103000 | 3.5517          | 0.3743   |
| 3.1275        | 30.2942 | 104000 | 3.5561          | 0.3742   |
| 3.1505        | 30.5855 | 105000 | 3.5447          | 0.3749   |
| 3.1663        | 30.8768 | 106000 | 3.5387          | 0.3750   |
| 3.0979        | 31.1681 | 107000 | 3.5543          | 0.3745   |
| 3.1313        | 31.4594 | 108000 | 3.5532          | 0.3745   |
| 3.142         | 31.7507 | 109000 | 3.5411          | 0.3752   |
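
Evaluation was logged every 1000 steps. Note that the headline metrics above (loss 3.5595, accuracy 0.3699) match the step-40000 row, while the lowest validation loss in the log is 3.5327 at step 89000. If the run's checkpoints are available, a minimal sketch like the following (the checkpoint path is hypothetical) locates the best eval record in the Trainer's trainer_state.json:

```python
import json

# Assumes the standard Trainer checkpoint layout; the path is hypothetical.
with open("checkpoint-109000/trainer_state.json") as f:
    state = json.load(f)

# log_history interleaves training and evaluation records; keep the eval ones.
evals = [rec for rec in state["log_history"] if "eval_loss" in rec]
best = min(evals, key=lambda rec: rec["eval_loss"])
print(best["step"], best["eval_loss"], best.get("eval_accuracy"))
```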

Framework versions

  • Transformers 4.55.2
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
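
When reproducing the environment, the installed versions can be checked against this list at runtime; a small sanity-check sketch (nothing here is specific to this model):

```python
import transformers, torch, datasets, tokenizers

# Expected versions, per the "Framework versions" list above.
expected = {
    "transformers": "4.55.2",
    "torch": "2.8.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}
for mod in (transformers, torch, datasets, tokenizers):
    print(mod.__name__, mod.__version__, "(expected:", expected[mod.__name__] + ")")
```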
Model size

  • 0.1B params (F32 tensors, Safetensors format)
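
A quick way to confirm the reported parameter count, assuming the checkpoint is available locally (the path and the AutoModel class are assumptions; this card does not give a repo id or architecture):

```python
from transformers import AutoModel

# Hypothetical local path; adjust to wherever the checkpoint lives.
model = AutoModel.from_pretrained("./exceptions_exp2_cost_to_hit_frequency_1032")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")  # card reports 0.1B
```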