balanced_small_seed-42_1e-3
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 3.1817
- Accuracy: 0.4012
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
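Two of the values above follow from the others, and the short sketch below checks the arithmetic. It assumes a single training device (the card does not state the device count) and assumes the `linear` scheduler ramps linearly from zero to the peak rate over the warmup steps and then decays linearly to zero. The names `num_devices`, `final_step`, and `linear_schedule_lr` are illustrative, not taken from the training code. Under these assumptions, the 32000 warmup steps exceed the 30060 steps logged in the training results, so the learning rate would still be ramping up at the last step and would not reach 0.001.

```python
# Hyperparameters copied from the list above.
learning_rate = 1e-3
train_batch_size = 32
gradient_accumulation_steps = 8
warmup_steps = 32_000

# Last step logged in the training results table.
final_step = 30_060

# Assumption: a single device, so there is no extra data-parallel factor.
num_devices = 1
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)


def linear_schedule_lr(step: int, total_steps: int) -> float:
    """Assumed schedule shape: linear warmup to the peak, then linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


final_lr = linear_schedule_lr(final_step, total_steps=final_step)

print("effective batch size:", total_train_batch_size)
print("still in warmup at the last step:", final_step < warmup_steps)
print(f"learning rate at the last step: {final_lr:.6e}")
```

If the run used more than one device, `train_batch_size` is per device and the product would no longer match the reported 256, so the single-device assumption is at least consistent with the card.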
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 6.0139 | 0.9997 | 1503 | 4.4236 | 0.2912 |
| 3.941 | 1.9994 | 3006 | 3.8961 | 0.3335 |
| 3.6814 | 2.9998 | 4510 | 3.6198 | 0.3578 |
| 3.3678 | 3.9996 | 6013 | 3.4583 | 0.3724 |
| 3.2601 | 4.9993 | 7516 | 3.3653 | 0.3812 |
| 3.1439 | 5.9998 | 9020 | 3.3011 | 0.3870 |
| 3.0803 | 6.9995 | 10523 | 3.2671 | 0.3907 |
| 3.0295 | 7.9999 | 12027 | 3.2407 | 0.3935 |
| 2.9783 | 8.9997 | 13530 | 3.2248 | 0.3950 |
| 2.9602 | 9.9994 | 15033 | 3.2122 | 0.3964 |
| 2.913 | 10.9998 | 16537 | 3.2051 | 0.3972 |
| 2.9128 | 11.9996 | 18040 | 3.2023 | 0.3982 |
| 2.8701 | 12.9993 | 19543 | 3.1954 | 0.3986 |
| 2.8805 | 13.9998 | 21047 | 3.1940 | 0.3996 |
| 2.8402 | 14.9995 | 22550 | 3.1920 | 0.3997 |
| 2.8567 | 15.9999 | 24054 | 3.1913 | 0.4001 |
| 2.8191 | 16.9997 | 25557 | 3.1891 | 0.4006 |
| 2.8421 | 17.9994 | 27060 | 3.1865 | 0.4004 |
| 2.8056 | 18.9998 | 28564 | 3.1834 | 0.4006 |
| 2.8322 | 19.9949 | 30060 | 3.1817 | 0.4012 |
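For a rough sense of scale, the validation losses in the table can be converted to perplexity. This is a sketch that assumes the reported loss is a mean cross-entropy in nats (natural log), which the card does not confirm; the variable names are illustrative. Under that assumption the final validation loss of 3.1817 corresponds to a perplexity of roughly 24.

```python
import math

# Validation losses copied from the first and last rows of the table.
first_epoch_val_loss = 4.4236
final_val_loss = 3.1817

# Assumption: the loss is a mean cross-entropy measured in nats,
# so perplexity is simply exp(loss).
first_epoch_perplexity = math.exp(first_epoch_val_loss)
final_perplexity = math.exp(final_val_loss)

print(f"perplexity after epoch 1:  {first_epoch_perplexity:.2f}")
print(f"perplexity after epoch 20: {final_perplexity:.2f}")
```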
Framework versions
- Transformers 4.46.2
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.20.0