default_seed-42_1e-3

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.0138
  • Accuracy: 0.4207
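Assuming the reported loss is the standard cross-entropy in nats (as is typical for causal language models trained with Transformers), it corresponds to a perplexity of roughly exp(3.0138) ≈ 20.4:

```python
import math

# Perplexity is exp(cross-entropy loss) when the loss is measured in nats.
eval_loss = 3.0138
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.2f}")  # ≈ 20.36
```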

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 32000
  • num_epochs: 20.0
  • mixed_precision_training: Native AMP
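A minimal sketch of how these settings combine, using the final step count of 35,540 from the results table below: the effective batch size is train_batch_size × gradient_accumulation_steps = 256, and the linear scheduler ramps the learning rate up over the first 32,000 steps, then decays it linearly to zero. Note that warmup covers roughly 90% of training.

```python
def linear_lr(step, base_lr=1e-3, warmup_steps=32_000, total_steps=35_540):
    """Linear warmup to base_lr, then linear decay to zero
    (the behaviour of lr_scheduler_type='linear' with warmup)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Effective batch size per optimizer update:
effective_batch = 32 * 8  # train_batch_size * gradient_accumulation_steps

print(effective_batch)        # 256
print(linear_lr(16_000))      # halfway through warmup -> 0.0005
print(linear_lr(35_540))      # final step -> 0.0
```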

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|--------------:|--------:|------:|----------------:|---------:|
| 6.1321        | 0.9998  | 1777  | 4.2568          | 0.3056   |
| 4.0475        | 1.9996  | 3554  | 3.7355          | 0.3470   |
| 3.6065        | 2.9999  | 5332  | 3.4711          | 0.3706   |
| 3.3801        | 3.9997  | 7109  | 3.3293          | 0.3842   |
| 3.2903        | 4.9995  | 8886  | 3.2514          | 0.3915   |
| 3.1851        | 5.9999  | 10664 | 3.2029          | 0.3965   |
| 3.1235        | 6.9996  | 12441 | 3.1710          | 0.3997   |
| 3.0808        | 8.0     | 14219 | 3.1503          | 0.4018   |
| 3.0393        | 8.9998  | 15996 | 3.1347          | 0.4036   |
| 3.0           | 9.9996  | 17773 | 3.1246          | 0.4050   |
| 2.9905        | 10.9999 | 19551 | 3.1159          | 0.4061   |
| 2.9815        | 11.9997 | 21328 | 3.1097          | 0.4066   |
| 2.9737        | 12.9995 | 23105 | 3.1053          | 0.4074   |
| 2.9326        | 13.9999 | 24883 | 3.0968          | 0.4081   |
| 2.9288        | 14.9996 | 26660 | 3.0987          | 0.4079   |
| 2.9358        | 16.0    | 28438 | 3.0968          | 0.4083   |
| 2.9394        | 16.9998 | 30215 | 3.0941          | 0.4088   |
| 2.9087        | 17.9996 | 31992 | 3.0921          | 0.4084   |
| 2.8774        | 18.9999 | 33770 | 3.0431          | 0.4155   |
| 2.7328        | 19.9958 | 35540 | 3.0138          | 0.4207   |

Framework versions

  • Transformers 4.46.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.20.0
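To reproduce the environment, the listed versions can be pinned at install time (package names assumed to be the standard PyPI ones; the `+cu124` PyTorch build is typically installed from the PyTorch CUDA 12.4 wheel index):

```shell
pip install transformers==4.46.2 datasets==3.2.0 tokenizers==0.20.0
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124
```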
Model size: 0.1B params (Safetensors, F32)