checkpoints

This model is a fine-tuned version of an unspecified base model, trained on an unspecified dataset (the card records the dataset as None). It achieves the following results on the evaluation set:

  • Loss: 4.2659
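
If the reported loss is a mean per-token cross-entropy measured in nats (an assumption; this card does not state the training objective), it can be turned into a perplexity by exponentiation. A minimal sketch, with `loss_to_perplexity` as a hypothetical helper name:

```python
import math


def loss_to_perplexity(loss_nats: float) -> float:
    """Convert a mean cross-entropy loss in nats to perplexity."""
    return math.exp(loss_nats)


# Evaluation loss reported in this card.
eval_loss = 4.2659

# Assumption: the loss is token-level cross-entropy in nats.
perplexity = loss_to_perplexity(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")
```

Under that assumption the evaluation loss corresponds to a perplexity of roughly 71.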

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 48
  • eval_batch_size: 48
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 96
  • optimizer: OptimizerNames.ADAMW_BNB with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 2000
  • num_epochs: 2
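
Two of the values above can be checked with a little arithmetic. The total batch size is the per-device batch size times the gradient accumulation steps times the number of devices, so 48 × 2 = 96 is consistent with a single device (an inference, not something the card states). The sketch below also shows the general shape of a cosine schedule with linear warmup. It is an illustration rather than the trainer's own code, and `TOTAL_STEPS` is an assumption estimated from the Epoch and Step columns of the training results.

```python
import math

PEAK_LR = 3e-4        # learning_rate from the card
WARMUP_STEPS = 2000   # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 11_838  # assumption: estimated, not stated in the card


def effective_batch_size(per_device: int, accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * accumulation_steps * num_devices


def lr_at(step: int, peak_lr: float = PEAK_LR, warmup: int = WARMUP_STEPS,
          total: int = TOTAL_STEPS) -> float:
    """Linear warmup to peak_lr, then cosine decay towards zero."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(48, 2))  # matches total_train_batch_size
for step in (0, 1000, 2000, 6000, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

With these settings the warmup covers roughly the first third of an epoch, which matches the steep early drop in the logged losses.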

Training results

Training Loss   Epoch    Step    Validation Loss
7.4032          0.0845   500     7.3900
6.6368          0.1689   1000    6.6176
6.0293          0.2534   1500    6.0336
5.4871          0.3379   2000    5.4602
5.1774          0.4224   2500    5.1387
4.9533          0.5068   3000    4.9452
4.8279          0.5913   3500    4.8122
4.7441          0.6758   4000    4.7194
4.6783          0.7603   4500    4.6470
4.6144          0.8447   5000    4.5846
4.5477          0.9292   5500    4.5297
4.4920          1.0137   6000    4.4871
4.4523          1.0982   6500    4.4475
4.3954          1.1826   7000    4.4127
4.4032          1.2671   7500    4.3827
4.4052          1.3516   8000    4.3571
4.3566          1.4361   8500    4.3329
4.3505          1.5205   9000    4.3124
4.3208          1.6050   9500    4.2945
4.3149          1.6895   10000   4.2829
4.3015          1.7739   10500   4.2739
4.2932          1.8584   11000   4.2682
4.2789          1.9429   11500   4.2659
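
The Epoch and Step columns make it possible to estimate how long an epoch is, even though the dataset is not described. The sketch below divides step by epoch for a few rows copied from the table and multiplies by the total batch size of 96. The results are estimates only: the epoch values are rounded to four decimals and the last batch of an epoch may be partial.

```python
# (epoch, step) pairs copied from the training results.
ROWS = [(0.0845, 500), (1.0137, 6000), (1.9429, 11500)]

TOTAL_TRAIN_BATCH_SIZE = 96  # from the hyperparameters
NUM_EPOCHS = 2               # from the hyperparameters


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the latest logged row.

    The last row is used because rounding in the epoch value has the
    smallest relative effect there.
    """
    epoch, step = rows[-1]
    return step / epoch


spe = steps_per_epoch(ROWS)
total_steps = round(spe * NUM_EPOCHS)
examples_per_epoch = round(spe) * TOTAL_TRAIN_BATCH_SIZE

print(f"steps per epoch     ~ {spe:.0f}")
print(f"total steps         ~ {total_steps}")
print(f"examples per epoch  ~ {examples_per_epoch}")
```

This suggests about 5,900 optimizer steps per epoch, so the final logged row at step 11500 sits a few hundred steps before the end of the second epoch.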

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.5.0
  • Tokenizers 0.22.2
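
To compare a local environment against the versions listed above, the standard library's `importlib.metadata` can report what is installed. This is a minimal sketch: the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names and are an assumption here, and the comparison is an exact string match, so a different CUDA build suffix on PyTorch is reported as a difference.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "5.0.0",
    "torch": "2.8.0+cu128",
    "datasets": "4.5.0",
    "tokenizers": "0.22.2",
}


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_report(expected):
    """Map each package to (wanted, found, exact_match)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        report[name] = (wanted, found, found == wanted)
    return report


for name, (wanted, found, ok) in version_report(EXPECTED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: card={wanted} installed={found} [{status}]")
```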

Model size

  • Parameters: 82.1M
  • Tensor type: F32
  • Format: Safetensors
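
The parameter count and tensor type give a rough size for the stored weights: each F32 value takes 4 bytes. The sketch below covers the weights only and uses the rounded 82.1M figure, so it is an estimate. Activations, optimizer state, and file metadata are not included.

```python
# Bytes per element for a few common tensor types.
BYTES_PER_DTYPE = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(num_params: int, dtype: str = "F32") -> int:
    """Approximate size of the weights in bytes."""
    return num_params * BYTES_PER_DTYPE[dtype]


num_params = 82_100_000  # 82.1M, rounded as shown in the card
size = weight_bytes(num_params, "F32")
print(f"{size / 1e6:.1f} MB")  # decimal megabytes
```

At F32 this comes to roughly 328 MB; storing the same weights in a 16-bit type would halve that.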