full_nepali-captioning

This model is a fine-tuned version of an unspecified base model, trained for Nepali captioning on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1313

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 128
  • eval_batch_size: 128
  • seed: 42
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 30
  • mixed_precision_training: Native AMP
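For intuition, the linear schedule above decays the learning rate from 5e-05 to 0 over the whole run. A minimal sketch, assuming zero warmup steps; the total step count (~7,590) is not stated in this card but is inferred from the results table, where 7,500 steps correspond to ~29.64 epochs (~253 optimizer steps per epoch × 30 epochs):

```python
# Sketch of the linear LR schedule (assumption: no warmup).
# TOTAL_STEPS is inferred from the results table, not reported in the card.
LEARNING_RATE = 5e-05
TOTAL_STEPS = 253 * 30  # ~7,590 optimizer steps over 30 epochs (inferred)

def linear_lr(step: int,
              base_lr: float = LEARNING_RATE,
              total_steps: int = TOTAL_STEPS) -> float:
    """Linearly decay the learning rate from base_lr down to 0 over total_steps."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)

if __name__ == "__main__":
    for s in (0, 2000, 4000, 7590):
        print(f"step {s:5d}: lr = {linear_lr(s):.2e}")
```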

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| No log        | 1.9763  | 500  | 3.1892          |
| No log        | 3.9526  | 1000 | 2.6822          |
| No log        | 5.9289  | 1500 | 2.4790          |
| 3.2335        | 7.9051  | 2000 | 2.3633          |
| 3.2335        | 9.8814  | 2500 | 2.2929          |
| 3.2335        | 11.8577 | 3000 | 2.2464          |
| 3.2335        | 13.8340 | 3500 | 2.2120          |
| 2.1358        | 15.8103 | 4000 | 2.1875          |
| 2.1358        | 17.7866 | 4500 | 2.1714          |
| 2.1358        | 19.7628 | 5000 | 2.1574          |
| 2.1358        | 21.7391 | 5500 | 2.1491          |
| 1.94          | 23.7154 | 6000 | 2.1411          |
| 1.94          | 25.6917 | 6500 | 2.1356          |
| 1.94          | 27.6680 | 7000 | 2.1331          |
| 1.94          | 29.6443 | 7500 | 2.1313          |
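Assuming the reported loss is a mean per-token cross-entropy (the Transformers default for language-modeling heads), the final validation loss maps to a token-level perplexity of exp(loss):

```python
import math

# Assumption: the reported validation loss is a mean per-token cross-entropy,
# so exp(loss) gives the token-level perplexity on the evaluation set.
final_val_loss = 2.1313
perplexity = math.exp(final_val_loss)
print(f"validation perplexity = {perplexity:.2f}")  # ≈ 8.43
```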

Framework versions

  • Transformers 4.56.1
  • Pytorch 2.8.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.0
Safetensors

  • Model size: 0.3B params
  • Tensor type: F32