paligemma2_mix_data

This model is a fine-tuned version of google/paligemma2-3b-pt-448 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6083
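Per the framework versions below, this checkpoint is a PEFT adapter rather than a full model, so it is used by attaching the adapter weights to the base model. A minimal loading and inference sketch, assuming the adapter repo id ebrukilic/paligemma2_mix_data and access to the gated base model (the prompt and image path are illustrative placeholders):

```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

BASE_ID = "google/paligemma2-3b-pt-448"
ADAPTER_ID = "ebrukilic/paligemma2_mix_data"  # adapter repo id (assumption)

# Load the base model first, then attach the fine-tuned PEFT adapter on top.
processor = PaliGemmaProcessor.from_pretrained(BASE_ID)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_ID).eval()

# Illustrative inference; "<image>caption en" follows the PaliGemma task-prefix convention.
image = Image.open("example.jpg")
inputs = processor(text="<image>caption en", images=image, return_tensors="pt")
inputs = inputs.to(torch.bfloat16).to(model.device)
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=50, do_sample=False)
# Strip the prompt tokens before decoding so only the generated text is printed.
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```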

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: paged AdamW 8-bit (paged_adamw_8bit) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 2
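For reference, these settings map roughly onto the following transformers TrainingArguments. This is a sketch of the presumed Trainer configuration, not the exact training script; output_dir is a placeholder and the dataset/model wiring is omitted:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="paligemma2_mix_data",  # placeholder; actual path unknown
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,     # effective train batch size: 1 x 4 = 4
    num_train_epochs=2,
    lr_scheduler_type="linear",
    optim="paged_adamw_8bit",          # betas=(0.9, 0.999), epsilon=1e-08 are the defaults
    seed=42,
)
```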

Training results

Training Loss | Epoch  | Step | Validation Loss
0.9278        | 0.0372 | 200  | 0.9142
0.7324        | 0.0745 | 400  | 0.7881
0.5464        | 0.1117 | 600  | 0.7628
0.6358        | 0.1489 | 800  | 0.7154
0.6393        | 0.1862 | 1000 | 0.6956
0.5533        | 0.2234 | 1200 | 0.6999
0.5798        | 0.2606 | 1400 | 0.6727
0.5291        | 0.2979 | 1600 | 0.6690
0.5927        | 0.3351 | 1800 | 0.6548
0.6512        | 0.3723 | 2000 | 0.6542
0.6157        | 0.4096 | 2200 | 0.6464
0.6177        | 0.4468 | 2400 | 0.6379
0.5941        | 0.4840 | 2600 | 0.6345
0.5513        | 0.5213 | 2800 | 0.6423
0.6359        | 0.5585 | 3000 | 0.6323
0.5513        | 0.5957 | 3200 | 0.6322
0.4695        | 0.6330 | 3400 | 0.6200
0.5851        | 0.6702 | 3600 | 0.6127
0.5475        | 0.7074 | 3800 | 0.6238
0.5264        | 0.7447 | 4000 | 0.6165
0.5325        | 0.7819 | 4200 | 0.6160
0.5497        | 0.8191 | 4400 | 0.6056
0.5338        | 0.8564 | 4600 | 0.6096
0.5604        | 0.8936 | 4800 | 0.6184
0.5270        | 0.9308 | 5000 | 0.6015
0.4486        | 0.9681 | 5200 | 0.6078
0.5502        | 1.0052 | 5400 | 0.5992
0.3990        | 1.0424 | 5600 | 0.6141
0.4539        | 1.0797 | 5800 | 0.6069
0.4584        | 1.1169 | 6000 | 0.6120
0.4254        | 1.1541 | 6200 | 0.6083

Framework versions

  • PEFT 0.18.0
  • Transformers 4.57.3
  • Pytorch 2.9.0+cu126
  • Datasets 4.4.2
  • Tokenizers 0.22.1
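
To approximate the training environment, the listed versions can be pinned directly. Note that 2.9.0+cu126 is the CUDA 12.6 build of PyTorch, so the exact wheel depends on your platform; the plain pin below is an approximation:

```bash
pip install peft==0.18.0 transformers==4.57.3 torch==2.9.0 datasets==4.4.2 tokenizers==0.22.1
```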