collapse_gemma-2-2b_hs2_replace_iter3_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1938
  • Num Input Tokens Seen: 5128680

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
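As a sanity check on the batch settings above, the total train batch size and the warmup length can be reproduced with a short sketch. The ~96 total optimizer steps are an assumption read off the training-results table (1 epoch, last logged step 95), and the ramp follows the usual constant-with-warmup rule:

```python
def effective_batch_size(per_device, grad_accum, num_devices=1):
    # Per-device batch size times gradient accumulation (times devices).
    return per_device * grad_accum * num_devices

def constant_with_warmup_lr(step, base_lr, warmup_steps):
    # Linear ramp from 0 to base_lr over the warmup steps, then constant.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr

print(effective_batch_size(8, 16))          # 128, matches total_train_batch_size
total_steps = 96                            # assumption: ~1 epoch of optimizer steps
warmup_steps = int(0.05 * total_steps)      # warmup_ratio 0.05 -> 4 warmup steps
print(constant_with_warmup_lr(2, 8e-06, warmup_steps))  # → 4e-06, mid-warmup
```

With these settings the learning rate reaches its constant value of 8e-06 after only a handful of steps and never decays.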

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:--------------|:-------|:-----|:----------------|:------------------|
| No log        | 0      | 0    | 1.3911          | 0                 |
| 1.4866        | 0.0522 | 5    | 1.2772          | 268824            |
| 1.0724        | 0.1044 | 10   | 1.2542          | 534336            |
| 0.7505        | 0.1567 | 15   | 1.3935          | 806512            |
| 0.4711        | 0.2089 | 20   | 1.5034          | 1072320           |
| 0.3516        | 0.2611 | 25   | 1.6782          | 1347248           |
| 0.2276        | 0.3133 | 30   | 1.7960          | 1629728           |
| 0.0914        | 0.3655 | 35   | 1.9368          | 1898144           |
| 0.0729        | 0.4178 | 40   | 1.9806          | 2160104           |
| 0.0788        | 0.4700 | 45   | 2.0355          | 2431552           |
| 0.047         | 0.5222 | 50   | 2.0447          | 2702376           |
| 0.0378        | 0.5744 | 55   | 2.0477          | 2977136           |
| 0.0651        | 0.6266 | 60   | 2.0250          | 3244952           |
| 0.0406        | 0.6789 | 65   | 2.0630          | 3517952           |
| 0.0353        | 0.7311 | 70   | 2.0337          | 3785248           |
| 0.0367        | 0.7833 | 75   | 2.1002          | 4058328           |
| 0.0314        | 0.8355 | 80   | 2.1372          | 4327344           |
| 0.0297        | 0.8877 | 85   | 2.1233          | 4590592           |
| 0.0375        | 0.9399 | 90   | 2.1327          | 4856992           |
| 0.0392        | 0.9922 | 95   | 2.1938          | 5128680           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
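To match this environment, the versions above can be pinned at install time. A sketch, assuming the standard PyPI package names and the official PyTorch CUDA 12.1 wheel index for the `+cu121` build:

```shell
pip install "transformers==4.44.0" "datasets==2.20.0" "tokenizers==0.19.1"
pip install "torch==2.4.0" --index-url https://download.pytorch.org/whl/cu121
```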