collapse_gemma-2-2b_hs2_replace_iter2_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6314
  • Num Input Tokens Seen: 4852992

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
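As a sanity check, the reported total_train_batch_size follows from the per-device batch size and gradient accumulation. A minimal sketch (the single-device assumption is mine; the card does not state the device count):

```python
# Values taken directly from the hyperparameter list above.
train_batch_size = 8
gradient_accumulation_steps = 16
num_devices = 1  # assumption: not stated in the card

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # matches the reported total_train_batch_size: 128
```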

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|--------------:|-------:|-----:|----------------:|------------------:|
| No log        | 0      | 0    | 1.3911          | 0                 |
| 1.5169        | 0.0579 | 5    | 1.2663          | 285536            |
| 1.3736        | 0.1158 | 10   | 1.1918          | 567456            |
| 1.1135        | 0.1737 | 15   | 1.2205          | 844248            |
| 0.8215        | 0.2315 | 20   | 1.2889          | 1125792           |
| 0.7000        | 0.2894 | 25   | 1.3912          | 1408488           |
| 0.4534        | 0.3473 | 30   | 1.4630          | 1691120           |
| 0.4168        | 0.4052 | 35   | 1.5283          | 1974504           |
| 0.3295        | 0.4631 | 40   | 1.5404          | 2257232           |
| 0.1989        | 0.5210 | 45   | 1.4892          | 2538768           |
| 0.2135        | 0.5789 | 50   | 1.5781          | 2821752           |
| 0.1784        | 0.6368 | 55   | 1.4920          | 3105616           |
| 0.1314        | 0.6946 | 60   | 1.5251          | 3381944           |
| 0.1645        | 0.7525 | 65   | 1.5864          | 3666096           |
| 0.0937        | 0.8104 | 70   | 1.4925          | 3946448           |
| 0.1406        | 0.8683 | 75   | 1.5310          | 4234720           |
| 0.1477        | 0.9262 | 80   | 1.5799          | 4512080           |
| 0.0654        | 0.9841 | 85   | 1.6488          | 4796248           |
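Note that validation loss bottoms out early (step 10) and then trends upward while training loss keeps falling, a pattern that may indicate overfitting on this run. A quick check over the logged values from the table above:

```python
# Validation losses from the training-results table, keyed by step.
val_loss_by_step = {
    0: 1.3911, 5: 1.2663, 10: 1.1918, 15: 1.2205, 20: 1.2889,
    25: 1.3912, 30: 1.4630, 35: 1.5283, 40: 1.5404, 45: 1.4892,
    50: 1.5781, 55: 1.4920, 60: 1.5251, 65: 1.5864, 70: 1.4925,
    75: 1.5310, 80: 1.5799, 85: 1.6488,
}

# Step with the lowest validation loss.
best_step = min(val_loss_by_step, key=val_loss_by_step.get)
print(best_step, val_loss_by_step[best_step])  # 10 1.1918
```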

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1