gemma2-mentalchat16k

This model is a fine-tuned version of google/gemma-2b (the training dataset is not specified in this card). It achieves the following results on the evaluation set:

  • Loss: 0.7946
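
If the reported loss is the mean per-token cross-entropy, this corresponds to an evaluation perplexity of roughly exp(0.7946) ≈ 2.21.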

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch in code follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 3
  • eval_batch_size: 3
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 6
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 4
  • mixed_precision_training: Native AMP
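
For reference, the sketch below shows how these settings might be expressed as a Hugging Face TrainingArguments plus a PEFT LoRA configuration. The LoRA rank/alpha/dropout values and the omitted dataset handling are assumptions for illustration; the card does not record the adapter settings or the training data.

```python
# Minimal sketch of a training configuration matching the hyperparameters above.
# The LoRA settings below are hypothetical; the card does not list them.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model

base_model = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical adapter configuration (rank, alpha, dropout are illustrative).
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# The values below mirror the hyperparameter list in this card.
training_args = TrainingArguments(
    output_dir="gemma2-mentalchat16k",
    learning_rate=2e-4,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=2,  # effective total train batch size of 6
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="adamw_torch",            # AdamW, betas=(0.9, 0.999), eps=1e-8
    seed=42,
    fp16=True,                      # native AMP mixed precision
)
# A Trainer would then be built with these arguments and the (unspecified) datasets.
```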

Training results

Training Loss   Epoch    Step   Validation Loss
1.0076          0.1122    100   0.9827
0.9399          0.2243    200   0.9345
0.9054          0.3365    300   0.9031
0.8561          0.4487    400   0.8859
0.8794          0.5609    500   0.8711
0.844           0.6730    600   0.8557
0.8305          0.7852    700   0.8461
0.8207          0.8974    800   0.8400
0.8117          1.0090    900   0.8529
0.7338          1.1211   1000   0.8448
0.7422          1.2333   1100   0.8332
0.6964          1.3455   1200   0.8273
0.7064          1.4577   1300   0.8252
0.7201          1.5698   1400   0.8170
0.7162          1.6820   1500   0.8121
0.688           1.7942   1600   0.8088
0.7166          1.9063   1700   0.7998
0.636           2.0179   1800   0.8447
0.5388          2.1301   1900   0.8485
0.5319          2.2423   2000   0.8444
0.5396          2.3545   2100   0.8498
0.5523          2.4666   2200   0.8446

Framework versions

  • PEFT 0.15.2
  • Transformers 4.54.1
  • PyTorch 2.7.1+cu118
  • Datasets 3.6.0
  • Tokenizers 0.21.1

Model tree for advy/gemma2-mentalchat16k

  • Base model: google/gemma-2b
  • This model: a PEFT adapter on the base model
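
Since this repository contains a PEFT adapter rather than full model weights, a minimal loading sketch (assuming the repo id advy/gemma2-mentalchat16k from this card and an illustrative prompt) might look like:

```python
# Minimal sketch: load the adapter on top of the google/gemma-2b base model.
# The prompt is illustrative; generation settings are not taken from the card.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "advy/gemma2-mentalchat16k"
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, torch_dtype=torch.float16)

inputs = tokenizer("How can I cope with everyday stress?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```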