Finetune on MentalChat16K - eval_loss: 0.8088
metadata
library_name: peft
license: gemma
base_model: google/gemma-2b
tags:
  - generated_from_trainer
model-index:
  - name: gemma2-mentalchat16k
    results: []

gemma2-mentalchat16k

This model is a PEFT adapter fine-tuned from google/gemma-2b on the MentalChat16K dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7946
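
Because the metadata lists peft as the library and google/gemma-2b as the base model, the adapter would typically be loaded on top of the base weights. The following is a minimal sketch under that assumption, not the author's documented usage. The adapter repository id is a hypothetical placeholder, the helper names load_finetuned and generate_reply are illustrative, and the card does not specify a prompt format, so the prompt is passed through as plain text.

```python
BASE_MODEL_ID = "google/gemma-2b"  # taken from the card metadata
# Hypothetical placeholder: replace with the actual adapter repository id.
ADAPTER_ID = "your-namespace/gemma2-mentalchat16k"


def load_finetuned(adapter_id: str = ADAPTER_ID, base_id: str = BASE_MODEL_ID):
    """Load the base model and attach the PEFT adapter (downloads weights)."""
    # Imported lazily so this module can be inspected without the libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def generate_reply(tokenizer, model, prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a continuation for a plain-text prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling load_finetuned() requires access to the gated Gemma base weights and enough memory to hold them; nothing is downloaded until the function is invoked.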

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 3
  • eval_batch_size: 3
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 6
  • optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 4
  • mixed_precision_training: Native AMP
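
The batch-size and scheduler entries above can be checked with a little arithmetic. The sketch below assumes a single training device (the card does not state the device count) and reimplements a linear-warmup, half-cosine decay schedule in plain Python for illustration; it is not the Trainer's own code. The steps-per-epoch figure is an estimate derived from the step and epoch columns of the training results, not a value reported by the card.

```python
import math

LEARNING_RATE = 2e-4      # learning_rate
TRAIN_BATCH_SIZE = 3      # train_batch_size (per device)
GRAD_ACCUM_STEPS = 2      # gradient_accumulation_steps
WARMUP_RATIO = 0.03       # lr_scheduler_warmup_ratio
NUM_EPOCHS = 4            # num_epochs


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = LEARNING_RATE,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Estimate only: the results log reaches step 2200 at epoch 2.4666.
STEPS_PER_EPOCH = round(2200 / 2.4666)
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH

if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for s in (0, 100, 1000, 2200, TOTAL_STEPS):
        print(f"step {s:>5}: lr = {cosine_lr(s, TOTAL_STEPS):.6f}")
```

With these values, 3 examples per device times 2 accumulation steps gives the reported total_train_batch_size of 6.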

Training results

Training Loss   Epoch    Step   Validation Loss
1.0076          0.1122   100    0.9827
0.9399          0.2243   200    0.9345
0.9054          0.3365   300    0.9031
0.8561          0.4487   400    0.8859
0.8794          0.5609   500    0.8711
0.844           0.6730   600    0.8557
0.8305          0.7852   700    0.8461
0.8207          0.8974   800    0.8400
0.8117          1.0090   900    0.8529
0.7338          1.1211   1000   0.8448
0.7422          1.2333   1100   0.8332
0.6964          1.3455   1200   0.8273
0.7064          1.4577   1300   0.8252
0.7201          1.5698   1400   0.8170
0.7162          1.6820   1500   0.8121
0.688           1.7942   1600   0.8088
0.7166          1.9063   1700   0.7998
0.636           2.0179   1800   0.8447
0.5388          2.1301   1900   0.8485
0.5319          2.2423   2000   0.8444
0.5396          2.3545   2100   0.8498
0.5523          2.4666   2200   0.8446
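
The logged results can be inspected programmatically to find the step with the lowest validation loss and to see how far training and validation loss have diverged. This is an illustrative sketch; the tuple layout and the helper names best_checkpoint and loss_gap are assumptions introduced here, and the numbers are copied from the log above.

```python
# Each entry: (step, epoch, training_loss, validation_loss), copied from the log.
LOG = [
    (100, 0.1122, 1.0076, 0.9827), (200, 0.2243, 0.9399, 0.9345),
    (300, 0.3365, 0.9054, 0.9031), (400, 0.4487, 0.8561, 0.8859),
    (500, 0.5609, 0.8794, 0.8711), (600, 0.6730, 0.844, 0.8557),
    (700, 0.7852, 0.8305, 0.8461), (800, 0.8974, 0.8207, 0.8400),
    (900, 1.0090, 0.8117, 0.8529), (1000, 1.1211, 0.7338, 0.8448),
    (1100, 1.2333, 0.7422, 0.8332), (1200, 1.3455, 0.6964, 0.8273),
    (1300, 1.4577, 0.7064, 0.8252), (1400, 1.5698, 0.7201, 0.8170),
    (1500, 1.6820, 0.7162, 0.8121), (1600, 1.7942, 0.688, 0.8088),
    (1700, 1.9063, 0.7166, 0.7998), (1800, 2.0179, 0.636, 0.8447),
    (1900, 2.1301, 0.5388, 0.8485), (2000, 2.2423, 0.5319, 0.8444),
    (2100, 2.3545, 0.5396, 0.8498), (2200, 2.4666, 0.5523, 0.8446),
]


def best_checkpoint(log):
    """Return the log entry with the lowest validation loss."""
    return min(log, key=lambda row: row[3])


def loss_gap(row):
    """Validation loss minus training loss for one log entry."""
    return row[3] - row[2]


if __name__ == "__main__":
    step, epoch, train_loss, val_loss = best_checkpoint(LOG)
    print(f"lowest validation loss {val_loss} at step {step} (epoch {epoch})")
    print(f"gap at best step:  {loss_gap(best_checkpoint(LOG)):.4f}")
    print(f"gap at final step: {loss_gap(LOG[-1]):.4f}")
```

In this log the lowest validation loss is 0.7998 at step 1700. After epoch 2 the training loss keeps falling while the validation loss rises to about 0.84, a pattern that is consistent with overfitting, so an earlier checkpoint may be preferable to the last logged one.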

Framework versions

  • PEFT 0.15.2
  • Transformers 4.54.1
  • Pytorch 2.7.1+cu118
  • Datasets 3.6.0
  • Tokenizers 0.21.1