---
library_name: peft
license: mit
base_model: gpt2
tags:
  - generated_from_trainer
model-index:
  - name: Se124M100KInfPrompt_endtoken
    results: []
---

# Se124M100KInfPrompt_endtoken

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.6695

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0005
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- num_epochs: 50
- mixed_precision_training: Native AMP
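The cosine schedule with 200 warmup steps can be sketched in plain Python. This is not the training script, just a minimal reimplementation of the shape that `transformers.get_cosine_schedule_with_warmup` produces, assuming the total step count is the 285850 steps reached at epoch 50 in the results table below:

```python
import math

def lr_at_step(step, base_lr=0.0005, warmup_steps=200, total_steps=285850):
    """Linear warmup to base_lr, then cosine decay to zero.

    Mirrors the shape of transformers.get_cosine_schedule_with_warmup;
    total_steps=285850 assumes 50 epochs x 5717 steps per epoch,
    matching the training-results table.
    """
    if step < warmup_steps:
        # Linear warmup: 0 -> base_lr over the first 200 steps.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay: base_lr -> 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at_step(0))       # start of warmup
print(lr_at_step(200))     # peak learning rate
print(lr_at_step(285850))  # end of training
```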

### Training results

| Training Loss | Epoch | Step   | Validation Loss |
|:-------------:|:-----:|:------:|:---------------:|
| 0.7209        | 1.0   | 5717   | 0.7060          |
| 0.7027        | 2.0   | 11434  | 0.6916          |
| 0.7005        | 3.0   | 17151  | 0.6865          |
| 0.7009        | 4.0   | 22868  | 0.6858          |
| 0.6933        | 5.0   | 28585  | 0.6854          |
| 0.6922        | 6.0   | 34302  | 0.6825          |
| 0.6859        | 7.0   | 40019  | 0.6810          |
| 0.6923        | 8.0   | 45736  | 0.6812          |
| 0.6919        | 9.0   | 51453  | 0.6809          |
| 0.6871        | 10.0  | 57170  | 0.6795          |
| 0.6844        | 11.0  | 62887  | 0.6776          |
| 0.6923        | 12.0  | 68604  | 0.6780          |
| 0.6878        | 13.0  | 74321  | 0.6785          |
| 0.6765        | 14.0  | 80038  | 0.6775          |
| 0.6864        | 15.0  | 85755  | 0.6769          |
| 0.6776        | 16.0  | 91472  | 0.6761          |
| 0.6823        | 17.0  | 97189  | 0.6768          |
| 0.6743        | 18.0  | 102906 | 0.6751          |
| 0.682         | 19.0  | 108623 | 0.6776          |
| 0.6902        | 20.0  | 114340 | 0.6762          |
| 0.6774        | 21.0  | 120057 | 0.6751          |
| 0.6748        | 22.0  | 125774 | 0.6747          |
| 0.6864        | 23.0  | 131491 | 0.6745          |
| 0.6819        | 24.0  | 137208 | 0.6756          |
| 0.6818        | 25.0  | 142925 | 0.6745          |
| 0.6757        | 26.0  | 148642 | 0.6737          |
| 0.6801        | 27.0  | 154359 | 0.6734          |
| 0.6717        | 28.0  | 160076 | 0.6724          |
| 0.6717        | 29.0  | 165793 | 0.6722          |
| 0.6802        | 30.0  | 171510 | 0.6723          |
| 0.677         | 31.0  | 177227 | 0.6725          |
| 0.6764        | 32.0  | 182944 | 0.6712          |
| 0.6767        | 33.0  | 188661 | 0.6712          |
| 0.6758        | 34.0  | 194378 | 0.6716          |
| 0.6772        | 35.0  | 200095 | 0.6715          |
| 0.679         | 36.0  | 205812 | 0.6717          |
| 0.6744        | 37.0  | 211529 | 0.6702          |
| 0.6654        | 38.0  | 217246 | 0.6707          |
| 0.6723        | 39.0  | 222963 | 0.6704          |
| 0.6758        | 40.0  | 228680 | 0.6701          |
| 0.6795        | 41.0  | 234397 | 0.6701          |
| 0.6681        | 42.0  | 240114 | 0.6698          |
| 0.6761        | 43.0  | 245831 | 0.6700          |
| 0.673         | 44.0  | 251548 | 0.6697          |
| 0.6736        | 45.0  | 257265 | 0.6698          |
| 0.673         | 46.0  | 262982 | 0.6695          |
| 0.6686        | 47.0  | 268699 | 0.6695          |
| 0.666         | 48.0  | 274416 | 0.6696          |
| 0.663         | 49.0  | 280133 | 0.6695          |
| 0.6667        | 50.0  | 285850 | 0.6695          |
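A quick way to read these validation losses: assuming they are mean token-level cross-entropy in nats (the Hugging Face Trainer default for causal LMs), `exp(loss)` gives perplexity, so the drop from 0.7060 at epoch 1 to 0.6695 at epoch 50 is a modest perplexity improvement:

```python
import math

# Validation losses from the table above.
first_loss = 0.7060  # epoch 1
final_loss = 0.6695  # epoch 50

# Perplexity = exp(mean cross-entropy); assumes loss is in nats per token.
first_ppl = math.exp(first_loss)
final_ppl = math.exp(final_loss)

print(round(first_ppl, 3))  # ~2.026
print(round(final_ppl, 3))  # ~1.953
```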

### Framework versions

- PEFT 0.15.1
- Transformers 4.51.3
- Pytorch 2.6.0+cu118
- Datasets 3.5.0
- Tokenizers 0.21.1