---
library_name: peft
license: mit
base_model: gpt2
tags:
  - generated_from_trainer
model-index:
  - name: Se124M100KInfKeyValue
    results: []
---

Se124M100KInfKeyValue

This model is a parameter-efficient fine-tuned version of gpt2 (trained with PEFT) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4826

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 50
  • mixed_precision_training: Native AMP
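For reference, the bullet list above maps onto the keyword arguments that `transformers.TrainingArguments` expects. A sketch of that mapping (argument names are assumptions drawn from the Transformers API, not taken from the original training script):

```python
# Hedged sketch: the listed hyperparameters in TrainingArguments shape.
training_kwargs = dict(
    learning_rate=5e-05,
    per_device_train_batch_size=32,  # train_batch_size: 32
    per_device_eval_batch_size=32,   # eval_batch_size: 32
    seed=42,
    optim="adamw_torch",             # AdamW, betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=50,
    fp16=True,                       # Native AMP mixed precision
)
```

These would be passed as `TrainingArguments(output_dir=..., **training_kwargs)` when reconstructing a comparable `Trainer` run.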

Training results

| Training Loss | Epoch | Step   | Validation Loss |
|:-------------:|:-----:|:------:|:---------------:|
| 0.1533        | 1.0   | 2090   | 0.5677          |
| 0.141         | 2.0   | 4180   | 0.5401          |
| 0.1401        | 3.0   | 6270   | 0.5321          |
| 0.1351        | 4.0   | 8360   | 0.5224          |
| 0.1361        | 5.0   | 10450  | 0.5209          |
| 0.134         | 6.0   | 12540  | 0.5155          |
| 0.1315        | 7.0   | 14630  | 0.5118          |
| 0.1304        | 8.0   | 16720  | 0.5085          |
| 0.1299        | 9.0   | 18810  | 0.5067          |
| 0.1294        | 10.0  | 20900  | 0.5066          |
| 0.1309        | 11.0  | 22990  | 0.5030          |
| 0.129         | 12.0  | 25080  | 0.5004          |
| 0.1292        | 13.0  | 27170  | 0.4999          |
| 0.1309        | 14.0  | 29260  | 0.4985          |
| 0.1312        | 15.0  | 31350  | 0.4974          |
| 0.1275        | 16.0  | 33440  | 0.4963          |
| 0.1265        | 17.0  | 35530  | 0.4950          |
| 0.1276        | 18.0  | 37620  | 0.4936          |
| 0.1247        | 19.0  | 39710  | 0.4931          |
| 0.1268        | 20.0  | 41800  | 0.4919          |
| 0.1273        | 21.0  | 43890  | 0.4914          |
| 0.1284        | 22.0  | 45980  | 0.4907          |
| 0.1276        | 23.0  | 48070  | 0.4898          |
| 0.1284        | 24.0  | 50160  | 0.4894          |
| 0.1263        | 25.0  | 52250  | 0.4894          |
| 0.1264        | 26.0  | 54340  | 0.4888          |
| 0.1257        | 27.0  | 56430  | 0.4881          |
| 0.1292        | 28.0  | 58520  | 0.4875          |
| 0.1262        | 29.0  | 60610  | 0.4874          |
| 0.1272        | 30.0  | 62700  | 0.4867          |
| 0.1252        | 31.0  | 64790  | 0.4864          |
| 0.1261        | 32.0  | 66880  | 0.4859          |
| 0.1262        | 33.0  | 68970  | 0.4858          |
| 0.1256        | 34.0  | 71060  | 0.4849          |
| 0.1251        | 35.0  | 73150  | 0.4847          |
| 0.1263        | 36.0  | 75240  | 0.4843          |
| 0.1266        | 37.0  | 77330  | 0.4843          |
| 0.125         | 38.0  | 79420  | 0.4842          |
| 0.1248        | 39.0  | 81510  | 0.4840          |
| 0.126         | 40.0  | 83600  | 0.4837          |
| 0.1239        | 41.0  | 85690  | 0.4831          |
| 0.1246        | 42.0  | 87780  | 0.4837          |
| 0.1241        | 43.0  | 89870  | 0.4830          |
| 0.1259        | 44.0  | 91960  | 0.4829          |
| 0.1275        | 45.0  | 94050  | 0.4828          |
| 0.1258        | 46.0  | 96140  | 0.4826          |
| 0.1265        | 47.0  | 98230  | 0.4828          |
| 0.1235        | 48.0  | 100320 | 0.4827          |
| 0.1255        | 49.0  | 102410 | 0.4827          |
| 0.1245        | 50.0  | 104500 | 0.4826          |

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • PyTorch 2.6.0+cu118
  • Datasets 3.5.0
  • Tokenizers 0.21.1
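To reproduce this environment, the versions above can be pinned with pip. A sketch, assuming the standard PyPI package names (the PyTorch CUDA 11.8 build comes from the official PyTorch wheel index):

```shell
# Pin the framework versions listed in this card.
pip install "peft==0.15.1" "transformers==4.51.3" "datasets==3.5.0" "tokenizers==0.21.1"
# The +cu118 build of PyTorch is served from the cu118 wheel index.
pip install "torch==2.6.0" --index-url https://download.pytorch.org/whl/cu118
```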