twi-gpt2-chatbot

This model is a fine-tuned version of distilgpt2 (trained as a PEFT adapter) on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2445
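For intuition, a language-modeling cross-entropy loss converts to perplexity by exponentiation, so the loss above corresponds to a perplexity of roughly 9.44. A quick stdlib check:

```python
import math

# Final validation loss reported above
eval_loss = 2.2445

# Perplexity is the exponential of the mean cross-entropy loss
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")  # ~ 9.44
```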

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 10
  • mixed_precision_training: Native AMP
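The per-device batch size and gradient accumulation steps above combine into the reported total train batch size. A minimal sanity check (assuming single-device training, which the total implies):

```python
# Hyperparameters from the list above
train_batch_size = 1             # per-device batch size
gradient_accumulation_steps = 8
num_devices = 1                  # assumption: single-GPU run

# Effective (total) train batch size seen by the optimizer per update
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 8, matching total_train_batch_size above
```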

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 6.4994        | 0.0849 | 100   | 6.1689          |
| 4.5864        | 0.1698 | 200   | 4.7851          |
| 4.3277        | 0.2547 | 300   | 4.4080          |
| 4.2936        | 0.3396 | 400   | 4.1495          |
| 4.1983        | 0.4245 | 500   | 3.9803          |
| 3.9986        | 0.5094 | 600   | 3.8268          |
| 3.7634        | 0.5943 | 700   | 3.7163          |
| 3.6461        | 0.6792 | 800   | 3.6104          |
| 3.4887        | 0.7641 | 900   | 3.5349          |
| 3.5108        | 0.8490 | 1000  | 3.4364          |
| 3.2161        | 0.9339 | 1100  | 3.3669          |
| 3.546         | 1.0187 | 1200  | 3.2748          |
| 3.4026        | 1.1036 | 1300  | 3.2208          |
| 3.44          | 1.1885 | 1400  | 3.1730          |
| 3.283         | 1.2734 | 1500  | 3.1227          |
| 3.3221        | 1.3583 | 1600  | 3.1067          |
| 3.4095        | 1.4432 | 1700  | 3.0562          |
| 3.3481        | 1.5281 | 1800  | 3.0305          |
| 3.1545        | 1.6130 | 1900  | 3.0170          |
| 3.1984        | 1.6979 | 2000  | 2.9827          |
| 3.0847        | 1.7828 | 2100  | 2.9422          |
| 3.3866        | 1.8677 | 2200  | 2.9272          |
| 3.0257        | 1.9526 | 2300  | 2.9224          |
| 3.1062        | 2.0374 | 2400  | 2.8985          |
| 2.7489        | 2.1223 | 2500  | 2.8669          |
| 3.075         | 2.2072 | 2600  | 2.8483          |
| 2.954         | 2.2921 | 2700  | 2.8486          |
| 2.3789        | 2.3770 | 2800  | 2.8247          |
| 2.7708        | 2.4618 | 2900  | 2.8058          |
| 2.5208        | 2.5467 | 3000  | 2.7966          |
| 2.8372        | 2.6316 | 3100  | 2.7822          |
| 2.9272        | 2.7165 | 3200  | 2.7606          |
| 2.8024        | 2.8014 | 3300  | 2.7540          |
| 2.7681        | 2.8863 | 3400  | 2.7258          |
| 2.8271        | 2.9712 | 3500  | 2.7280          |
| 3.041         | 3.0560 | 3600  | 2.7147          |
| 2.4446        | 3.1409 | 3700  | 2.7112          |
| 2.6514        | 3.2258 | 3800  | 2.6837          |
| 2.7023        | 3.3107 | 3900  | 2.6637          |
| 2.8137        | 3.3956 | 4000  | 2.6606          |
| 2.5274        | 3.4805 | 4100  | 2.6549          |
| 2.9085        | 3.5654 | 4200  | 2.6278          |
| 2.295         | 3.6503 | 4300  | 2.6248          |
| 2.5           | 3.7352 | 4400  | 2.6132          |
| 2.7344        | 3.8201 | 4500  | 2.6046          |
| 2.6444        | 3.9050 | 4600  | 2.5925          |
| 2.6606        | 3.9899 | 4700  | 2.5834          |
| 2.4187        | 4.0747 | 4800  | 2.5820          |
| 2.6922        | 4.1596 | 4900  | 2.5824          |
| 2.6234        | 4.2445 | 5000  | 2.5704          |
| 2.683         | 4.3294 | 5100  | 2.5535          |
| 2.708         | 4.4143 | 5200  | 2.5357          |
| 2.3802        | 4.4992 | 5300  | 2.5413          |
| 2.7813        | 4.5841 | 5400  | 2.5268          |
| 2.3089        | 4.6690 | 5500  | 2.5105          |
| 2.5862        | 4.7539 | 5600  | 2.5050          |
| 2.6705        | 4.8388 | 5700  | 2.4978          |
| 2.2654        | 4.9237 | 5800  | 2.4747          |
| 2.4846        | 5.0085 | 5900  | 2.4753          |
| 2.4755        | 5.0934 | 6000  | 2.4659          |
| 2.564         | 5.1783 | 6100  | 2.4705          |
| 2.4302        | 5.2632 | 6200  | 2.4548          |
| 2.5107        | 5.3481 | 6300  | 2.4528          |
| 2.2664        | 5.4330 | 6400  | 2.4545          |
| 2.2634        | 5.5179 | 6500  | 2.4382          |
| 2.5023        | 5.6028 | 6600  | 2.4251          |
| 2.5727        | 5.6877 | 6700  | 2.4152          |
| 2.4165        | 5.7726 | 6800  | 2.4144          |
| 2.4466        | 5.8575 | 6900  | 2.4052          |
| 2.3955        | 5.9424 | 7000  | 2.3979          |
| 2.4405        | 6.0272 | 7100  | 2.3935          |
| 2.4716        | 6.1121 | 7200  | 2.3919          |
| 2.3264        | 6.1970 | 7300  | 2.3763          |
| 2.6313        | 6.2819 | 7400  | 2.3721          |
| 2.725         | 6.3668 | 7500  | 2.3606          |
| 2.4446        | 6.4517 | 7600  | 2.3619          |
| 2.4713        | 6.5366 | 7700  | 2.3587          |
| 2.5411        | 6.6215 | 7800  | 2.3605          |
| 2.5612        | 6.7064 | 7900  | 2.3451          |
| 2.3908        | 6.7913 | 8000  | 2.3425          |
| 2.2039        | 6.8762 | 8100  | 2.3377          |
| 2.5673        | 6.9611 | 8200  | 2.3371          |
| 2.5507        | 7.0458 | 8300  | 2.3305          |
| 2.5523        | 7.1307 | 8400  | 2.3217          |
| 1.8872        | 7.2156 | 8500  | 2.3309          |
| 2.1361        | 7.3005 | 8600  | 2.3185          |
| 2.355         | 7.3854 | 8700  | 2.3111          |
| 2.4069        | 7.4703 | 8800  | 2.3132          |
| 2.1578        | 7.5552 | 8900  | 2.3070          |
| 2.4514        | 7.6401 | 9000  | 2.3018          |
| 2.5844        | 7.7250 | 9100  | 2.2927          |
| 2.3247        | 7.8099 | 9200  | 2.2954          |
| 2.2271        | 7.8948 | 9300  | 2.2925          |
| 2.0324        | 7.9797 | 9400  | 2.2877          |
| 2.388         | 8.0645 | 9500  | 2.2867          |
| 2.63          | 8.1494 | 9600  | 2.2787          |
| 2.3989        | 8.2343 | 9700  | 2.2783          |
| 2.4267        | 8.3192 | 9800  | 2.2749          |
| 2.0576        | 8.4041 | 9900  | 2.2767          |
| 2.2635        | 8.4890 | 10000 | 2.2729          |
| 2.2062        | 8.5739 | 10100 | 2.2654          |
| 2.3503        | 8.6588 | 10200 | 2.2667          |
| 2.5318        | 8.7437 | 10300 | 2.2618          |
| 2.5574        | 8.8286 | 10400 | 2.2591          |
| 2.2985        | 8.9135 | 10500 | 2.2568          |
| 2.0863        | 8.9984 | 10600 | 2.2556          |
| 2.2481        | 9.0832 | 10700 | 2.2574          |
| 2.2429        | 9.1681 | 10800 | 2.2547          |
| 2.5296        | 9.2530 | 10900 | 2.2552          |
| 2.3072        | 9.3379 | 11000 | 2.2519          |
| 2.3443        | 9.4228 | 11100 | 2.2482          |
| 2.0659        | 9.5077 | 11200 | 2.2502          |
| 2.6412        | 9.5926 | 11300 | 2.2488          |
| 2.4199        | 9.6775 | 11400 | 2.2477          |
| 2.3524        | 9.7624 | 11500 | 2.2459          |
| 2.3202        | 9.8473 | 11600 | 2.2450          |
| 2.4945        | 9.9322 | 11700 | 2.2445          |
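The (epoch, step) pairs in the log imply roughly 1,178 optimizer steps per epoch; combined with the effective batch size of 8 from the hyperparameters, that suggests a training set of roughly 9,400 examples. This is an estimate derived from the log, not a figure stated in the card:

```python
# Last logged (epoch, step) pair from the table above
epoch, step = 9.9322, 11700
total_train_batch_size = 8  # from the hyperparameters section

# Optimizer steps per epoch, and an estimate of the training-set size
steps_per_epoch = step / epoch                            # ~ 1178
approx_train_examples = steps_per_epoch * total_train_batch_size
print(round(steps_per_epoch), round(approx_train_examples))  # ~ 1178, ~ 9424
```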

Framework versions

  • PEFT 0.15.2
  • Transformers 4.52.4
  • PyTorch 2.6.0+cu124
  • Datasets 3.6.0
  • Tokenizers 0.21.2
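To reproduce this environment, the versions above can be pinned directly (the CUDA 12.4 torch build may require the matching PyTorch extra index URL for your platform):

```shell
pip install "peft==0.15.2" "transformers==4.52.4" \
    "torch==2.6.0" "datasets==3.6.0" "tokenizers==0.21.2"
```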
Model size

  • 0.1B params (tensor type: F32, stored as Safetensors)

Model tree

  • FelixYaw/twi-gpt2-chatbot: adapter fine-tuned from distilgpt2