twi-gpt2-chatbot

This model is a fine-tuned version of distilgpt2 (trained as a PEFT adapter) on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2445
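For intuition, a language-modeling cross-entropy loss converts to perplexity by exponentiation, so the loss above corresponds to a perplexity of roughly 9.44. A quick stdlib check:

```python
import math

# Final validation loss reported above
eval_loss = 2.2445

# Perplexity is the exponential of the mean cross-entropy loss
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")  # ~ 9.44
```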

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 10
  • mixed_precision_training: Native AMP
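The per-device batch size and gradient accumulation steps above combine into the reported total train batch size. A minimal sanity check (assuming single-device training, which the total implies):

```python
# Hyperparameters from the list above
train_batch_size = 1             # per-device batch size
gradient_accumulation_steps = 8
num_devices = 1                  # assumption: single-GPU run

# Effective (total) train batch size seen by the optimizer per update
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 8, matching total_train_batch_size above
```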

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 6.4994        | 0.0849 | 100   | 6.1689          |
| 4.5864        | 0.1698 | 200   | 4.7851          |
| 4.3277        | 0.2547 | 300   | 4.4080          |
| 4.2936        | 0.3396 | 400   | 4.1495          |
| 4.1983        | 0.4245 | 500   | 3.9803          |
| 3.9986        | 0.5094 | 600   | 3.8268          |
| 3.7634        | 0.5943 | 700   | 3.7163          |
| 3.6461        | 0.6792 | 800   | 3.6104          |
| 3.4887        | 0.7641 | 900   | 3.5349          |
| 3.5108        | 0.8490 | 1000  | 3.4364          |
| 3.2161        | 0.9339 | 1100  | 3.3669          |
| 3.546         | 1.0187 | 1200  | 3.2748          |
| 3.4026        | 1.1036 | 1300  | 3.2208          |
| 3.44          | 1.1885 | 1400  | 3.1730          |
| 3.283         | 1.2734 | 1500  | 3.1227          |
| 3.3221        | 1.3583 | 1600  | 3.1067          |
| 3.4095        | 1.4432 | 1700  | 3.0562          |
| 3.3481        | 1.5281 | 1800  | 3.0305          |
| 3.1545        | 1.6130 | 1900  | 3.0170          |
| 3.1984        | 1.6979 | 2000  | 2.9827          |
| 3.0847        | 1.7828 | 2100  | 2.9422          |
| 3.3866        | 1.8677 | 2200  | 2.9272          |
| 3.0257        | 1.9526 | 2300  | 2.9224          |
| 3.1062        | 2.0374 | 2400  | 2.8985          |
| 2.7489        | 2.1223 | 2500  | 2.8669          |
| 3.075         | 2.2072 | 2600  | 2.8483          |
| 2.954         | 2.2921 | 2700  | 2.8486          |
| 2.3789        | 2.3770 | 2800  | 2.8247          |
| 2.7708        | 2.4618 | 2900  | 2.8058          |
| 2.5208        | 2.5467 | 3000  | 2.7966          |
| 2.8372        | 2.6316 | 3100  | 2.7822          |
| 2.9272        | 2.7165 | 3200  | 2.7606          |
| 2.8024        | 2.8014 | 3300  | 2.7540          |
| 2.7681        | 2.8863 | 3400  | 2.7258          |
| 2.8271        | 2.9712 | 3500  | 2.7280          |
| 3.041         | 3.0560 | 3600  | 2.7147          |
| 2.4446        | 3.1409 | 3700  | 2.7112          |
| 2.6514        | 3.2258 | 3800  | 2.6837          |
| 2.7023        | 3.3107 | 3900  | 2.6637          |
| 2.8137        | 3.3956 | 4000  | 2.6606          |
| 2.5274        | 3.4805 | 4100  | 2.6549          |
| 2.9085        | 3.5654 | 4200  | 2.6278          |
| 2.295         | 3.6503 | 4300  | 2.6248          |
| 2.5           | 3.7352 | 4400  | 2.6132          |
| 2.7344        | 3.8201 | 4500  | 2.6046          |
| 2.6444        | 3.9050 | 4600  | 2.5925          |
| 2.6606        | 3.9899 | 4700  | 2.5834          |
| 2.4187        | 4.0747 | 4800  | 2.5820          |
| 2.6922        | 4.1596 | 4900  | 2.5824          |
| 2.6234        | 4.2445 | 5000  | 2.5704          |
| 2.683         | 4.3294 | 5100  | 2.5535          |
| 2.708         | 4.4143 | 5200  | 2.5357          |
| 2.3802        | 4.4992 | 5300  | 2.5413          |
| 2.7813        | 4.5841 | 5400  | 2.5268          |
| 2.3089        | 4.6690 | 5500  | 2.5105          |
| 2.5862        | 4.7539 | 5600  | 2.5050          |
| 2.6705        | 4.8388 | 5700  | 2.4978          |
| 2.2654        | 4.9237 | 5800  | 2.4747          |
| 2.4846        | 5.0085 | 5900  | 2.4753          |
| 2.4755        | 5.0934 | 6000  | 2.4659          |
| 2.564         | 5.1783 | 6100  | 2.4705          |
| 2.4302        | 5.2632 | 6200  | 2.4548          |
| 2.5107        | 5.3481 | 6300  | 2.4528          |
| 2.2664        | 5.4330 | 6400  | 2.4545          |
| 2.2634        | 5.5179 | 6500  | 2.4382          |
| 2.5023        | 5.6028 | 6600  | 2.4251          |
| 2.5727        | 5.6877 | 6700  | 2.4152          |
| 2.4165        | 5.7726 | 6800  | 2.4144          |
| 2.4466        | 5.8575 | 6900  | 2.4052          |
| 2.3955        | 5.9424 | 7000  | 2.3979          |
| 2.4405        | 6.0272 | 7100  | 2.3935          |
| 2.4716        | 6.1121 | 7200  | 2.3919          |
| 2.3264        | 6.1970 | 7300  | 2.3763          |
| 2.6313        | 6.2819 | 7400  | 2.3721          |
| 2.725         | 6.3668 | 7500  | 2.3606          |
| 2.4446        | 6.4517 | 7600  | 2.3619          |
| 2.4713        | 6.5366 | 7700  | 2.3587          |
| 2.5411        | 6.6215 | 7800  | 2.3605          |
| 2.5612        | 6.7064 | 7900  | 2.3451          |
| 2.3908        | 6.7913 | 8000  | 2.3425          |
| 2.2039        | 6.8762 | 8100  | 2.3377          |
| 2.5673        | 6.9611 | 8200  | 2.3371          |
| 2.5507        | 7.0458 | 8300  | 2.3305          |
| 2.5523        | 7.1307 | 8400  | 2.3217          |
| 1.8872        | 7.2156 | 8500  | 2.3309          |
| 2.1361        | 7.3005 | 8600  | 2.3185          |
| 2.355         | 7.3854 | 8700  | 2.3111          |
| 2.4069        | 7.4703 | 8800  | 2.3132          |
| 2.1578        | 7.5552 | 8900  | 2.3070          |
| 2.4514        | 7.6401 | 9000  | 2.3018          |
| 2.5844        | 7.7250 | 9100  | 2.2927          |
| 2.3247        | 7.8099 | 9200  | 2.2954          |
| 2.2271        | 7.8948 | 9300  | 2.2925          |
| 2.0324        | 7.9797 | 9400  | 2.2877          |
| 2.388         | 8.0645 | 9500  | 2.2867          |
| 2.63          | 8.1494 | 9600  | 2.2787          |
| 2.3989        | 8.2343 | 9700  | 2.2783          |
| 2.4267        | 8.3192 | 9800  | 2.2749          |
| 2.0576        | 8.4041 | 9900  | 2.2767          |
| 2.2635        | 8.4890 | 10000 | 2.2729          |
| 2.2062        | 8.5739 | 10100 | 2.2654          |
| 2.3503        | 8.6588 | 10200 | 2.2667          |
| 2.5318        | 8.7437 | 10300 | 2.2618          |
| 2.5574        | 8.8286 | 10400 | 2.2591          |
| 2.2985        | 8.9135 | 10500 | 2.2568          |
| 2.0863        | 8.9984 | 10600 | 2.2556          |
| 2.2481        | 9.0832 | 10700 | 2.2574          |
| 2.2429        | 9.1681 | 10800 | 2.2547          |
| 2.5296        | 9.2530 | 10900 | 2.2552          |
| 2.3072        | 9.3379 | 11000 | 2.2519          |
| 2.3443        | 9.4228 | 11100 | 2.2482          |
| 2.0659        | 9.5077 | 11200 | 2.2502          |
| 2.6412        | 9.5926 | 11300 | 2.2488          |
| 2.4199        | 9.6775 | 11400 | 2.2477          |
| 2.3524        | 9.7624 | 11500 | 2.2459          |
| 2.3202        | 9.8473 | 11600 | 2.2450          |
| 2.4945        | 9.9322 | 11700 | 2.2445          |
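The (epoch, step) pairs in the log imply roughly 1,178 optimizer steps per epoch; combined with the effective batch size of 8 from the hyperparameters, that suggests a training set of roughly 9,400 examples. This is an estimate derived from the log, not a figure stated in the card:

```python
# Last logged (epoch, step) pair from the table above
epoch, step = 9.9322, 11700
total_train_batch_size = 8  # from the hyperparameters section

# Optimizer steps per epoch, and an estimate of the training-set size
steps_per_epoch = step / epoch                            # ~ 1178
approx_train_examples = steps_per_epoch * total_train_batch_size
print(round(steps_per_epoch), round(approx_train_examples))  # ~ 1178, ~ 9424
```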

Framework versions

  • PEFT 0.15.2
  • Transformers 4.52.4
  • PyTorch 2.6.0+cu124
  • Datasets 3.6.0
  • Tokenizers 0.21.2
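To reproduce this environment, the versions above can be pinned directly (the CUDA 12.4 torch build may require the matching PyTorch extra index URL for your platform):

```shell
pip install "peft==0.15.2" "transformers==4.52.4" \
    "torch==2.6.0" "datasets==3.6.0" "tokenizers==0.21.2"
```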
Model size

  • 0.1B params (tensor type: F32, stored as Safetensors)

Model tree

  • FelixYaw/twi-gpt2-chatbot: adapter fine-tuned from distilgpt2