This model is a fine-tuned version of distilgpt2 on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.4903
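
For reference, since the training objective is standard causal-language-modeling cross-entropy, the evaluation loss converts to perplexity via `exp(loss)`; the final loss of 0.4903 corresponds to a perplexity of roughly 1.63. A minimal sketch of the conversion:

```python
import math

eval_loss = 0.4903  # final validation loss reported above
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.2f}")  # ~1.63
```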
## Model description
More information needed
## Intended uses & limitations
More information needed
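
Pending those details: as a distilgpt2 fine-tune, the checkpoint can be loaded through the standard `transformers` text-generation pipeline. A minimal sketch, with a hypothetical repo id standing in for the model's actual location:

```python
from transformers import pipeline

# "your-username/distilgpt2-finetuned" is a placeholder repo id, not the real one
generator = pipeline("text-generation", model="your-username/distilgpt2-finetuned")

output = generator("Once upon a time", max_new_tokens=50, do_sample=True, top_p=0.95)
print(output[0]["generated_text"])
```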
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- num_epochs: 3
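
These settings map directly onto `transformers.TrainingArguments`. A minimal sketch reconstructing the configuration above; the output directory and the 500-step evaluation/logging cadence (implied by the results table below) are assumptions, not values stated in the card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilgpt2-finetuned",  # assumed; not stated in the card
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,      # effective train batch size: 4 * 2 = 8
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=200,
    num_train_epochs=3,
    eval_strategy="steps",              # assumed from the 500-step cadence below
    eval_steps=500,
    logging_steps=500,
)
```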
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 3.487 | 0.1216 | 500 | 2.7628 |
| 2.4975 | 0.2433 | 1000 | 2.2091 |
| 2.2501 | 0.3649 | 1500 | 1.8555 |
| 2.0317 | 0.4866 | 2000 | 1.6036 |
| 1.951 | 0.6082 | 2500 | 1.4196 |
| 1.8645 | 0.7298 | 3000 | 1.2600 |
| 1.7716 | 0.8515 | 3500 | 1.1290 |
| 1.7462 | 0.9731 | 4000 | 1.0334 |
| 1.6157 | 1.0946 | 4500 | 0.9300 |
| 1.5509 | 1.2163 | 5000 | 0.8553 |
| 1.5186 | 1.3379 | 5500 | 0.7855 |
| 1.4767 | 1.4596 | 6000 | 0.7299 |
| 1.4667 | 1.5812 | 6500 | 0.6972 |
| 1.481 | 1.7028 | 7000 | 0.6611 |
| 1.4245 | 1.8245 | 7500 | 0.6109 |
| 1.4017 | 1.9461 | 8000 | 0.5911 |
| 1.3376 | 2.0676 | 8500 | 0.5671 |
| 1.3276 | 2.1893 | 9000 | 0.5600 |
| 1.3228 | 2.3109 | 9500 | 0.5398 |
| 1.3184 | 2.4326 | 10000 | 0.5246 |
| 1.2939 | 2.5542 | 10500 | 0.5100 |
| 1.3121 | 2.6758 | 11000 | 0.5025 |
| 1.2904 | 2.7975 | 11500 | 0.4938 |
| 1.2743 | 2.9191 | 12000 | 0.4903 |
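
The Epoch and Step columns imply roughly 500 / 0.1216 ≈ 4,112 optimizer steps per epoch, which at an effective batch size of 8 suggests a training set on the order of 33k examples. This is a back-calculation from the table, not a figure stated in the card:

```python
# rough back-calculation from the first logged row (not stated in the card)
steps, epoch = 500, 0.1216
effective_batch_size = 8            # 4 per device * 2 gradient-accumulation steps

steps_per_epoch = steps / epoch     # ~4112 optimizer steps per epoch
approx_examples = steps_per_epoch * effective_batch_size
print(f"~{steps_per_epoch:.0f} steps/epoch, ~{approx_examples:,.0f} examples")
```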
### Framework versions
- Transformers 4.56.1
- Pytorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.0