results

This model is a fine-tuned version of distilgpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4903
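
The card does not state it, but for a causal language model this evaluation loss is a token-level cross-entropy in nats, so it converts directly to perplexity; a one-line sketch (not part of the original card):

```python
import math

eval_loss = 0.4903                  # final validation loss from the results table below
perplexity = math.exp(eval_loss)    # perplexity = exp(cross-entropy)
print(f"Perplexity: {perplexity:.2f}")  # ≈ 1.63
```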

Model description

More information needed

Intended uses & limitations

More information needed
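
Until those details are filled in, the usual causal-LM loading path should apply, since this is a distilgpt2 fine-tune. A minimal sketch, assuming the checkpoint is published on the Hub under the repo id FelixYaw/results (substitute a local checkpoint directory otherwise):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# "FelixYaw/results" is assumed from the card's repo name; swap in a local
# checkpoint path if you trained the model yourself.
model_id = "FelixYaw/results"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Once upon a time", max_new_tokens=40)[0]["generated_text"])
```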

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 3
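
A minimal sketch reconstructing this configuration with transformers.TrainingArguments; the output directory is a placeholder, and the 500-step evaluation cadence is inferred from the results table rather than stated in the card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="results",            # placeholder; matches the repo name
    learning_rate=5e-5,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=4,    # eval_batch_size: 4
    gradient_accumulation_steps=2,   # effective train batch size: 8
    seed=42,
    optim="adamw_torch_fused",       # AdamW, fused torch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=200,
    num_train_epochs=3,
    eval_strategy="steps",           # evaluate periodically during training
    eval_steps=500,                  # inferred from the 500-step table cadence
    logging_steps=500,
)
```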

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|--------------:|-------:|------:|----------------:|
| 3.487         | 0.1216 | 500   | 2.7628          |
| 2.4975        | 0.2433 | 1000  | 2.2091          |
| 2.2501        | 0.3649 | 1500  | 1.8555          |
| 2.0317        | 0.4866 | 2000  | 1.6036          |
| 1.951         | 0.6082 | 2500  | 1.4196          |
| 1.8645        | 0.7298 | 3000  | 1.2600          |
| 1.7716        | 0.8515 | 3500  | 1.1290          |
| 1.7462        | 0.9731 | 4000  | 1.0334          |
| 1.6157        | 1.0946 | 4500  | 0.9300          |
| 1.5509        | 1.2163 | 5000  | 0.8553          |
| 1.5186        | 1.3379 | 5500  | 0.7855          |
| 1.4767        | 1.4596 | 6000  | 0.7299          |
| 1.4667        | 1.5812 | 6500  | 0.6972          |
| 1.481         | 1.7028 | 7000  | 0.6611          |
| 1.4245        | 1.8245 | 7500  | 0.6109          |
| 1.4017        | 1.9461 | 8000  | 0.5911          |
| 1.3376        | 2.0676 | 8500  | 0.5671          |
| 1.3276        | 2.1893 | 9000  | 0.5600          |
| 1.3228        | 2.3109 | 9500  | 0.5398          |
| 1.3184        | 2.4326 | 10000 | 0.5246          |
| 1.2939        | 2.5542 | 10500 | 0.5100          |
| 1.3121        | 2.6758 | 11000 | 0.5025          |
| 1.2904        | 2.7975 | 11500 | 0.4938          |
| 1.2743        | 2.9191 | 12000 | 0.4903          |

Framework versions

  • Transformers 4.56.1
  • Pytorch 2.8.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.0
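
For reproducibility, these versions can be pinned in a requirements file; a sketch (note that the +cu126 build of PyTorch ships from PyTorch's CUDA 12.6 wheel index, not plain PyPI):

```text
transformers==4.56.1
torch==2.8.0        # +cu126 build: install with --index-url https://download.pytorch.org/whl/cu126
datasets==4.0.0
tokenizers==0.22.0
```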