tinystories_sentences_upsampled_tom

This model is a fine-tuned version of openai-community/gpt2-medium on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7048
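
For a causal language model, an evaluation loss like this is usually read as the mean per-token cross-entropy in nats, in which case it converts to perplexity by exponentiation. The sketch below assumes that interpretation (the card does not state the units), and `loss_to_perplexity` is a hypothetical helper name.

```python
import math


def loss_to_perplexity(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)


# Evaluation loss reported on this card.
eval_loss = 1.7048
perplexity = loss_to_perplexity(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the reported loss corresponds to a perplexity of roughly 5.5 on the evaluation set.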

Model description

More information needed

Intended uses & limitations

More information needed
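
Until the intended uses are documented, the following is only a minimal sketch of loading the checkpoint for text generation with the Transformers library. It assumes the repository name shown on the model page (`ptsv/tinystories_sentences_upsampled_tom`) and that the repository ships tokenizer files; if it does not, the tokenizer of the base model `openai-community/gpt2-medium` would be the natural fallback. The helper name `generate_story` and the sampling settings are illustrative choices, not values from this card.

```python
# Repository name as shown on the model page.
MODEL_ID = "ptsv/tinystories_sentences_upsampled_tom"


def generate_story(prompt: str, max_new_tokens: int = 60) -> str:
    """Generate a continuation of `prompt` (hypothetical helper, not from the card)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,   # illustrative sampling settings, not from the card
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_story("Once upon a time")` downloads the weights on first use and returns the prompt followed by a sampled continuation.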

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 256
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1
  • num_epochs: 3
  • mixed_precision_training: Native AMP
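
The list above maps onto keyword arguments of `transformers.TrainingArguments`. The sketch below collects them in a plain dictionary and checks the batch-size arithmetic. It assumes the card's `train_batch_size` is the per-device value and that training ran on a single device (64 × 4 = 256 is consistent with that, but the card does not say so). "Native AMP" is rendered as `fp16=True`, which is also an assumption, since it could equally mean bf16. The `total_steps` default in the schedule is an estimate derived from the results table, not a reported value.

```python
# Keyword arguments mirroring the listed hyperparameters, intended for
# transformers.TrainingArguments(output_dir=..., **training_kwargs).
training_kwargs = dict(
    learning_rate=1e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=4,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=1,
    num_train_epochs=3,
    fp16=True,  # assumption: "Native AMP" could also mean bf16
)

num_devices = 1  # assumption: not stated on the card
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * num_devices
)
print(effective_batch)  # 256, matching total_train_batch_size


def linear_lr(step: int, base_lr: float = 1e-4, warmup_steps: int = 1,
              total_steps: int = 12_510) -> float:
    """Linear warmup followed by linear decay to zero.

    total_steps is an estimate (about 4170 steps per epoch times 3 epochs).
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

With a single warmup step, the learning rate reaches its peak of 1e-4 almost immediately and then decays linearly for the rest of training.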

Training results

Training Loss   Epoch    Step    Validation Loss
2.0015          0.0959   400     1.9252
1.8842          0.1918   800     1.8738
1.9057          0.2878   1200    1.8480
1.8935          0.3837   1600    1.8267
1.8203          0.4796   2000    1.8178
1.805           0.5755   2400    1.7996
1.8125          0.6714   2800    1.7878
1.8324          0.7673   3200    1.7848
1.8019          0.8633   3600    1.7702
1.8189          0.9592   4000    1.7650
1.7153          1.0551   4400    1.7618
1.7211          1.1510   4800    1.7552
1.7297          1.2469   5200    1.7499
1.7306          1.3428   5600    1.7446
1.6929          1.4388   6000    1.7406
1.6857          1.5347   6400    1.7373
1.728           1.6306   6800    1.7342
1.7467          1.7265   7200    1.7290
1.6398          1.8224   7600    1.7246
1.7378          1.9184   8000    1.7205
1.6363          2.0143   8400    1.7225
1.7108          2.1102   8800    1.7212
1.6735          2.2061   9200    1.7178
1.6242          2.3020   9600    1.7166
1.6453          2.3979   10000   1.7134
1.646           2.4939   10400   1.7095
1.5925          2.5898   10800   1.7090
1.677           2.6857   11200   1.7069
1.6297          2.7816   11600   1.7053
1.6192          2.8775   12000   1.7039
1.5758          2.9734   12400   1.7036
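
The Epoch and Step columns allow a rough estimate of the number of optimizer steps per epoch, and from that the approximate size of the training set, which the card does not report. The sketch below uses three rows of the table and the `total_train_batch_size` of 256. The results are estimates only, since the Epoch column is rounded to four decimals.

```python
# (epoch, step) pairs copied from the first, a middle, and the last table row.
checkpoints = [(0.0959, 400), (1.4388, 6000), (2.9734, 12400)]

# Each row gives an independent estimate of optimizer steps per epoch.
estimates = [step / epoch for epoch, step in checkpoints]
steps_per_epoch = sum(estimates) / len(estimates)

# Each optimizer step consumes one effective batch of training examples.
total_train_batch_size = 256
examples_per_epoch = steps_per_epoch * total_train_batch_size

print(f"steps per epoch    ~ {steps_per_epoch:.0f}")
print(f"examples per epoch ~ {examples_per_epoch:,.0f}")
```

This works out to roughly 4,170 steps per epoch, or on the order of 1.07 million training examples per epoch after whatever preprocessing was applied.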

Framework versions

  • Transformers 4.44.1
  • PyTorch 2.2.2
  • Datasets 2.18.0
  • Tokenizers 0.19.1
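
To compare a local environment against these versions, something like the following sketch can help. It assumes the usual PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`); `compare_versions` is a hypothetical helper, and a mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.1",
    "torch": "2.2.2",
    "datasets": "2.18.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected=EXPECTED):
    """Return {package: (expected_version, installed_version_or_None)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed)
    return report


for package, (wanted, installed) in compare_versions().items():
    # Strip local build tags such as "+cu121" before comparing.
    matches = installed is not None and installed.split("+")[0] == wanted
    print(f"{package}: expected {wanted}, found {installed}, match={matches}")
```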

Model size

  • Parameters: 0.4B
  • Tensor type: F32
  • Weights format: Safetensors
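
The parameter count and tensor type give a rough lower bound on the memory needed just to hold the weights. The sketch below uses the rounded 0.4B figure from the model page and 4 bytes per F32 value; it ignores activations, the KV cache, and any optimizer state, so actual usage will be higher.

```python
# Rounded parameter count from the model page (the exact count is not given here).
num_params = 0.4e9
bytes_per_param = 4  # F32 stores each value in 4 bytes

weight_bytes = num_params * bytes_per_param
weight_gib = weight_bytes / 2**30

print(f"weights alone ~ {weight_gib:.2f} GiB")
```

That comes to about 1.5 GiB for the weights in full precision.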

Model tree

  • Base model: openai-community/gpt2-medium
  • This model: ptsv/tinystories_sentences_upsampled_tom (fine-tuned)