---
library_name: transformers
tags:
  - generated_from_trainer
datasets:
  - tiny_shakespeare
model-index:
  - name: tinyshakespeare-13m
    results: []
---

# tinyshakespeare-13m

This model is a fine-tuned version of an unspecified base model on the tiny_shakespeare dataset. It achieves the following results on the evaluation set:

- Loss: 4.9693
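For intuition, a cross-entropy loss in nats converts to perplexity via `exp(loss)`; a minimal check using the evaluation loss reported above:

```python
import math

eval_loss = 4.9693                  # evaluation loss reported above (nats/token)
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # roughly 144
```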

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 40
- mixed_precision_training: Native AMP
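The linear schedule with a 0.03 warmup ratio can be sketched in plain Python. This mirrors (but is not copied from) the behavior of a linear warmup/decay scheduler in transformers; the 1280 total optimizer steps (40 epochs × 32 steps per epoch) and the use of `ceil` for the warmup-step count are assumptions inferred from the training log:

```python
import math

BASE_LR = 3e-4                                # learning_rate: 0.0003
TOTAL_STEPS = 1280                            # 40 epochs x 32 steps/epoch (from the log)
WARMUP_STEPS = math.ceil(0.03 * TOTAL_STEPS)  # warmup_ratio 0.03 -> 39 steps (assumed ceil)

def linear_lr(step: int) -> float:
    """Linear warmup from 0 to BASE_LR, then linear decay back to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))
```

The learning rate thus peaks at 3e-4 around step 39 and reaches zero at step 1280.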

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 1.0   | 32   | 7.3043          |
| 7.7103        | 2.0   | 64   | 5.9158          |
| 7.7103        | 3.0   | 96   | 5.6154          |
| 5.8675        | 4.0   | 128  | 5.3680          |
| 5.4479        | 5.0   | 160  | 5.2088          |
| 5.4479        | 6.0   | 192  | 5.1126          |
| 5.1825        | 7.0   | 224  | 5.0313          |
| 4.9945        | 8.0   | 256  | 4.9771          |
| 4.9945        | 9.0   | 288  | 4.9379          |
| 4.8838        | 10.0  | 320  | 4.9208          |
| 4.7883        | 11.0  | 352  | 4.8985          |
| 4.7883        | 12.0  | 384  | 4.8766          |
| 4.7261        | 13.0  | 416  | 4.8631          |
| 4.7261        | 14.0  | 448  | 4.8617          |
| 4.6621        | 15.0  | 480  | 4.8445          |
| 4.5955        | 16.0  | 512  | 4.8370          |
| 4.5955        | 17.0  | 544  | 4.8295          |
| 4.52          | 18.0  | 576  | 4.8215          |
| 4.4819        | 19.0  | 608  | 4.8278          |
| 4.4819        | 20.0  | 640  | 4.8169          |
| 4.4415        | 21.0  | 672  | 4.8252          |
| 4.3929        | 22.0  | 704  | 4.8199          |
| 4.3929        | 23.0  | 736  | 4.8243          |
| 4.3438        | 24.0  | 768  | 4.8340          |
| 4.3117        | 25.0  | 800  | 4.8309          |
| 4.3117        | 26.0  | 832  | 4.8410          |
| 4.2626        | 27.0  | 864  | 4.8439          |
| 4.2626        | 28.0  | 896  | 4.8437          |
| 4.2404        | 29.0  | 928  | 4.8404          |
| 4.1957        | 30.0  | 960  | 4.8540          |
| 4.1957        | 31.0  | 992  | 4.8560          |
| 4.1681        | 32.0  | 1024 | 4.8653          |
| 4.1441        | 33.0  | 1056 | 4.8725          |
| 4.1441        | 34.0  | 1088 | 4.8770          |
| 4.1097        | 35.0  | 1120 | 4.8798          |
| 4.0823        | 36.0  | 1152 | 4.8884          |
| 4.0823        | 37.0  | 1184 | 4.8869          |
| 4.0783        | 38.0  | 1216 | 4.8925          |
| 4.0783        | 39.0  | 1248 | 4.8948          |
| 4.0641        | 40.0  | 1280 | 4.8941          |
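Note that the validation loss bottoms out around epoch 20 and creeps upward afterwards while the training loss keeps falling, which suggests the model begins to overfit; the final checkpoint is therefore not the best one by validation loss. A quick check over the table's values:

```python
# Validation losses from the table above; index i corresponds to epoch i + 1.
val_loss = [
    7.3043, 5.9158, 5.6154, 5.3680, 5.2088, 5.1126, 5.0313, 4.9771,
    4.9379, 4.9208, 4.8985, 4.8766, 4.8631, 4.8617, 4.8445, 4.8370,
    4.8295, 4.8215, 4.8278, 4.8169, 4.8252, 4.8199, 4.8243, 4.8340,
    4.8309, 4.8410, 4.8439, 4.8437, 4.8404, 4.8540, 4.8560, 4.8653,
    4.8725, 4.8770, 4.8798, 4.8884, 4.8869, 4.8925, 4.8948, 4.8941,
]
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print(best_epoch, val_loss[best_epoch - 1])  # 20 4.8169
```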

### Framework versions

- Transformers 4.57.1
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.22.1