# gpt2_tiny_baby_30M_32768_42
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 3.7889
- Accuracy: 0.3364
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 40000
- training_steps: 100000
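The linear scheduler with 40,000 warmup steps ramps the learning rate from 0 up to the peak of 1e-4, then decays it linearly back to 0 by step 100,000. A minimal stdlib-only sketch of that schedule (an illustration of the hyperparameters above, not the exact `transformers` implementation):

```python
# Linear LR schedule with warmup, using the reported hyperparameters:
# learning_rate=1e-4, lr_scheduler_warmup_steps=40000, training_steps=100000.

def linear_warmup_lr(step, peak_lr=1e-4, warmup_steps=40_000, total_steps=100_000):
    """Return the learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to the peak over the warmup phase.
        return peak_lr * step / warmup_steps
    # Decay linearly from the peak back to 0 by the final step.
    return peak_lr * max(0.0, total_steps - step) / (total_steps - warmup_steps)

print(linear_warmup_lr(20_000))   # halfway through warmup -> 5e-05
print(linear_warmup_lr(40_000))   # peak -> 0.0001
print(linear_warmup_lr(100_000))  # end of training -> 0.0
```

Note that with this configuration 40% of the run is spent warming up, so the model trains at or near the peak rate only briefly before decay begins.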
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 8.2793 | 0.74 | 2000 | 8.1124 | 0.1381 |
| 5.66 | 1.48 | 4000 | 6.1510 | 0.1897 |
| 4.6738 | 2.22 | 6000 | 5.4672 | 0.2165 |
| 4.4223 | 2.96 | 8000 | 5.2036 | 0.2257 |
| 4.2575 | 3.7 | 10000 | 5.0124 | 0.2348 |
| 4.1117 | 4.44 | 12000 | 4.8397 | 0.2489 |
| 4.0061 | 5.18 | 14000 | 4.6988 | 0.2601 |
| 3.891 | 5.92 | 16000 | 4.5760 | 0.2685 |
| 3.8028 | 6.66 | 18000 | 4.4775 | 0.2744 |
| 3.7241 | 7.4 | 20000 | 4.3932 | 0.2806 |
| 3.6535 | 8.14 | 22000 | 4.3117 | 0.2869 |
| 3.5996 | 8.88 | 24000 | 4.2564 | 0.2908 |
| 3.528 | 9.62 | 26000 | 4.2035 | 0.2957 |
| 3.4719 | 10.36 | 28000 | 4.1553 | 0.2992 |
| 3.433 | 11.1 | 30000 | 4.1164 | 0.3018 |
| 3.3897 | 11.84 | 32000 | 4.0715 | 0.3069 |
| 3.351 | 12.58 | 34000 | 4.0411 | 0.3101 |
| 3.2992 | 13.32 | 36000 | 4.0163 | 0.3128 |
| 3.2833 | 14.06 | 38000 | 3.9909 | 0.3149 |
| 3.2566 | 14.8 | 40000 | 3.9706 | 0.3168 |
| 3.2263 | 15.54 | 42000 | 3.9436 | 0.3194 |
| 3.198 | 16.28 | 44000 | 3.9330 | 0.3208 |
| 3.1846 | 17.02 | 46000 | 3.9197 | 0.3216 |
| 3.1692 | 17.76 | 48000 | 3.8990 | 0.3245 |
| 3.142 | 18.5 | 50000 | 3.8920 | 0.3249 |
| 3.1276 | 19.24 | 52000 | 3.8815 | 0.3258 |
| 3.1248 | 19.98 | 54000 | 3.8705 | 0.3267 |
| 3.1057 | 20.72 | 56000 | 3.8623 | 0.3281 |
| 3.097 | 21.46 | 58000 | 3.8542 | 0.3282 |
| 3.0836 | 22.2 | 60000 | 3.8472 | 0.3294 |
| 3.0856 | 22.94 | 62000 | 3.8448 | 0.3293 |
| 3.0697 | 23.68 | 64000 | 3.8378 | 0.3311 |
| 3.0515 | 24.42 | 66000 | 3.8402 | 0.3304 |
| 3.0551 | 25.16 | 68000 | 3.8369 | 0.3311 |
| 3.0508 | 25.9 | 70000 | 3.8256 | 0.3316 |
| 3.0337 | 26.64 | 72000 | 3.8208 | 0.3323 |
| 3.0314 | 27.38 | 74000 | 3.8219 | 0.3324 |
| 3.034 | 28.12 | 76000 | 3.8145 | 0.3334 |
| 3.0297 | 28.86 | 78000 | 3.8122 | 0.3335 |
| 3.0164 | 29.6 | 80000 | 3.8123 | 0.3338 |
| 3.0121 | 30.34 | 82000 | 3.8085 | 0.3342 |
| 3.0106 | 31.08 | 84000 | 3.8049 | 0.3339 |
| 3.0019 | 31.82 | 86000 | 3.7997 | 0.3348 |
| 3.0007 | 32.56 | 88000 | 3.8010 | 0.3350 |
| 2.9967 | 33.3 | 90000 | 3.8010 | 0.3349 |
| 2.9911 | 34.04 | 92000 | 3.7952 | 0.3358 |
| 2.987 | 34.78 | 94000 | 3.7940 | 0.3357 |
| 2.9867 | 35.52 | 96000 | 3.7889 | 0.3364 |
| 2.9788 | 36.26 | 98000 | 3.7899 | 0.3364 |
| 2.9818 | 37.0 | 100000 | 3.7893 | 0.3363 |
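Assuming the validation loss is per-token cross-entropy in nats (the standard for `transformers` causal-LM training), it maps directly to perplexity. A quick sanity check of the best reported value:

```python
import math

# Best validation loss from the table above (step 96000).
final_loss = 3.7889

# Perplexity is exp(cross-entropy) when the loss is measured in nats.
perplexity = math.exp(final_loss)
print(f"{perplexity:.1f}")  # ~44.2
```

For comparison, the step-2000 loss of 8.1124 corresponds to a perplexity above 3000, so the table reflects a roughly 75x improvement over the run.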
### Framework versions
- Transformers 4.30.2
- Pytorch 2.0.0+cu117
- Datasets 4.1.1
- Tokenizers 0.13.3