# de_wiki_clm_30
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 4.0348
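For a causal language model, the reported cross-entropy loss converts directly to perplexity, which is often easier to interpret. A quick derivation from the loss above (not reported in the original card):

```python
import math

# Perplexity is exp(cross-entropy loss); the evaluation loss above is 4.0348.
eval_loss = 4.0348
perplexity = math.exp(eval_loss)
print(f"{perplexity:.2f}")  # ≈ 56.53
```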
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 30
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 40000
- training_steps: 100000
- mixed_precision_training: Native AMP
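The derived values above follow directly from the stated hyperparameters. The sketch below (plain Python, not the actual training script; `lr_at` is a hypothetical helper assuming the standard linear-with-warmup schedule used by the Transformers `linear` scheduler type) checks the effective batch size and traces the learning-rate curve:

```python
# Values copied from the hyperparameter list above.
learning_rate = 1e-4
train_batch_size = 16            # per-device batch size
gradient_accumulation_steps = 2
warmup_steps = 40_000
training_steps = 100_000

# Effective train batch size = per-device batch * accumulation steps.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 32, matching the value above

def lr_at(step):
    """Linear schedule with warmup: ramp up to the peak learning rate
    over `warmup_steps`, then decay linearly to 0 at `training_steps`."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    return learning_rate * (training_steps - step) / (training_steps - warmup_steps)

print(lr_at(20_000))   # mid-warmup: 5e-05
print(lr_at(40_000))   # peak: 1e-04
print(lr_at(100_000))  # end of training: 0.0
```

Note that warmup spans 40% of the run, so the learning rate only decays over the final 60,000 steps.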
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 1.0796 | 2000 | 7.8191 |
| 7.928 | 2.1592 | 4000 | 7.0870 |
| 7.928 | 3.2389 | 6000 | 6.6422 |
| 6.6946 | 4.3185 | 8000 | 6.2840 |
| 6.6946 | 5.3981 | 10000 | 5.9706 |
| 6.037 | 6.4777 | 12000 | 5.6935 |
| 6.037 | 7.5574 | 14000 | 5.4614 |
| 5.5288 | 8.6370 | 16000 | 5.2527 |
| 5.5288 | 9.7166 | 18000 | 5.0790 |
| 5.1465 | 10.7962 | 20000 | 4.9348 |
| 5.1465 | 11.8758 | 22000 | 4.8114 |
| 4.8667 | 12.9555 | 24000 | 4.7085 |
| 4.8667 | 14.0351 | 26000 | 4.6242 |
| 4.6478 | 15.1147 | 28000 | 4.5389 |
| 4.6478 | 16.1943 | 30000 | 4.4701 |
| 4.4727 | 17.2740 | 32000 | 4.4099 |
| 4.4727 | 18.3536 | 34000 | 4.3633 |
| 4.3307 | 19.4332 | 36000 | 4.3184 |
| 4.3307 | 20.5128 | 38000 | 4.2779 |
| 4.2116 | 21.5924 | 40000 | 4.2453 |
| 4.2116 | 22.6721 | 42000 | 4.2135 |
| 4.1017 | 23.7517 | 44000 | 4.1839 |
| 4.1017 | 24.8313 | 46000 | 4.1570 |
| 4.0019 | 25.9109 | 48000 | 4.1387 |
| 4.0019 | 26.9906 | 50000 | 4.1239 |
| 3.9164 | 28.0702 | 52000 | 4.1119 |
| 3.9164 | 29.1498 | 54000 | 4.1000 |
| 3.8451 | 30.2294 | 56000 | 4.0912 |
| 3.8451 | 31.3090 | 58000 | 4.0843 |
| 3.7863 | 32.3887 | 60000 | 4.0820 |
| 3.7863 | 33.4683 | 62000 | 4.0735 |
| 3.7356 | 34.5479 | 64000 | 4.0649 |
| 3.7356 | 35.6275 | 66000 | 4.0574 |
| 3.6893 | 36.7072 | 68000 | 4.0564 |
| 3.6893 | 37.7868 | 70000 | 4.0526 |
| 3.6492 | 38.8664 | 72000 | 4.0485 |
| 3.6492 | 39.9460 | 74000 | 4.0457 |
| 3.6111 | 41.0256 | 76000 | 4.0483 |
| 3.6111 | 42.1053 | 78000 | 4.0443 |
| 3.5749 | 43.1849 | 80000 | 4.0452 |
| 3.5749 | 44.2645 | 82000 | 4.0453 |
| 3.5442 | 45.3441 | 84000 | 4.0435 |
| 3.5442 | 46.4238 | 86000 | 4.0421 |
| 3.5184 | 47.5034 | 88000 | 4.0403 |
| 3.5184 | 48.5830 | 90000 | 4.0411 |
| 3.4926 | 49.6626 | 92000 | 4.0383 |
| 3.4926 | 50.7422 | 94000 | 4.0385 |
| 3.4715 | 51.8219 | 96000 | 4.0355 |
| 3.4715 | 52.9015 | 98000 | 4.0359 |
| 3.4519 | 53.9811 | 100000 | 4.0348 |
### Framework versions
- Transformers 4.45.2
- Pytorch 2.5.1+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1