de_wiki_clm_30

This model is a fine-tuned version of an unspecified base model (the base checkpoint is not named in the original card) on an unknown dataset. It achieves the following result on the evaluation set (converted to perplexity in the sketch below):

  • Loss: 4.0348
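
For a causal language model, validation cross-entropy loss converts directly to perplexity via exp(loss). The figure below is derived from the reported loss; it is not stated in the original card.

```python
import math

# Perplexity derived from the reported validation cross-entropy loss
# (a computed figure, not reported in the original card).
eval_loss = 4.0348
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 56.5
```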

Model description

More information needed

Intended uses & limitations

More information needed
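
The card gives no usage example. Below is a minimal generation sketch, assuming the checkpoint is published under a Hub repo id like `username/de_wiki_clm_30` (hypothetical) and loads with the standard Auto classes; the German prompt is only an illustration based on the model name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with the actual Hub path of this checkpoint.
repo_id = "username/de_wiki_clm_30"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# The model name suggests German-Wikipedia text; the prompt is only an example.
inputs = tokenizer("Berlin ist", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```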

Training and evaluation data

More information needed
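
The training corpus is undocumented. The model name (`de_wiki_clm_30`) hints at German Wikipedia, so the sketch below is a guess using the public `wikimedia/wikipedia` dump on the Hub, not a confirmed source; the dump date is illustrative.

```python
from datasets import load_dataset

# Assumption: German Wikipedia as the training corpus, inferred only from
# the model name "de_wiki_clm_30". The dump date below is illustrative.
dataset = load_dataset("wikimedia/wikipedia", "20231101.de", split="train")
print(dataset[0]["text"][:200])
```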

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 30
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
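
A minimal sketch of how these values map onto `transformers.TrainingArguments`; `output_dir` is a placeholder, and the listed total batch size of 32 follows from 16 per device × 2 gradient-accumulation steps.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; "output_dir" is a placeholder.
args = TrainingArguments(
    output_dir="de_wiki_clm_30",
    learning_rate=1e-4,
    per_device_train_batch_size=16,   # effective batch = 16 * 2 accumulation = 32
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    seed=30,
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    fp16=True,                        # "Native AMP" mixed precision
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```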

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.0796  | 2000   | 7.8191          |
| 7.928         | 2.1592  | 4000   | 7.0870          |
| 7.928         | 3.2389  | 6000   | 6.6422          |
| 6.6946        | 4.3185  | 8000   | 6.2840          |
| 6.6946        | 5.3981  | 10000  | 5.9706          |
| 6.037         | 6.4777  | 12000  | 5.6935          |
| 6.037         | 7.5574  | 14000  | 5.4614          |
| 5.5288        | 8.6370  | 16000  | 5.2527          |
| 5.5288        | 9.7166  | 18000  | 5.0790          |
| 5.1465        | 10.7962 | 20000  | 4.9348          |
| 5.1465        | 11.8758 | 22000  | 4.8114          |
| 4.8667        | 12.9555 | 24000  | 4.7085          |
| 4.8667        | 14.0351 | 26000  | 4.6242          |
| 4.6478        | 15.1147 | 28000  | 4.5389          |
| 4.6478        | 16.1943 | 30000  | 4.4701          |
| 4.4727        | 17.2740 | 32000  | 4.4099          |
| 4.4727        | 18.3536 | 34000  | 4.3633          |
| 4.3307        | 19.4332 | 36000  | 4.3184          |
| 4.3307        | 20.5128 | 38000  | 4.2779          |
| 4.2116        | 21.5924 | 40000  | 4.2453          |
| 4.2116        | 22.6721 | 42000  | 4.2135          |
| 4.1017        | 23.7517 | 44000  | 4.1839          |
| 4.1017        | 24.8313 | 46000  | 4.1570          |
| 4.0019        | 25.9109 | 48000  | 4.1387          |
| 4.0019        | 26.9906 | 50000  | 4.1239          |
| 3.9164        | 28.0702 | 52000  | 4.1119          |
| 3.9164        | 29.1498 | 54000  | 4.1000          |
| 3.8451        | 30.2294 | 56000  | 4.0912          |
| 3.8451        | 31.3090 | 58000  | 4.0843          |
| 3.7863        | 32.3887 | 60000  | 4.0820          |
| 3.7863        | 33.4683 | 62000  | 4.0735          |
| 3.7356        | 34.5479 | 64000  | 4.0649          |
| 3.7356        | 35.6275 | 66000  | 4.0574          |
| 3.6893        | 36.7072 | 68000  | 4.0564          |
| 3.6893        | 37.7868 | 70000  | 4.0526          |
| 3.6492        | 38.8664 | 72000  | 4.0485          |
| 3.6492        | 39.9460 | 74000  | 4.0457          |
| 3.6111        | 41.0256 | 76000  | 4.0483          |
| 3.6111        | 42.1053 | 78000  | 4.0443          |
| 3.5749        | 43.1849 | 80000  | 4.0452          |
| 3.5749        | 44.2645 | 82000  | 4.0453          |
| 3.5442        | 45.3441 | 84000  | 4.0435          |
| 3.5442        | 46.4238 | 86000  | 4.0421          |
| 3.5184        | 47.5034 | 88000  | 4.0403          |
| 3.5184        | 48.5830 | 90000  | 4.0411          |
| 3.4926        | 49.6626 | 92000  | 4.0383          |
| 3.4926        | 50.7422 | 94000  | 4.0385          |
| 3.4715        | 51.8219 | 96000  | 4.0355          |
| 3.4715        | 52.9015 | 98000  | 4.0359          |
| 3.4519        | 53.9811 | 100000 | 4.0348          |

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
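
A quick runtime check against the versions listed above, purely illustrative; on a CUDA 12.4 build, `torch.__version__` would read `2.5.1+cu124`.

```python
# Compare the installed library versions against those listed in the card.
import transformers, torch, datasets, tokenizers

for mod, want in [(transformers, "4.45.2"), (torch, "2.5.1"),
                  (datasets, "3.0.1"), (tokenizers, "0.20.1")]:
    print(mod.__name__, mod.__version__, "(card lists", want + ")")
```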

Model details

  • Format: Safetensors
  • Model size: 12.7M params
  • Tensor type: F32