de_wiki_mlm_30

This model is a fine-tuned version of an unspecified base model (not recorded in the original card) on an unknown dataset. It achieves the following result on the evaluation set:

  • Loss: 3.0055
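
For a masked-language-modeling objective, the reported cross-entropy loss can be converted to a pseudo-perplexity via exp(loss). The card does not report perplexity itself; the value below is derived here for convenience:

```python
import math

# Final evaluation loss reported in the card above.
eval_loss = 3.0055

# Pseudo-perplexity implied by that loss (derived, not reported in the card).
perplexity = math.exp(eval_loss)  # roughly 20.2
```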

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 30
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
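
With `lr_scheduler_type: linear` and 40,000 warmup steps, the learning rate ramps from 0 to 1e-4 over the first 40,000 steps and then decays linearly to 0 at step 100,000. A minimal sketch of that schedule in plain Python, mirroring the hyperparameters above (the function name is illustrative, not from the training code):

```python
def lr_at_step(step, base_lr=1e-4, warmup_steps=40_000, total_steps=100_000):
    """Linear warmup followed by linear decay to zero.

    This mirrors the shape of the standard `linear` scheduler with
    warmup; defaults match the hyperparameters listed above.
    """
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```

For example, the schedule peaks at 1e-4 exactly at step 40,000 and reaches 5e-5 halfway through the decay phase (step 70,000).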

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|---------------|---------|--------|-----------------|
| No log        | 1.0796  | 2000   | 8.1295          |
| 8.165         | 2.1592  | 4000   | 7.4511          |
| 8.165         | 3.2389  | 6000   | 7.3578          |
| 7.369         | 4.3185  | 8000   | 7.2654          |
| 7.369         | 5.3981  | 10000  | 7.1874          |
| 7.2089        | 6.4777  | 12000  | 7.1034          |
| 7.2089        | 7.5574  | 14000  | 7.0611          |
| 7.0701        | 8.6370  | 16000  | 6.9701          |
| 7.0701        | 9.7166  | 18000  | 6.9493          |
| 6.9567        | 10.7962 | 20000  | 6.8926          |
| 6.9567        | 11.8758 | 22000  | 6.8479          |
| 6.863         | 12.9555 | 24000  | 6.7844          |
| 6.863         | 14.0351 | 26000  | 6.7093          |
| 6.7236        | 15.1147 | 28000  | 6.5634          |
| 6.7236        | 16.1943 | 30000  | 6.4365          |
| 6.4847        | 17.2740 | 32000  | 6.1951          |
| 6.4847        | 18.3536 | 34000  | 5.8373          |
| 5.9256        | 19.4332 | 36000  | 5.2895          |
| 5.9256        | 20.5128 | 38000  | 4.9567          |
| 5.0992        | 21.5924 | 40000  | 4.6883          |
| 5.0992        | 22.6721 | 42000  | 4.4813          |
| 4.6231        | 23.7517 | 44000  | 4.3045          |
| 4.6231        | 24.8313 | 46000  | 4.1386          |
| 4.2683        | 25.9109 | 48000  | 4.0062          |
| 4.2683        | 26.9906 | 50000  | 3.8906          |
| 4.0096        | 28.0702 | 52000  | 3.7984          |
| 4.0096        | 29.1498 | 54000  | 3.7150          |
| 3.8199        | 30.2294 | 56000  | 3.6128          |
| 3.8199        | 31.3090 | 58000  | 3.5471          |
| 3.6694        | 32.3887 | 60000  | 3.4932          |
| 3.6694        | 33.4683 | 62000  | 3.4384          |
| 3.5482        | 34.5479 | 64000  | 3.4021          |
| 3.5482        | 35.6275 | 66000  | 3.3532          |
| 3.4458        | 36.7072 | 68000  | 3.3054          |
| 3.4458        | 37.7868 | 70000  | 3.2852          |
| 3.3631        | 38.8664 | 72000  | 3.2286          |
| 3.3631        | 39.9460 | 74000  | 3.1990          |
| 3.3013        | 41.0256 | 76000  | 3.1797          |
| 3.3013        | 42.1053 | 78000  | 3.1476          |
| 3.2413        | 43.1849 | 80000  | 3.1266          |
| 3.2413        | 44.2645 | 82000  | 3.1271          |
| 3.1916        | 45.3441 | 84000  | 3.0851          |
| 3.1916        | 46.4238 | 86000  | 3.0758          |
| 3.1512        | 47.5034 | 88000  | 3.0595          |
| 3.1512        | 48.5830 | 90000  | 3.0410          |
| 3.1197        | 49.6626 | 92000  | 3.0217          |
| 3.1197        | 50.7422 | 94000  | 3.0267          |
| 3.0925        | 51.8219 | 96000  | 3.0220          |
| 3.0925        | 52.9015 | 98000  | 3.0122          |
| 3.0794        | 53.9811 | 100000 | 3.0055          |

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Model size

  • 14.9M params
  • Tensor type: F32 (Safetensors)