de_wiki_clm_42

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.0375
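
Since the loss is the only reported metric, one way to interpret it is as perplexity. Below is a minimal sketch, assuming the value is the mean per-token cross-entropy in nats (the Transformers default for causal language models):

```python
import math

# Perplexity is exp(cross-entropy) when the loss is measured in nats.
eval_loss = 4.0375
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 56.68
```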

Model description

More information needed

Intended uses & limitations

More information needed
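
The card does not document usage. The following is a hypothetical inference sketch using the standard Transformers causal-LM API; the repository id "de_wiki_clm_42" and the German prompt are assumptions based only on the model name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id inferred from the card title; adjust to the actual path.
model_id = "de_wiki_clm_42"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The name suggests a German-Wikipedia causal LM, so a German prompt is assumed.
inputs = tokenizer("Berlin ist", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```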

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent configuration follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
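
The card does not say how training was launched. Assuming the standard Hugging Face Transformers Trainer, the values above would map onto TrainingArguments roughly as follows; only the listed values are taken from the card, and output_dir is a placeholder:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters.
training_args = TrainingArguments(
    output_dir="de_wiki_clm_42",      # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=2,    # effective train batch size: 16 * 2 = 32
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    fp16=True,                        # "Native AMP" mixed-precision training
)
```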

Training results

Training loss was apparently logged every 4,000 steps while evaluation ran every 2,000, so each training-loss value spans two rows and the first row predates the first log ("No log").

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.0796  | 2000   | 7.8013          |
| 7.9145        | 2.1592  | 4000   | 7.0872          |
| 7.9145        | 3.2389  | 6000   | 6.6353          |
| 6.6919        | 4.3185  | 8000   | 6.2753          |
| 6.6919        | 5.3981  | 10000  | 5.9640          |
| 6.0331        | 6.4777  | 12000  | 5.6939          |
| 6.0331        | 7.5574  | 14000  | 5.4531          |
| 5.5261        | 8.6370  | 16000  | 5.2534          |
| 5.5261        | 9.7166  | 18000  | 5.0814          |
| 5.1455        | 10.7962 | 20000  | 4.9420          |
| 5.1455        | 11.8758 | 22000  | 4.8196          |
| 4.8678        | 12.9555 | 24000  | 4.7144          |
| 4.8678        | 14.0351 | 26000  | 4.6195          |
| 4.6498        | 15.1147 | 28000  | 4.5415          |
| 4.6498        | 16.1943 | 30000  | 4.4792          |
| 4.4749        | 17.2740 | 32000  | 4.4163          |
| 4.4749        | 18.3536 | 34000  | 4.3686          |
| 4.3329        | 19.4332 | 36000  | 4.3249          |
| 4.3329        | 20.5128 | 38000  | 4.2838          |
| 4.2153        | 21.5924 | 40000  | 4.2503          |
| 4.2153        | 22.6721 | 42000  | 4.2175          |
| 4.105         | 23.7517 | 44000  | 4.1878          |
| 4.105         | 24.8313 | 46000  | 4.1638          |
| 4.0056        | 25.9109 | 48000  | 4.1418          |
| 4.0056        | 26.9906 | 50000  | 4.1273          |
| 3.9206        | 28.0702 | 52000  | 4.1156          |
| 3.9206        | 29.1498 | 54000  | 4.1051          |
| 3.8488        | 30.2294 | 56000  | 4.0962          |
| 3.8488        | 31.3090 | 58000  | 4.0877          |
| 3.7907        | 32.3887 | 60000  | 4.0822          |
| 3.7907        | 33.4683 | 62000  | 4.0746          |
| 3.7405        | 34.5479 | 64000  | 4.0698          |
| 3.7405        | 35.6275 | 66000  | 4.0621          |
| 3.6943        | 36.7072 | 68000  | 4.0587          |
| 3.6943        | 37.7868 | 70000  | 4.0554          |
| 3.6534        | 38.8664 | 72000  | 4.0520          |
| 3.6534        | 39.9460 | 74000  | 4.0502          |
| 3.6158        | 41.0256 | 76000  | 4.0489          |
| 3.6158        | 42.1053 | 78000  | 4.0538          |
| 3.5796        | 43.1849 | 80000  | 4.0492          |
| 3.5796        | 44.2645 | 82000  | 4.0464          |
| 3.5501        | 45.3441 | 84000  | 4.0478          |
| 3.5501        | 46.4238 | 86000  | 4.0443          |
| 3.5235        | 47.5034 | 88000  | 4.0431          |
| 3.5235        | 48.5830 | 90000  | 4.0419          |
| 3.4985        | 49.6626 | 92000  | 4.0406          |
| 3.4985        | 50.7422 | 94000  | 4.0393          |
| 3.4768        | 51.8219 | 96000  | 4.0391          |
| 3.4768        | 52.9015 | 98000  | 4.0383          |
| 3.458         | 53.9811 | 100000 | 4.0375          |
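
The validation loss plateaus from roughly step 76,000 onward. Converted to perplexity (again assuming the loss is in nats), a few checkpoints from the table above illustrate the trajectory:

```python
import math

# Selected (step, validation loss) pairs from the table, as perplexity.
checkpoints = {2_000: 7.8013, 50_000: 4.1273, 100_000: 4.0375}
for step, loss in checkpoints.items():
    print(f"step {step:>6}: perplexity ≈ {math.exp(loss):.1f}")
# step   2000: perplexity ≈ 2443.8
# step  50000: perplexity ≈ 62.0
# step 100000: perplexity ≈ 56.7
```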

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
Model size

  • 12.7M parameters (F32, stored as Safetensors)