en_wiki_mlm_13

This model is a fine-tuned version of an unnamed base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1892
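Assuming the reported loss is the standard cross-entropy used for masked-language-model training (as the model name suggests), it can be converted to a perplexity by exponentiation. A minimal sketch:

```python
import math

# Evaluation loss reported above; perplexity = exp(cross-entropy loss)
eval_loss = 3.1892
perplexity = math.exp(eval_loss)
print(f"perplexity: {perplexity:.2f}")  # roughly 24.3
```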

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 13
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
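The hyperparameters above imply a few derived quantities: the effective batch size is `train_batch_size * gradient_accumulation_steps`, and the linear scheduler ramps the learning rate up over the first 40,000 steps and then decays it to zero at step 100,000. A minimal sketch of that relationship (the function name and structure here are illustrative, not taken from the training script):

```python
# Hyperparameters as listed in this card
train_batch_size = 16
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 32

learning_rate = 1e-4
warmup_steps = 40_000
training_steps = 100_000

def lr_at(step: int) -> float:
    """Linear schedule with warmup: ramp to the peak LR, then decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    return learning_rate * (training_steps - step) / (training_steps - warmup_steps)
```

For example, `lr_at(40_000)` returns the peak rate of 1e-4, and `lr_at(100_000)` returns 0.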

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.1319  | 2000   | 7.8860          |
| 7.9349        | 2.2637  | 4000   | 7.1166          |
| 7.9349        | 3.3956  | 6000   | 7.0106          |
| 7.0278        | 4.5274  | 8000   | 6.9329          |
| 7.0278        | 5.6593  | 10000  | 6.8746          |
| 6.8808        | 6.7912  | 12000  | 6.7941          |
| 6.8808        | 7.9230  | 14000  | 6.7404          |
| 6.7556        | 9.0549  | 16000  | 6.6972          |
| 6.7556        | 10.1868 | 18000  | 6.6456          |
| 6.6549        | 11.3186 | 20000  | 6.6062          |
| 6.6549        | 12.4505 | 22000  | 6.5677          |
| 6.5603        | 13.5823 | 24000  | 6.4841          |
| 6.5603        | 14.7142 | 26000  | 6.3446          |
| 6.3833        | 15.8461 | 28000  | 6.1835          |
| 6.3833        | 16.9779 | 30000  | 5.9980          |
| 6.0808        | 18.1098 | 32000  | 5.7140          |
| 6.0808        | 19.2417 | 34000  | 5.3122          |
| 5.4546        | 20.3735 | 36000  | 4.8991          |
| 5.4546        | 21.5054 | 38000  | 4.6956          |
| 4.8294        | 22.6372 | 40000  | 4.5697          |
| 4.8294        | 23.7691 | 42000  | 4.3905          |
| 4.4991        | 24.9010 | 44000  | 4.2690          |
| 4.4991        | 26.0328 | 46000  | 4.1406          |
| 4.2339        | 27.1647 | 48000  | 4.0296          |
| 4.2339        | 28.2965 | 50000  | 3.9278          |
| 4.0315        | 29.4284 | 52000  | 3.8567          |
| 4.0315        | 30.5603 | 54000  | 3.7756          |
| 3.8738        | 31.6921 | 56000  | 3.7191          |
| 3.8738        | 32.8240 | 58000  | 3.6721          |
| 3.744         | 33.9559 | 60000  | 3.6069          |
| 3.744         | 35.0877 | 62000  | 3.5736          |
| 3.6408        | 36.2196 | 64000  | 3.5199          |
| 3.6408        | 37.3514 | 66000  | 3.4748          |
| 3.5553        | 38.4833 | 68000  | 3.4648          |
| 3.5553        | 39.6152 | 70000  | 3.4312          |
| 3.4864        | 40.7470 | 72000  | 3.4074          |
| 3.4864        | 41.8789 | 74000  | 3.3510          |
| 3.4224        | 43.0108 | 76000  | 3.3420          |
| 3.4224        | 44.1426 | 78000  | 3.3249          |
| 3.3729        | 45.2745 | 80000  | 3.3256          |
| 3.3729        | 46.4063 | 82000  | 3.2926          |
| 3.3305        | 47.5382 | 84000  | 3.2637          |
| 3.3305        | 48.6701 | 86000  | 3.2796          |
| 3.2928        | 49.8019 | 88000  | 3.2453          |
| 3.2928        | 50.9338 | 90000  | 3.2232          |
| 3.2636        | 52.0656 | 92000  | 3.2075          |
| 3.2636        | 53.1975 | 94000  | 3.2181          |
| 3.2402        | 54.3294 | 96000  | 3.2066          |
| 3.2402        | 55.4612 | 98000  | 3.1755          |
| 3.2256        | 56.5931 | 100000 | 3.1892          |

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1