en_wiki_mlm_42

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.2073
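For a masked-language-modeling objective, the evaluation loss above corresponds to a perplexity of exp(loss) ≈ 24.7. This is a back-of-the-envelope conversion, not a metric reported by the training run:

```python
import math

eval_loss = 3.2073  # evaluation loss reported above
perplexity = math.exp(eval_loss)  # perplexity = e^(cross-entropy loss)
print(f"{perplexity:.1f}")
```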

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
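The scheduler settings above imply a linear warmup over the first 40,000 steps followed by linear decay to zero at step 100,000, and the effective batch size of 32 comes from the per-device batch size times the gradient accumulation steps. A minimal sketch of that schedule shape (an illustration of the standard linear-with-warmup curve, not the exact Transformers implementation):

```python
BASE_LR = 1e-4    # learning_rate
WARMUP = 40_000   # lr_scheduler_warmup_steps
TOTAL = 100_000   # training_steps

def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then linear decay to 0 at TOTAL."""
    if step < WARMUP:
        return BASE_LR * step / WARMUP
    return BASE_LR * max(0.0, (TOTAL - step) / (TOTAL - WARMUP))

# Effective batch size: per-device batch * gradient accumulation steps
effective_batch = 16 * 2  # matches total_train_batch_size: 32
```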

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.1319  | 2000   | 7.9220          |
| 7.9468        | 2.2637  | 4000   | 7.1138          |
| 7.9468        | 3.3956  | 6000   | 7.0179          |
| 7.0274        | 4.5274  | 8000   | 6.9387          |
| 7.0274        | 5.6593  | 10000  | 6.8684          |
| 6.8824        | 6.7912  | 12000  | 6.8074          |
| 6.8824        | 7.9230  | 14000  | 6.7360          |
| 6.7613        | 9.0549  | 16000  | 6.6897          |
| 6.7613        | 10.1868 | 18000  | 6.6394          |
| 6.6553        | 11.3186 | 20000  | 6.5982          |
| 6.6553        | 12.4505 | 22000  | 6.5549          |
| 6.5571        | 13.5823 | 24000  | 6.4910          |
| 6.5571        | 14.7142 | 26000  | 6.3365          |
| 6.3693        | 15.8461 | 28000  | 6.1672          |
| 6.3693        | 16.9779 | 30000  | 6.0045          |
| 6.0899        | 18.1098 | 32000  | 5.7855          |
| 6.0899        | 19.2417 | 34000  | 5.4393          |
| 5.5439        | 20.3735 | 36000  | 4.9515          |
| 5.5439        | 21.5054 | 38000  | 4.7547          |
| 4.8683        | 22.6372 | 40000  | 4.5845          |
| 4.8683        | 23.7691 | 42000  | 4.4155          |
| 4.5176        | 24.9010 | 44000  | 4.2623          |
| 4.5176        | 26.0328 | 46000  | 4.1626          |
| 4.2542        | 27.1647 | 48000  | 4.0574          |
| 4.2542        | 28.2965 | 50000  | 3.9692          |
| 4.0419        | 29.4284 | 52000  | 3.8587          |
| 4.0419        | 30.5603 | 54000  | 3.7976          |
| 3.886         | 31.6921 | 56000  | 3.7284          |
| 3.886         | 32.8240 | 58000  | 3.6753          |
| 3.7574        | 33.9559 | 60000  | 3.6361          |
| 3.7574        | 35.0877 | 62000  | 3.5934          |
| 3.6518        | 36.2196 | 64000  | 3.5501          |
| 3.6518        | 37.3514 | 66000  | 3.5198          |
| 3.5686        | 38.4833 | 68000  | 3.4513          |
| 3.5686        | 39.6152 | 70000  | 3.4401          |
| 3.4978        | 40.7470 | 72000  | 3.4219          |
| 3.4978        | 41.8789 | 74000  | 3.3757          |
| 3.4364        | 43.0108 | 76000  | 3.3725          |
| 3.4364        | 44.1426 | 78000  | 3.3441          |
| 3.3897        | 45.2745 | 80000  | 3.3154          |
| 3.3897        | 46.4063 | 82000  | 3.3061          |
| 3.3414        | 47.5382 | 84000  | 3.2805          |
| 3.3414        | 48.6701 | 86000  | 3.2789          |
| 3.3082        | 49.8019 | 88000  | 3.2435          |
| 3.3082        | 50.9338 | 90000  | 3.2386          |
| 3.2764        | 52.0656 | 92000  | 3.2367          |
| 3.2764        | 53.1975 | 94000  | 3.2309          |
| 3.261         | 54.3294 | 96000  | 3.2278          |
| 3.261         | 55.4612 | 98000  | 3.2301          |
| 3.2384        | 56.5931 | 100000 | 3.2073          |

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
Model details

  • Format: Safetensors
  • Model size: 14.9M params
  • Tensor type: F32