eng_wiki_clm_13

This model is a fine-tuned version of an unspecified base model (not stated in this card), trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.2516
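
Interpreted as mean cross-entropy per token in nats (the usual Trainer reduction, which this card does not state explicitly), this loss corresponds to a perplexity of about exp(4.2516) ≈ 70.2. A minimal sketch of the conversion:

```python
import math

# Assumption: the reported eval loss is mean cross-entropy per token
# in nats, so perplexity = exp(loss).
eval_loss = 4.2516
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.1f}")  # ~70.2
```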

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 13
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
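
As a hedged sketch only (the original training script is not part of this card), the values above map onto transformers.TrainingArguments roughly as follows; output_dir is a placeholder:

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters as TrainingArguments.
# output_dir is hypothetical; the actual training script is not shown here.
training_args = TrainingArguments(
    output_dir="eng_wiki_clm_13",   # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=13,
    gradient_accumulation_steps=2,  # 16 per device x 2 steps = 32 total
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    fp16=True,                      # Native AMP mixed precision
)
```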

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.1319  | 2000   | 7.5861          |
| 7.672         | 2.2637  | 4000   | 6.6896          |
| 7.672         | 3.3956  | 6000   | 6.2889          |
| 6.3419        | 4.5274  | 8000   | 6.0171          |
| 6.3419        | 5.6593  | 10000  | 5.7843          |
| 5.8297        | 6.7912  | 12000  | 5.5648          |
| 5.8297        | 7.9230  | 14000  | 5.3653          |
| 5.4199        | 9.0549  | 16000  | 5.1962          |
| 5.4199        | 10.1868 | 18000  | 5.0526          |
| 5.0949        | 11.3186 | 20000  | 4.9287          |
| 5.0949        | 12.4505 | 22000  | 4.8257          |
| 4.8548        | 13.5823 | 24000  | 4.7442          |
| 4.8548        | 14.7142 | 26000  | 4.6666          |
| 4.6698        | 15.8461 | 28000  | 4.6017          |
| 4.6698        | 16.9779 | 30000  | 4.5493          |
| 4.5194        | 18.1098 | 32000  | 4.4992          |
| 4.5194        | 19.2417 | 34000  | 4.4609          |
| 4.3912        | 20.3735 | 36000  | 4.4266          |
| 4.3912        | 21.5054 | 38000  | 4.3924          |
| 4.2859        | 22.6372 | 40000  | 4.3671          |
| 4.2859        | 23.7691 | 42000  | 4.3373          |
| 4.1853        | 24.9010 | 44000  | 4.3159          |
| 4.1853        | 26.0328 | 46000  | 4.3016          |
| 4.0873        | 27.1647 | 48000  | 4.2892          |
| 4.0873        | 28.2965 | 50000  | 4.2784          |
| 4.0071        | 29.4284 | 52000  | 4.2694          |
| 4.0071        | 30.5603 | 54000  | 4.2630          |
| 3.9439        | 31.6921 | 56000  | 4.2547          |
| 3.9439        | 32.8240 | 58000  | 4.2468          |
| 3.8867        | 33.9559 | 60000  | 4.2447          |
| 3.8867        | 35.0877 | 62000  | 4.2471          |
| 3.83          | 36.2196 | 64000  | 4.2458          |
| 3.83          | 37.3514 | 66000  | 4.2433          |
| 3.787         | 38.4833 | 68000  | 4.2433          |
| 3.787         | 39.6152 | 70000  | 4.2433          |
| 3.7489        | 40.7470 | 72000  | 4.2429          |
| 3.7489        | 41.8789 | 74000  | 4.2410          |
| 3.7122        | 43.0108 | 76000  | 4.2438          |
| 3.7122        | 44.1426 | 78000  | 4.2496          |
| 3.6739        | 45.2745 | 80000  | 4.2494          |
| 3.6739        | 46.4063 | 82000  | 4.2477          |
| 3.6465        | 47.5382 | 84000  | 4.2488          |
| 3.6465        | 48.6701 | 86000  | 4.2504          |
| 3.6201        | 49.8019 | 88000  | 4.2490          |
| 3.6201        | 50.9338 | 90000  | 4.2500          |
| 3.5947        | 52.0656 | 92000  | 4.2516          |
| 3.5947        | 53.1975 | 94000  | 4.2530          |
| 3.5712        | 54.3294 | 96000  | 4.2526          |
| 3.5712        | 55.4612 | 98000  | 4.2529          |
| 3.5532        | 56.5931 | 100000 | 4.2516          |

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
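
Assuming the model is published on the Hugging Face Hub under the id fpadovani/en_wiki_clm_13_new and loads through the standard transformers causal-LM API, a minimal generation sketch:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: this Hub id matches the model described by this card;
# adjust it if the canonical repository id differs.
repo_id = "fpadovani/en_wiki_clm_13_new"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "The English language"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```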

Model size

  • 12.7M parameters (F32, Safetensors)