eng_wiki_clm_30

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.2540
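
For intuition, the evaluation loss can be converted to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, as is typical for causal language models trained with Transformers. The card does not state this; the `clm` in the model name only suggests it.

```python
import math

eval_loss = 4.2540  # evaluation loss reported above

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply its exponential.
perplexity = math.exp(eval_loss)

# The same quantity expressed in bits per token.
bits_per_token = eval_loss / math.log(2)

print(f"perplexity     ~ {perplexity:.1f}")
print(f"bits per token ~ {bits_per_token:.2f}")
```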

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 30
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
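
The list above can be read as a configuration plus a learning-rate schedule. The following is a minimal sketch, not the author's training script: `training_kwargs` is a hypothetical dictionary of keyword arguments that could be passed to `transformers.TrainingArguments`, the single-device setup and `fp16` (rather than `bf16`) are assumptions, and `linear_lr` is a plain-Python restatement of a linear warmup followed by linear decay to zero.

```python
# Hypothetical keyword arguments for transformers.TrainingArguments that
# would correspond to the hyperparameter list. Assumptions: one device,
# and fp16 (not bf16) for "Native AMP".
training_kwargs = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 30,
    "gradient_accumulation_steps": 2,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 40_000,
    "max_steps": 100_000,
    "fp16": True,
}

NUM_DEVICES = 1  # assumption; the card does not state the device count


def total_train_batch_size(kwargs, num_devices=NUM_DEVICES):
    """Sequences consumed per optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def linear_lr(step, kwargs=training_kwargs):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    peak = kwargs["learning_rate"]
    warmup = kwargs["warmup_steps"]
    total = kwargs["max_steps"]
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size(training_kwargs))
    for step in (0, 20_000, 40_000, 70_000, 100_000):
        print(f"step {step:>6}: lr = {linear_lr(step):.2e}")
```

Under these assumptions, warmup covers the first 40% of training: the learning rate reaches its peak of 0.0001 at step 40000 and decays linearly to zero at step 100000.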

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|---------|--------|-----------------|
| No log | 1.1319 | 2000 | 7.5996 |
| 7.6844 | 2.2637 | 4000 | 6.6933 |
| 7.6844 | 3.3956 | 6000 | 6.2852 |
| 6.3411 | 4.5274 | 8000 | 6.0162 |
| 6.3411 | 5.6593 | 10000 | 5.7870 |
| 5.8299 | 6.7912 | 12000 | 5.5636 |
| 5.8299 | 7.9230 | 14000 | 5.3663 |
| 5.4226 | 9.0549 | 16000 | 5.1959 |
| 5.4226 | 10.1868 | 18000 | 5.0538 |
| 5.0963 | 11.3186 | 20000 | 4.9305 |
| 5.0963 | 12.4505 | 22000 | 4.8287 |
| 4.8579 | 13.5823 | 24000 | 4.7431 |
| 4.8579 | 14.7142 | 26000 | 4.6672 |
| 4.6730 | 15.8461 | 28000 | 4.6054 |
| 4.6730 | 16.9779 | 30000 | 4.5488 |
| 4.5218 | 18.1098 | 32000 | 4.5005 |
| 4.5218 | 19.2417 | 34000 | 4.4625 |
| 4.3942 | 20.3735 | 36000 | 4.4248 |
| 4.3942 | 21.5054 | 38000 | 4.3951 |
| 4.2870 | 22.6372 | 40000 | 4.3714 |
| 4.2870 | 23.7691 | 42000 | 4.3377 |
| 4.1875 | 24.9010 | 44000 | 4.3180 |
| 4.1875 | 26.0328 | 46000 | 4.3037 |
| 4.0880 | 27.1647 | 48000 | 4.2899 |
| 4.0880 | 28.2965 | 50000 | 4.2819 |
| 4.0097 | 29.4284 | 52000 | 4.2699 |
| 4.0097 | 30.5603 | 54000 | 4.2628 |
| 3.9437 | 31.6921 | 56000 | 4.2588 |
| 3.9437 | 32.8240 | 58000 | 4.2509 |
| 3.8877 | 33.9559 | 60000 | 4.2439 |
| 3.8877 | 35.0877 | 62000 | 4.2492 |
| 3.8319 | 36.2196 | 64000 | 4.2496 |
| 3.8319 | 37.3514 | 66000 | 4.2485 |
| 3.7878 | 38.4833 | 68000 | 4.2479 |
| 3.7878 | 39.6152 | 70000 | 4.2462 |
| 3.7485 | 40.7470 | 72000 | 4.2456 |
| 3.7485 | 41.8789 | 74000 | 4.2438 |
| 3.7129 | 43.0108 | 76000 | 4.2458 |
| 3.7129 | 44.1426 | 78000 | 4.2496 |
| 3.6752 | 45.2745 | 80000 | 4.2527 |
| 3.6752 | 46.4063 | 82000 | 4.2543 |
| 3.6467 | 47.5382 | 84000 | 4.2530 |
| 3.6467 | 48.6701 | 86000 | 4.2522 |
| 3.6209 | 49.8019 | 88000 | 4.2534 |
| 3.6209 | 50.9338 | 90000 | 4.2521 |
| 3.5947 | 52.0656 | 92000 | 4.2541 |
| 3.5947 | 53.1975 | 94000 | 4.2551 |
| 3.5720 | 54.3294 | 96000 | 4.2561 |
| 3.5720 | 55.4612 | 98000 | 4.2545 |
| 3.5535 | 56.5931 | 100000 | 4.2540 |
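
The results can be inspected programmatically. The sketch below copies the validation losses from step 56000 onward, finds the step with the lowest validation loss, and estimates the number of optimizer steps per epoch from the final row. The estimate of training-set size assumes a single device and the total train batch size of 32 listed in the hyperparameters; the card does not confirm either the device count or the dataset size.

```python
# (step, validation loss) pairs copied from the results, step 56000 onward.
val_losses = [
    (56000, 4.2588), (58000, 4.2509), (60000, 4.2439), (62000, 4.2492),
    (64000, 4.2496), (66000, 4.2485), (68000, 4.2479), (70000, 4.2462),
    (72000, 4.2456), (74000, 4.2438), (76000, 4.2458), (78000, 4.2496),
    (80000, 4.2527), (82000, 4.2543), (84000, 4.2530), (86000, 4.2522),
    (88000, 4.2534), (90000, 4.2521), (92000, 4.2541), (94000, 4.2551),
    (96000, 4.2561), (98000, 4.2545), (100000, 4.2540),
]

# Step with the lowest validation loss.
best_step, best_loss = min(val_losses, key=lambda row: row[1])

# Gap between the final checkpoint and the best one.
final_step, final_loss = val_losses[-1]
gap = final_loss - best_loss

# Final row of the results: step 100000 corresponds to epoch 56.5931.
steps_per_epoch = round(100000 / 56.5931)

# Assumption: 32 sequences per optimizer step (total_train_batch_size).
approx_train_sequences = steps_per_epoch * 32

print(f"best validation loss {best_loss} at step {best_step}")
print(f"final loss is {gap:.4f} above the best")
print(f"about {steps_per_epoch} steps per epoch")
print(f"about {approx_train_sequences} training sequences per epoch")
```

Validation loss plateaus after roughly step 60000 while training loss keeps falling (3.8877 at step 60000, 3.5535 at step 100000), which is consistent with the model beginning to overfit the training data.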

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
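
To reproduce results, it can help to check the local environment against the versions listed above. The following is a small sketch using only the standard library; `version_mismatches` is a hypothetical helper, and the keys are assumed to be the PyPI distribution names of the listed frameworks.

```python
from importlib import metadata

# Versions listed above; keys are PyPI distribution names.
EXPECTED = {
    "transformers": "4.45.2",
    "torch": "2.5.1",  # the card lists the CUDA build 2.5.1+cu124
    "datasets": "3.0.1",
    "tokenizers": "0.20.1",
}


def version_mismatches(expected=EXPECTED):
    """Return {package: (expected, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu124" when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in version_mismatches().items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```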

Model weights

  • Format: Safetensors
  • Model size: 12.7M params
  • Tensor type: F32