fr_wiki_clm_30

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.4778
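
Assuming the reported loss is the mean per-token cross-entropy that the Transformers Trainer computes for causal language modeling, this corresponds to a perplexity of exp(3.4778) ≈ 32.4. A minimal sketch of the conversion:

```python
# Sketch: converting the reported eval loss to perplexity.
# Assumes the loss is mean per-token cross-entropy (the Trainer default for CLM).
import math

eval_loss = 3.4778
print(f"Perplexity: {math.exp(eval_loss):.1f}")  # Perplexity: 32.4
```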

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 30
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
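
As a hedged sketch, the settings above map onto Transformers `TrainingArguments` roughly as follows; the `output_dir` is hypothetical, and the Adam values shown are the Trainer's own defaults:

```python
from transformers import TrainingArguments

# Sketch only: reproduces the hyperparameters listed above.
# total_train_batch_size = train_batch_size (16) x gradient_accumulation_steps (2) = 32.
training_args = TrainingArguments(
    output_dir="fr_wiki_clm_30",   # hypothetical output directory
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=30,
    gradient_accumulation_steps=2,
    adam_beta1=0.9,                # Trainer default
    adam_beta2=0.999,              # Trainer default
    adam_epsilon=1e-8,             # Trainer default
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    fp16=True,                     # "Native AMP" mixed precision
)
```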

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.7190  | 2000   | 7.1457          |
| 7.2132        | 3.4379  | 4000   | 5.8791          |
| 7.2132        | 5.1569  | 6000   | 5.4286          |
| 5.4631        | 6.8758  | 8000   | 5.0668          |
| 5.4631        | 8.5948  | 10000  | 4.7749          |
| 4.8227        | 10.3137 | 12000  | 4.5301          |
| 4.8227        | 12.0327 | 14000  | 4.3224          |
| 4.3686        | 13.7516 | 16000  | 4.1502          |
| 4.3686        | 15.4706 | 18000  | 4.0104          |
| 4.0366        | 17.1895 | 20000  | 3.8886          |
| 4.0366        | 18.9085 | 22000  | 3.7868          |
| 3.7884        | 20.6274 | 24000  | 3.7020          |
| 3.7884        | 22.3464 | 26000  | 3.6314          |
| 3.5938        | 24.0653 | 28000  | 3.5711          |
| 3.5938        | 25.7843 | 30000  | 3.5208          |
| 3.4315        | 27.5032 | 32000  | 3.4741          |
| 3.4315        | 29.2222 | 34000  | 3.4476          |
| 3.2967        | 30.9411 | 36000  | 3.4107          |
| 3.2967        | 32.6601 | 38000  | 3.3920          |
| 3.173         | 34.3790 | 40000  | 3.3799          |
| 3.173         | 36.0980 | 42000  | 3.3570          |
| 3.059         | 37.8169 | 44000  | 3.3407          |
| 3.059         | 39.5359 | 46000  | 3.3346          |
| 2.9456        | 41.2548 | 48000  | 3.3449          |
| 2.9456        | 42.9738 | 50000  | 3.3278          |
| 2.8533        | 44.6927 | 52000  | 3.3348          |
| 2.8533        | 46.4117 | 54000  | 3.3441          |
| 2.7722        | 48.1306 | 56000  | 3.3519          |
| 2.7722        | 49.8496 | 58000  | 3.3496          |
| 2.699         | 51.5685 | 60000  | 3.3583          |
| 2.699         | 53.2875 | 62000  | 3.3731          |
| 2.6407        | 55.0064 | 64000  | 3.3695          |
| 2.6407        | 56.7254 | 66000  | 3.3794          |
| 2.5791        | 58.4443 | 68000  | 3.3964          |
| 2.5791        | 60.1882 | 70000  | 3.4037          |
| 2.5335        | 61.9072 | 72000  | 3.3997          |
| 2.5335        | 63.6261 | 74000  | 3.4114          |
| 2.4847        | 65.3451 | 76000  | 3.4238          |
| 2.4847        | 67.0640 | 78000  | 3.4284          |
| 2.4463        | 68.7830 | 80000  | 3.4358          |
| 2.4463        | 70.5019 | 82000  | 3.4461          |
| 2.4082        | 72.2209 | 84000  | 3.4518          |
| 2.4082        | 73.9398 | 86000  | 3.4537          |
| 2.3753        | 75.6588 | 88000  | 3.4625          |
| 2.3753        | 77.3777 | 90000  | 3.4673          |
| 2.346         | 79.0967 | 92000  | 3.4705          |
| 2.346         | 80.8156 | 94000  | 3.4725          |
| 2.3184        | 82.5346 | 96000  | 3.4756          |
| 2.3184        | 84.2535 | 98000  | 3.4784          |
| 2.2992        | 85.9725 | 100000 | 3.4778          |

Note that the validation loss reaches its minimum (3.3278) around step 50000 and then drifts slowly upward while the training loss keeps falling, which suggests the model begins to overfit in the second half of training.

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
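
Since the card lacks a usage section, here is a hedged inference sketch; it assumes the checkpoint is a standard Transformers causal LM and uses "fr_wiki_clm_30" as a placeholder for the actual Hub repository id or local path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "fr_wiki_clm_30" is a placeholder; substitute the real repo id or local path.
tokenizer = AutoTokenizer.from_pretrained("fr_wiki_clm_30")
model = AutoModelForCausalLM.from_pretrained("fr_wiki_clm_30")

# French prompt, since the model name suggests training on French Wikipedia.
inputs = tokenizer("La France est", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```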
Model size

  • 12.7M parameters (Safetensors, F32)