
moe_g_smaller

This model is a fine-tuned version of an unspecified base model (the base model name is missing from the card) on the arrow dataset. It achieves the following result on the evaluation set:

  • Loss: 4.1824
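If the reported loss is the mean per-token cross-entropy in nats — the usual convention for causal language-model training in Transformers, though the card does not state the loss definition — it corresponds to a perplexity of roughly 65.5:

```python
import math

eval_loss = 4.1824  # reported evaluation loss
# Valid only if the loss is mean per-token cross-entropy in nats
# (an assumption; the card does not state the loss definition).
perplexity = math.exp(eval_loss)
print(round(perplexity, 1))  # → 65.5
```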

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-06; no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 48952
  • training_steps: 489524
  • mixed_precision_training: Native AMP
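Two of the listed values are derived from the others: the effective batch size is train_batch_size × gradient_accumulation_steps, and the warmup length is 10% of the total training steps. A quick check, assuming single-device training (the card does not report the device count):

```python
train_batch_size = 8
gradient_accumulation_steps = 4
training_steps = 489524
warmup_steps = 48952

# Effective batch size per optimizer step (assuming a single device).
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # → 32, matching the listed value

# Warmup covers ~10% of the schedule.
print(warmup_steps / training_steps)  # ≈ 0.1
```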

Training results

| Training Loss | Epoch  | Step   | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| No log        | 0      | 0      | 10.9822         |
| 7.7303        | 0.2043 | 10000  | 7.6901          |
| 6.4004        | 0.4086 | 20000  | 6.3512          |
| 5.6444        | 0.6128 | 30000  | 5.6093          |
| 5.2576        | 0.8171 | 40000  | 5.2313          |
| 4.9578        | 1.0214 | 50000  | 5.0023          |
| 4.8172        | 1.2257 | 60000  | 4.8182          |
| 4.704         | 1.4299 | 70000  | 4.6885          |
| 4.6026        | 1.6342 | 80000  | 4.5951          |
| 4.539         | 1.8385 | 90000  | 4.5199          |
| 4.3395        | 2.0428 | 100000 | 4.4624          |
| 4.3334        | 2.2471 | 110000 | 4.4249          |
| 4.3147        | 2.4513 | 120000 | 4.3870          |
| 4.2981        | 2.6556 | 130000 | 4.3512          |
| 4.2773        | 2.8599 | 140000 | 4.3192          |
| 4.0874        | 3.0642 | 150000 | 4.3020          |
| 4.108         | 3.2684 | 160000 | 4.2842          |
| 4.1255        | 3.4727 | 170000 | 4.2625          |
| 4.1187        | 3.6770 | 180000 | 4.2417          |
| 4.1067        | 3.8813 | 190000 | 4.2223          |
| 3.9283        | 4.0856 | 200000 | 4.2262          |
| 3.9779        | 4.2898 | 210000 | 4.2159          |
| 3.9489        | 4.4941 | 220000 | 4.2002          |
| 3.9762        | 4.6984 | 230000 | 4.1840          |
| 3.9812        | 4.9027 | 240000 | 4.1695          |
| 3.9538        | 5.0    | 244765 | 4.1619          |
| 3.8109        | 5.1070 | 250000 | 4.1865          |
| 3.8151        | 5.3113 | 260000 | 4.1802          |
| 3.8114        | 5.5156 | 270000 | 4.1682          |
| 3.8476        | 5.7199 | 280000 | 4.1554          |
| 3.8423        | 5.9242 | 290000 | 4.1430          |
| 3.6525        | 6.1285 | 300000 | 4.1762          |
| 3.6778        | 6.3327 | 310000 | 4.1686          |
| 3.7166        | 6.5370 | 320000 | 4.1582          |
| 3.7423        | 6.7413 | 330000 | 4.1454          |
| 3.7259        | 6.9456 | 340000 | 4.1333          |
| 3.5657        | 7.1498 | 350000 | 4.1738          |
| 3.5815        | 7.3541 | 360000 | 4.1683          |
| 3.615         | 7.5584 | 370000 | 4.1591          |
| 3.6129        | 7.7627 | 380000 | 4.1502          |
| 3.6105        | 7.9670 | 390000 | 4.1394          |
| 3.4408        | 8.1712 | 400000 | 4.1830          |
| 3.4751        | 8.3755 | 410000 | 4.1787          |
| 3.465         | 8.5798 | 420000 | 4.1730          |
| 3.4814        | 8.7841 | 430000 | 4.1652          |
| 3.4932        | 8.9883 | 440000 | 4.1580          |
| 3.3643        | 9.1926 | 450000 | 4.1924          |
| 3.3672        | 9.3969 | 460000 | 4.1906          |
| 3.3694        | 9.6012 | 470000 | 4.1869          |
| 3.3765        | 9.8055 | 480000 | 4.1840          |
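Validation loss stops improving after roughly epoch 7 and drifts upward in epochs 8–9 while training loss keeps falling, which is consistent with mild overfitting; the lowest validation loss in the table is 4.1333 at step 340000, not the final checkpoint. A minimal sketch that pulls the minimum from a subset of the rows above:

```python
# A few representative checkpoints from the table above: (step, validation loss).
checkpoints = [
    (50000, 5.0023),
    (100000, 4.4624),
    (200000, 4.2262),
    (290000, 4.1430),
    (340000, 4.1333),
    (390000, 4.1394),
    (480000, 4.1840),
]
best_step, best_loss = min(checkpoints, key=lambda c: c[1])
print(best_step, best_loss)  # → 340000 4.1333
```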

Framework versions

  • Transformers 4.51.0
  • Pytorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1
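To reproduce this environment, the pinned versions above can be installed directly. This is a sketch assuming the standard PyPI package names; the +cu126 build of PyTorch ships from the PyTorch wheel index rather than PyPI:

```shell
pip install "transformers==4.51.0" "datasets==3.6.0" "tokenizers==0.21.1"
# CUDA 12.6 build of PyTorch 2.7.0:
pip install "torch==2.7.0" --index-url https://download.pytorch.org/whl/cu126
```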

Model details

  • Format: Safetensors
  • Model size: 0.2B params
  • Tensor type: F32