moe_g

This model was fine-tuned on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 4.7150
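
For context, if this evaluation loss is a mean per-token cross-entropy in nats (an assumption; the card does not state the training objective), it corresponds to a perplexity of roughly exp(4.7150) ≈ 111.6:

```python
import math

# Hedged sketch: assumes the reported eval loss is mean per-token
# cross-entropy in nats, so perplexity = exp(loss).
eval_loss = 4.7150
print(math.exp(eval_loss))  # ~111.6
```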

Model description

More information needed

Intended uses & limitations

More information needed
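
Pending that information, here is a minimal loading sketch. Both the checkpoint path `./moe_g` and the causal-LM head are assumptions; the card does not state the task.

```python
# Minimal sketch, assuming a causal language model saved locally at ./moe_g.
# Both the path and the task head are assumptions, not stated by the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./moe_g")
model = AutoModelForCausalLM.from_pretrained("./moe_g")

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```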

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 48952
  • training_steps: 489524
  • mixed_precision_training: Native AMP
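
A minimal sketch of how these values map onto Hugging Face `TrainingArguments`; the `output_dir` is a placeholder, and `fp16=True` is an assumed reading of "Native AMP":

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="moe_g",              # placeholder, not stated by the card
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,   # 8 x 4 = 32 total train batch size
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=48952,
    max_steps=489524,
    fp16=True,                       # assumed mapping for "Native AMP"
)
```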

Training results

| Training Loss | Epoch  | Step   | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| No log        | 0      | 0      | 10.9795         |
| 7.5759        | 0.2043 | 10000  | 7.5365          |
| 6.0757        | 0.4086 | 20000  | 6.0247          |
| 5.3575        | 0.6128 | 30000  | 5.3397          |
| 5.0644        | 0.8171 | 40000  | 5.0423          |
| 4.7435        | 1.0214 | 50000  | 4.8210          |
| 4.6448        | 1.2257 | 60000  | 4.6657          |
| 4.551         | 1.4299 | 70000  | 4.5522          |
| 4.4603        | 1.6342 | 80000  | 4.4658          |
| 4.4004        | 1.8385 | 90000  | 4.3951          |
| 4.135         | 2.0428 | 100000 | 4.3416          |
| 4.1413        | 2.2471 | 110000 | 4.3071          |
| 4.1372        | 2.4513 | 120000 | 4.2719          |
| 4.1268        | 2.6556 | 130000 | 4.2369          |
| 4.11          | 2.8599 | 140000 | 4.2044          |
| 3.8058        | 3.0642 | 150000 | 4.2044          |
| 3.8508        | 3.2684 | 160000 | 4.1921          |
| 3.8827        | 3.4727 | 170000 | 4.1721          |
| 3.8842        | 3.6770 | 180000 | 4.1516          |
| 3.8795        | 3.8813 | 190000 | 4.1302          |
| 3.5506        | 4.0856 | 200000 | 4.1772          |
| 3.6215        | 4.2898 | 210000 | 4.1755          |
| 3.6136        | 4.4941 | 220000 | 4.1624          |
| 3.6512        | 4.6984 | 230000 | 4.1430          |
| 3.6687        | 4.9027 | 240000 | 4.1261          |
| 3.2695        | 5.1069 | 250000 | 4.2228          |
| 3.3309        | 5.3112 | 260000 | 4.2313          |
| 3.3674        | 5.5155 | 270000 | 4.2179          |
| 3.4091        | 5.7198 | 280000 | 4.2003          |
| 3.4325        | 5.9241 | 290000 | 4.1842          |
| 2.9939        | 6.1283 | 300000 | 4.3241          |
| 3.0435        | 6.3326 | 310000 | 4.3383          |
| 3.1034        | 6.5369 | 320000 | 4.3293          |
| 3.1355        | 6.7412 | 330000 | 4.3161          |
| 3.134         | 6.9454 | 340000 | 4.3015          |
| 2.7404        | 7.1497 | 350000 | 4.4597          |
| 2.7835        | 7.3540 | 360000 | 4.4807          |
| 2.8187        | 7.5583 | 370000 | 4.4815          |
| 2.8291        | 7.7626 | 380000 | 4.4772          |
| 2.8283        | 7.9668 | 390000 | 4.4730          |
| 2.4822        | 8.1711 | 400000 | 4.6015          |
| 2.512         | 8.3754 | 410000 | 4.6191          |
| 2.5151        | 8.5797 | 420000 | 4.6252          |
| 2.5148        | 8.7839 | 430000 | 4.6268          |
| 2.5285        | 8.9882 | 440000 | 4.6275          |
| 2.4292        | 9.1927 | 450000 | 4.7047          |
| 2.4565        | 9.3970 | 460000 | 4.7123          |
| 2.4553        | 9.6012 | 470000 | 4.7143          |
| 2.4479        | 9.8055 | 480000 | 4.7146          |

Framework versions

  • Transformers 4.51.0
  • PyTorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1
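
As a quick way to check a local environment against these versions (just one possible approach):

```python
# Sketch: print installed versions to compare against the list above.
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # expected 4.51.0
print(torch.__version__)         # expected 2.7.0+cu126
print(datasets.__version__)      # expected 3.6.0
print(tokenizers.__version__)    # expected 0.21.1
```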

Model details

  • Format: Safetensors
  • Model size: 0.5B params
  • Tensor type: F32