moe_tur_multi_batch_8

This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 5.9074 (if this is a mean token-level cross-entropy, it corresponds to a perplexity of exp(5.9074) ≈ 368)

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 2
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 38244
  • training_steps: 382446
  • mixed_precision_training: Native AMP
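
As an illustration only, here is a minimal sketch of how these settings map onto Hugging Face `TrainingArguments`. The `output_dir` name and the use of `fp16` for Native AMP are assumptions, not the original training script:

```python
# Sketch: mapping the listed hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="moe_tur_multi_batch_8",  # assumed output directory
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=4,       # effective train batch size: 2 * 4 = 8
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=38244,                  # roughly 10% of the total steps
    max_steps=382446,
    fp16=True,                           # assumed realization of "Native AMP"
)
```

With 2 samples per device and 4 accumulation steps on a single device, the effective batch size is 2 × 4 = 8, matching the listed total_train_batch_size.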

Training results

| Training Loss | Epoch  | Step   | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| No log        | 0      | 0      | 10.9789         |
| 7.4445        | 0.2615 | 10000  | 7.3980          |
| 5.6546        | 0.5229 | 20000  | 5.6101          |
| 5.036         | 0.7844 | 30000  | 5.0138          |
| 4.6238        | 1.0459 | 40000  | 4.6896          |
| 4.445         | 1.3074 | 50000  | 4.4645          |
| 4.2684        | 1.5689 | 60000  | 4.3137          |
| 4.2097        | 1.8303 | 70000  | 4.2050          |
| 3.9034        | 2.0918 | 80000  | 4.1332          |
| 3.8426        | 2.3533 | 90000  | 4.0845          |
| 3.8379        | 2.6148 | 100000 | 4.0366          |
| 3.8142        | 2.8763 | 110000 | 3.9902          |
| 3.4415        | 3.1377 | 120000 | 4.0038          |
| 3.5026        | 3.3992 | 130000 | 3.9916          |
| 3.5018        | 3.6607 | 140000 | 3.9655          |
| 3.5207        | 3.9222 | 150000 | 3.9371          |
| 3.5128        | 4.0000 | 152976 | 3.9287          |
| 2.9641        | 4.1837 | 160000 | 4.0548          |
| 3.038         | 4.4451 | 170000 | 4.0626          |
| 3.1051        | 4.7066 | 180000 | 4.0439          |
| 3.1606        | 4.9681 | 190000 | 4.0237          |
| 2.4678        | 5.2296 | 200000 | 4.2661          |
| 2.5802        | 5.4910 | 210000 | 4.2925          |
| 2.6698        | 5.7525 | 220000 | 4.2789          |
| 1.8528        | 6.0140 | 230000 | 4.3869          |
| 1.9577        | 6.2755 | 240000 | 4.6120          |
| 2.0102        | 6.5370 | 250000 | 4.6567          |
| 2.097         | 6.7984 | 260000 | 4.6590          |
| 2.1054        | 7.0000 | 267708 | 4.6630          |
| 1.3511        | 7.0599 | 270000 | 4.8671          |
| 1.3987        | 7.3214 | 280000 | 5.0544          |
| 1.5008        | 7.5829 | 290000 | 5.1165          |
| 1.5377        | 7.8444 | 300000 | 5.1459          |
| 0.9324        | 8.1058 | 310000 | 5.3730          |
| 0.9891        | 8.3673 | 320000 | 5.5069          |
| 1.0269        | 8.6288 | 330000 | 5.5789          |
| 1.0243        | 8.8903 | 340000 | 5.6237          |
| 0.6486        | 9.1518 | 350000 | 5.7925          |
| 0.6572        | 9.4132 | 360000 | 5.8604          |
| 0.6697        | 9.6747 | 370000 | 5.8925          |
| 0.6738        | 9.9362 | 380000 | 5.9074          |

Validation loss bottoms out at 3.9287 at the end of epoch 4 (step 152976) and climbs steadily afterward, a sign of overfitting in later epochs; the 5.9074 reported at the top of this card is the loss at the final step, not the best checkpoint. A checkpoint-selection sketch follows the table.
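
Given the curve above, the epoch-4 checkpoint is the one worth keeping. As a hedged sketch (not the original training code), a standard Hugging Face Trainer configured as below would reload the best checkpoint by validation loss at the end of training:

```python
# Sketch only: checkpoint selection by validation loss, with eval/save
# intervals mirroring the table's 10000-step evaluation cadence.
from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="moe_tur_multi_batch_8",  # assumed output directory
    eval_strategy="steps",
    eval_steps=10_000,
    save_strategy="steps",
    save_steps=10_000,
    load_best_model_at_end=True,         # reload the lowest eval_loss checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
# Passing EarlyStoppingCallback(early_stopping_patience=3) to Trainer(...)
# would additionally stop training once eval_loss fails to improve for
# three consecutive evaluations, well before the loss degrades this far.
```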

Framework versions

  • Transformers 4.51.0
  • Pytorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1

Model size: 0.5B parameters (F32 tensors, stored in Safetensors format)
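
A minimal loading sketch, assuming the checkpoint is a causal language model; the repo id below is a placeholder, since neither the base architecture nor the hosting path is documented above:

```python
# Hedged usage sketch: the repo id and task type (causal LM) are
# assumptions; the card does not name the base model.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "moe_tur_multi_batch_8"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Example input text", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```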