
got2moe_hom

This model is a fine-tuned version of an unspecified base model (the base checkpoint is not recorded in this card) on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 4.1891
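To make the evaluation loss easier to interpret, it can be converted to perplexity. A minimal sketch, assuming the reported loss is a per-token natural-log cross-entropy (the usual convention in transformers language-model training):

```python
import math

# Evaluation cross-entropy loss reported above.
eval_loss = 4.1891

# Perplexity is the exponential of the per-token cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # roughly 66
```

A perplexity around 66 is plausible for a small (0.2B-parameter) model trained from this loss level, but the interpretation depends on the tokenizer and dataset, which this card does not document.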

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 7134
  • training_steps: 71348
  • mixed_precision_training: Native AMP
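The hyperparameters above imply an effective batch size of 8 × 4 = 32 (train_batch_size × gradient_accumulation_steps) and a warmup covering roughly 10% of training (7134 of 71348 steps). The resulting linear warmup-then-decay schedule can be sketched as follows; this is a plain-Python reimplementation for illustration, not the trainer's actual scheduler object:

```python
def lr_at(step: int,
          base_lr: float = 1e-4,
          warmup_steps: int = 7134,
          total_steps: int = 71348) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Effective batch size seen by the optimizer per update.
effective_batch = 8 * 4  # train_batch_size * gradient_accumulation_steps

print(lr_at(0), lr_at(7134), lr_at(71348))  # 0.0 at start, 1e-4 at peak, 0.0 at end
```

The learning rate therefore peaks at 1e-4 exactly when warmup ends and reaches zero at the final training step.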

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 9.5956 | 0.0701 | 500 | 8.5904 |
| 7.8215 | 0.1402 | 1000 | 7.4205 |
| 7.4004 | 0.2102 | 1500 | 6.9757 |
| 7.0153 | 0.2803 | 2000 | 6.6896 |
| 6.8328 | 0.3504 | 2500 | 6.4711 |
| 6.565 | 0.4205 | 3000 | 6.2419 |
| 6.4087 | 0.4905 | 3500 | 6.0579 |
| 6.2179 | 0.5606 | 4000 | 5.8909 |
| 6.1129 | 0.6307 | 4500 | 5.7518 |
| 5.9624 | 0.7008 | 5000 | 5.6311 |
| 5.8645 | 0.7708 | 5500 | 5.5181 |
| 5.7498 | 0.8409 | 6000 | 5.4256 |
| 5.6873 | 0.9110 | 6500 | 5.3456 |
| 5.5901 | 0.9811 | 7000 | 5.2627 |
| 5.5069 | 1.0512 | 7500 | 5.1970 |
| 5.4276 | 1.1212 | 8000 | 5.1286 |
| 5.3877 | 1.1913 | 8500 | 5.0679 |
| 5.3149 | 1.2614 | 9000 | 5.0113 |
| 5.2861 | 1.3315 | 9500 | 4.9615 |
| 5.2339 | 1.4015 | 10000 | 4.9193 |
| 5.1977 | 1.4716 | 10500 | 4.8754 |
| 5.1546 | 1.5417 | 11000 | 4.8416 |
| 5.1338 | 1.6118 | 11500 | 4.8051 |
| 5.0935 | 1.6819 | 12000 | 4.7739 |
| 5.0677 | 1.7519 | 12500 | 4.7476 |
| 5.0371 | 1.8220 | 13000 | 4.7161 |
| 5.0184 | 1.8921 | 13500 | 4.6926 |
| 4.9905 | 1.9622 | 14000 | 4.6700 |
| 4.9217 | 2.0322 | 14500 | 4.6521 |
| 4.8863 | 2.1023 | 15000 | 4.6340 |
| 4.8754 | 2.1724 | 15500 | 4.6158 |
| 4.8584 | 2.2425 | 16000 | 4.6032 |
| 4.8549 | 2.3125 | 16500 | 4.5852 |
| 4.8408 | 2.3826 | 17000 | 4.5723 |
| 4.8275 | 2.4527 | 17500 | 4.5569 |
| 4.8131 | 2.5228 | 18000 | 4.5422 |
| 4.8017 | 2.5929 | 18500 | 4.5277 |
| 4.7985 | 2.6629 | 19000 | 4.5154 |
| 4.7807 | 2.7330 | 19500 | 4.5034 |
| 4.7726 | 2.8031 | 20000 | 4.4918 |
| 4.7599 | 2.8732 | 20500 | 4.4813 |
| 4.7502 | 2.9432 | 21000 | 4.4699 |
| 4.7402 | 3.0133 | 21500 | 4.4597 |
| 4.643 | 3.0834 | 22000 | 4.4537 |
| 4.6432 | 3.1535 | 22500 | 4.4508 |
| 4.6455 | 3.2235 | 23000 | 4.4412 |
| 4.6387 | 3.2936 | 23500 | 4.4293 |
| 4.6314 | 3.3637 | 24000 | 4.4235 |
| 4.634 | 3.4338 | 24500 | 4.4138 |
| 4.6294 | 3.5039 | 25000 | 4.4057 |
| 4.6213 | 3.5739 | 25500 | 4.3970 |
| 4.6159 | 3.6440 | 26000 | 4.3916 |
| 4.6081 | 3.7141 | 26500 | 4.3841 |
| 4.6129 | 3.7842 | 27000 | 4.3791 |
| 4.6055 | 3.8542 | 27500 | 4.3691 |
| 4.6032 | 3.9243 | 28000 | 4.3643 |
| 4.6033 | 3.9944 | 28500 | 4.3571 |
| 4.4889 | 4.0645 | 29000 | 4.3574 |
| 4.4925 | 4.1345 | 29500 | 4.3557 |
| 4.4918 | 4.2046 | 30000 | 4.3502 |
| 4.4998 | 4.2747 | 30500 | 4.3441 |
| 4.5001 | 4.3448 | 31000 | 4.3378 |
| 4.4951 | 4.4149 | 31500 | 4.3338 |
| 4.5046 | 4.4849 | 32000 | 4.3280 |
| 4.4966 | 4.5550 | 32500 | 4.3234 |
| 4.4914 | 4.6251 | 33000 | 4.3166 |
| 4.497 | 4.6952 | 33500 | 4.3118 |
| 4.4898 | 4.7652 | 34000 | 4.3076 |
| 4.4917 | 4.8353 | 34500 | 4.3013 |
| 4.4848 | 4.9054 | 35000 | 4.2983 |
| 4.4767 | 4.9755 | 35500 | 4.2911 |
| 4.3754 | 5.0456 | 36000 | 4.2952 |
| 4.3786 | 5.1156 | 36500 | 4.2968 |
| 4.3894 | 5.1857 | 37000 | 4.2903 |
| 4.3938 | 5.2558 | 37500 | 4.2889 |
| 4.3913 | 5.3259 | 38000 | 4.2854 |
| 4.3917 | 5.3959 | 38500 | 4.2816 |
| 4.3989 | 5.4660 | 39000 | 4.2783 |
| 4.3954 | 5.5361 | 39500 | 4.2729 |
| 4.3912 | 5.6062 | 40000 | 4.2690 |
| 4.3914 | 5.6762 | 40500 | 4.2658 |
| 4.3946 | 5.7463 | 41000 | 4.2627 |
| 4.3944 | 5.8164 | 41500 | 4.2597 |
| 4.3928 | 5.8865 | 42000 | 4.2552 |
| 4.3943 | 5.9566 | 42500 | 4.2494 |
| 4.2916 | 6.0266 | 43000 | 4.2536 |
| 4.2894 | 6.0967 | 43500 | 4.2542 |
| 4.3032 | 6.1668 | 44000 | 4.2535 |
| 4.2958 | 6.2369 | 44500 | 4.2523 |
| 4.3069 | 6.3069 | 45000 | 4.2484 |
| 4.3041 | 6.3770 | 45500 | 4.2456 |
| 4.3109 | 6.4471 | 46000 | 4.2441 |
| 4.3082 | 6.5172 | 46500 | 4.2388 |
| 4.313 | 6.5872 | 47000 | 4.2368 |
| 4.3096 | 6.6573 | 47500 | 4.2354 |
| 4.308 | 6.7274 | 48000 | 4.2310 |
| 4.3171 | 6.7975 | 48500 | 4.2294 |
| 4.3076 | 6.8676 | 49000 | 4.2266 |
| 4.3124 | 6.9376 | 49500 | 4.2230 |
| 4.2844 | 7.0077 | 50000 | 4.2251 |
| 4.2169 | 7.0778 | 50500 | 4.2280 |
| 4.2259 | 7.1479 | 51000 | 4.2269 |
| 4.2361 | 7.2179 | 51500 | 4.2260 |
| 4.2346 | 7.2880 | 52000 | 4.2256 |
| 4.239 | 7.3581 | 52500 | 4.2226 |
| 4.2311 | 7.4282 | 53000 | 4.2207 |
| 4.2388 | 7.4982 | 53500 | 4.2179 |
| 4.2419 | 7.5683 | 54000 | 4.2154 |
| 4.2382 | 7.6384 | 54500 | 4.2134 |
| 4.2423 | 7.7085 | 55000 | 4.2116 |
| 4.2431 | 7.7786 | 55500 | 4.2095 |
| 4.2408 | 7.8486 | 56000 | 4.2079 |
| 4.2395 | 7.9187 | 56500 | 4.2047 |
| 4.2388 | 7.9888 | 57000 | 4.2015 |
| 4.1639 | 8.0589 | 57500 | 4.2092 |
| 4.1639 | 8.1289 | 58000 | 4.2086 |
| 4.1632 | 8.1990 | 58500 | 4.2096 |
| 4.1755 | 8.2691 | 59000 | 4.2082 |
| 4.1757 | 8.3392 | 59500 | 4.2063 |
| 4.1752 | 8.4093 | 60000 | 4.2057 |
| 4.1728 | 8.4793 | 60500 | 4.2040 |
| 4.1785 | 8.5494 | 61000 | 4.2019 |
| 4.1757 | 8.6195 | 61500 | 4.2012 |
| 4.1777 | 8.6896 | 62000 | 4.1982 |
| 4.1756 | 8.7596 | 62500 | 4.1981 |
| 4.1758 | 8.8297 | 63000 | 4.1947 |
| 4.1779 | 8.8998 | 63500 | 4.1939 |
| 4.1792 | 8.9699 | 64000 | 4.1933 |
| 4.1199 | 9.0399 | 64500 | 4.1965 |
| 4.1148 | 9.1100 | 65000 | 4.1972 |
| 4.117 | 9.1801 | 65500 | 4.1974 |
| 4.1223 | 9.2502 | 66000 | 4.1956 |
| 4.1232 | 9.3203 | 66500 | 4.1951 |
| 4.1212 | 9.3903 | 67000 | 4.1941 |
| 4.124 | 9.4604 | 67500 | 4.1937 |
| 4.1222 | 9.5305 | 68000 | 4.1927 |
| 4.1263 | 9.6006 | 68500 | 4.1920 |
| 4.1259 | 9.6706 | 69000 | 4.1913 |
| 4.1209 | 9.7407 | 69500 | 4.1904 |
| 4.1185 | 9.8108 | 70000 | 4.1901 |
| 4.1238 | 9.8809 | 70500 | 4.1899 |
| 4.1179 | 9.9509 | 71000 | 4.1892 |

Framework versions

  • Transformers 4.57.1
  • PyTorch 2.7.1+cu118
  • Datasets 3.6.0
  • Tokenizers 0.22.1
Model weights

  • Format: Safetensors
  • Model size: 0.2B params
  • Tensor type: F32