# got2moe_hom
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.1891
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 7134
- training_steps: 71348
- mixed_precision_training: Native AMP
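The effective batch size follows from the settings above (train_batch_size 8 × gradient_accumulation_steps 4 = 32), and the linear scheduler with 7134 warmup steps over 71348 total steps implies a warmup-then-decay learning-rate curve. The snippet below is a minimal, dependency-free sketch of that schedule; the function name is illustrative, not part of any library API:

```python
def linear_schedule_lr(step, base_lr=1e-4, warmup_steps=7134, total_steps=71348):
    """Linear warmup to base_lr, then linear decay to zero.

    Mirrors the card's settings: learning_rate=0.0001,
    lr_scheduler_type=linear, lr_scheduler_warmup_steps=7134,
    training_steps=71348.
    """
    if step < warmup_steps:
        # Ramp up proportionally during warmup.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr at warmup end to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


# Peak learning rate is reached exactly at the end of warmup.
peak = linear_schedule_lr(7134)
```

In a Transformers training run, the equivalent curve is produced internally by the Trainer from these hyperparameters; this sketch is only for inspecting the schedule.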
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 9.5956 | 0.0701 | 500 | 8.5904 |
| 7.8215 | 0.1402 | 1000 | 7.4205 |
| 7.4004 | 0.2102 | 1500 | 6.9757 |
| 7.0153 | 0.2803 | 2000 | 6.6896 |
| 6.8328 | 0.3504 | 2500 | 6.4711 |
| 6.565 | 0.4205 | 3000 | 6.2419 |
| 6.4087 | 0.4905 | 3500 | 6.0579 |
| 6.2179 | 0.5606 | 4000 | 5.8909 |
| 6.1129 | 0.6307 | 4500 | 5.7518 |
| 5.9624 | 0.7008 | 5000 | 5.6311 |
| 5.8645 | 0.7708 | 5500 | 5.5181 |
| 5.7498 | 0.8409 | 6000 | 5.4256 |
| 5.6873 | 0.9110 | 6500 | 5.3456 |
| 5.5901 | 0.9811 | 7000 | 5.2627 |
| 5.5069 | 1.0512 | 7500 | 5.1970 |
| 5.4276 | 1.1212 | 8000 | 5.1286 |
| 5.3877 | 1.1913 | 8500 | 5.0679 |
| 5.3149 | 1.2614 | 9000 | 5.0113 |
| 5.2861 | 1.3315 | 9500 | 4.9615 |
| 5.2339 | 1.4015 | 10000 | 4.9193 |
| 5.1977 | 1.4716 | 10500 | 4.8754 |
| 5.1546 | 1.5417 | 11000 | 4.8416 |
| 5.1338 | 1.6118 | 11500 | 4.8051 |
| 5.0935 | 1.6819 | 12000 | 4.7739 |
| 5.0677 | 1.7519 | 12500 | 4.7476 |
| 5.0371 | 1.8220 | 13000 | 4.7161 |
| 5.0184 | 1.8921 | 13500 | 4.6926 |
| 4.9905 | 1.9622 | 14000 | 4.6700 |
| 4.9217 | 2.0322 | 14500 | 4.6521 |
| 4.8863 | 2.1023 | 15000 | 4.6340 |
| 4.8754 | 2.1724 | 15500 | 4.6158 |
| 4.8584 | 2.2425 | 16000 | 4.6032 |
| 4.8549 | 2.3125 | 16500 | 4.5852 |
| 4.8408 | 2.3826 | 17000 | 4.5723 |
| 4.8275 | 2.4527 | 17500 | 4.5569 |
| 4.8131 | 2.5228 | 18000 | 4.5422 |
| 4.8017 | 2.5929 | 18500 | 4.5277 |
| 4.7985 | 2.6629 | 19000 | 4.5154 |
| 4.7807 | 2.7330 | 19500 | 4.5034 |
| 4.7726 | 2.8031 | 20000 | 4.4918 |
| 4.7599 | 2.8732 | 20500 | 4.4813 |
| 4.7502 | 2.9432 | 21000 | 4.4699 |
| 4.7402 | 3.0133 | 21500 | 4.4597 |
| 4.643 | 3.0834 | 22000 | 4.4537 |
| 4.6432 | 3.1535 | 22500 | 4.4508 |
| 4.6455 | 3.2235 | 23000 | 4.4412 |
| 4.6387 | 3.2936 | 23500 | 4.4293 |
| 4.6314 | 3.3637 | 24000 | 4.4235 |
| 4.634 | 3.4338 | 24500 | 4.4138 |
| 4.6294 | 3.5039 | 25000 | 4.4057 |
| 4.6213 | 3.5739 | 25500 | 4.3970 |
| 4.6159 | 3.6440 | 26000 | 4.3916 |
| 4.6081 | 3.7141 | 26500 | 4.3841 |
| 4.6129 | 3.7842 | 27000 | 4.3791 |
| 4.6055 | 3.8542 | 27500 | 4.3691 |
| 4.6032 | 3.9243 | 28000 | 4.3643 |
| 4.6033 | 3.9944 | 28500 | 4.3571 |
| 4.4889 | 4.0645 | 29000 | 4.3574 |
| 4.4925 | 4.1345 | 29500 | 4.3557 |
| 4.4918 | 4.2046 | 30000 | 4.3502 |
| 4.4998 | 4.2747 | 30500 | 4.3441 |
| 4.5001 | 4.3448 | 31000 | 4.3378 |
| 4.4951 | 4.4149 | 31500 | 4.3338 |
| 4.5046 | 4.4849 | 32000 | 4.3280 |
| 4.4966 | 4.5550 | 32500 | 4.3234 |
| 4.4914 | 4.6251 | 33000 | 4.3166 |
| 4.497 | 4.6952 | 33500 | 4.3118 |
| 4.4898 | 4.7652 | 34000 | 4.3076 |
| 4.4917 | 4.8353 | 34500 | 4.3013 |
| 4.4848 | 4.9054 | 35000 | 4.2983 |
| 4.4767 | 4.9755 | 35500 | 4.2911 |
| 4.3754 | 5.0456 | 36000 | 4.2952 |
| 4.3786 | 5.1156 | 36500 | 4.2968 |
| 4.3894 | 5.1857 | 37000 | 4.2903 |
| 4.3938 | 5.2558 | 37500 | 4.2889 |
| 4.3913 | 5.3259 | 38000 | 4.2854 |
| 4.3917 | 5.3959 | 38500 | 4.2816 |
| 4.3989 | 5.4660 | 39000 | 4.2783 |
| 4.3954 | 5.5361 | 39500 | 4.2729 |
| 4.3912 | 5.6062 | 40000 | 4.2690 |
| 4.3914 | 5.6762 | 40500 | 4.2658 |
| 4.3946 | 5.7463 | 41000 | 4.2627 |
| 4.3944 | 5.8164 | 41500 | 4.2597 |
| 4.3928 | 5.8865 | 42000 | 4.2552 |
| 4.3943 | 5.9566 | 42500 | 4.2494 |
| 4.2916 | 6.0266 | 43000 | 4.2536 |
| 4.2894 | 6.0967 | 43500 | 4.2542 |
| 4.3032 | 6.1668 | 44000 | 4.2535 |
| 4.2958 | 6.2369 | 44500 | 4.2523 |
| 4.3069 | 6.3069 | 45000 | 4.2484 |
| 4.3041 | 6.3770 | 45500 | 4.2456 |
| 4.3109 | 6.4471 | 46000 | 4.2441 |
| 4.3082 | 6.5172 | 46500 | 4.2388 |
| 4.313 | 6.5872 | 47000 | 4.2368 |
| 4.3096 | 6.6573 | 47500 | 4.2354 |
| 4.308 | 6.7274 | 48000 | 4.2310 |
| 4.3171 | 6.7975 | 48500 | 4.2294 |
| 4.3076 | 6.8676 | 49000 | 4.2266 |
| 4.3124 | 6.9376 | 49500 | 4.2230 |
| 4.2844 | 7.0077 | 50000 | 4.2251 |
| 4.2169 | 7.0778 | 50500 | 4.2280 |
| 4.2259 | 7.1479 | 51000 | 4.2269 |
| 4.2361 | 7.2179 | 51500 | 4.2260 |
| 4.2346 | 7.2880 | 52000 | 4.2256 |
| 4.239 | 7.3581 | 52500 | 4.2226 |
| 4.2311 | 7.4282 | 53000 | 4.2207 |
| 4.2388 | 7.4982 | 53500 | 4.2179 |
| 4.2419 | 7.5683 | 54000 | 4.2154 |
| 4.2382 | 7.6384 | 54500 | 4.2134 |
| 4.2423 | 7.7085 | 55000 | 4.2116 |
| 4.2431 | 7.7786 | 55500 | 4.2095 |
| 4.2408 | 7.8486 | 56000 | 4.2079 |
| 4.2395 | 7.9187 | 56500 | 4.2047 |
| 4.2388 | 7.9888 | 57000 | 4.2015 |
| 4.1639 | 8.0589 | 57500 | 4.2092 |
| 4.1639 | 8.1289 | 58000 | 4.2086 |
| 4.1632 | 8.1990 | 58500 | 4.2096 |
| 4.1755 | 8.2691 | 59000 | 4.2082 |
| 4.1757 | 8.3392 | 59500 | 4.2063 |
| 4.1752 | 8.4093 | 60000 | 4.2057 |
| 4.1728 | 8.4793 | 60500 | 4.2040 |
| 4.1785 | 8.5494 | 61000 | 4.2019 |
| 4.1757 | 8.6195 | 61500 | 4.2012 |
| 4.1777 | 8.6896 | 62000 | 4.1982 |
| 4.1756 | 8.7596 | 62500 | 4.1981 |
| 4.1758 | 8.8297 | 63000 | 4.1947 |
| 4.1779 | 8.8998 | 63500 | 4.1939 |
| 4.1792 | 8.9699 | 64000 | 4.1933 |
| 4.1199 | 9.0399 | 64500 | 4.1965 |
| 4.1148 | 9.1100 | 65000 | 4.1972 |
| 4.117 | 9.1801 | 65500 | 4.1974 |
| 4.1223 | 9.2502 | 66000 | 4.1956 |
| 4.1232 | 9.3203 | 66500 | 4.1951 |
| 4.1212 | 9.3903 | 67000 | 4.1941 |
| 4.124 | 9.4604 | 67500 | 4.1937 |
| 4.1222 | 9.5305 | 68000 | 4.1927 |
| 4.1263 | 9.6006 | 68500 | 4.1920 |
| 4.1259 | 9.6706 | 69000 | 4.1913 |
| 4.1209 | 9.7407 | 69500 | 4.1904 |
| 4.1185 | 9.8108 | 70000 | 4.1901 |
| 4.1238 | 9.8809 | 70500 | 4.1899 |
| 4.1179 | 9.9509 | 71000 | 4.1892 |
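If the reported loss is a per-token cross-entropy (typical for causal language-model fine-tuning, though the card does not state the objective), the final validation loss of roughly 4.189 can be converted to perplexity as exp(loss):

```python
import math

# Final validation loss from the last row of the table (step 71000).
final_eval_loss = 4.1892

# Perplexity = exp(cross-entropy loss), assuming the loss is per-token
# cross-entropy in nats.
perplexity = math.exp(final_eval_loss)  # roughly 66
```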
### Framework versions
- Transformers 4.57.1
- Pytorch 2.7.1+cu118
- Datasets 3.6.0
- Tokenizers 0.22.1