# got2moe_het
This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 4.2492
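Since this is a cross-entropy loss (in nats per token), it can be converted to perplexity by exponentiation. A minimal sketch:

```python
import math

# Final evaluation loss reported above (mean cross-entropy, nats per token)
eval_loss = 4.2492

# Perplexity is the exponential of the mean cross-entropy loss
perplexity = math.exp(eval_loss)
print(f"Perplexity: {perplexity:.2f}")  # ≈ 70.05
```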
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 7394
- training_steps: 73945
- mixed_precision_training: Native AMP
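A few of the values above are derived from the others. A minimal sketch of how they relate (this is illustrative, not the original training script; the dictionary and variable names are assumptions):

```python
# Hyperparameters as reported above
hparams = {
    "train_batch_size": 8,            # per-device batch size
    "gradient_accumulation_steps": 4,
    "warmup_steps": 7394,
    "training_steps": 73945,
}

# Total train batch size = per-device batch size × accumulation steps
total_train_batch_size = (
    hparams["train_batch_size"] * hparams["gradient_accumulation_steps"]
)
print(total_train_batch_size)  # 32, matching the reported value

# The linear scheduler's warmup covers roughly 10% of training
warmup_fraction = hparams["warmup_steps"] / hparams["training_steps"]
print(f"{warmup_fraction:.1%}")  # 10.0%
```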
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 9.906 | 0.0676 | 500 | 9.0388 |
| 8.2268 | 0.1352 | 1000 | 7.8233 |
| 7.7262 | 0.2028 | 1500 | 7.3070 |
| 7.2574 | 0.2705 | 2000 | 6.9448 |
| 7.0245 | 0.3381 | 2500 | 6.6707 |
| 6.7548 | 0.4057 | 3000 | 6.4530 |
| 6.5834 | 0.4733 | 3500 | 6.2479 |
| 6.3731 | 0.5409 | 4000 | 6.0745 |
| 6.2479 | 0.6085 | 4500 | 5.9235 |
| 6.0907 | 0.6762 | 5000 | 5.7935 |
| 5.9807 | 0.7438 | 5500 | 5.6789 |
| 5.8692 | 0.8114 | 6000 | 5.5788 |
| 5.7971 | 0.8790 | 6500 | 5.4889 |
| 5.7022 | 0.9466 | 7000 | 5.4077 |
| 5.6425 | 1.0142 | 7500 | 5.3308 |
| 5.525 | 1.0818 | 8000 | 5.2571 |
| 5.4807 | 1.1494 | 8500 | 5.1977 |
| 5.4081 | 1.2170 | 9000 | 5.1273 |
| 5.3655 | 1.2847 | 9500 | 5.0732 |
| 5.3077 | 1.3523 | 10000 | 5.0248 |
| 5.2738 | 1.4199 | 10500 | 4.9784 |
| 5.2261 | 1.4875 | 11000 | 4.9376 |
| 5.1893 | 1.5551 | 11500 | 4.8985 |
| 5.1529 | 1.6227 | 12000 | 4.8649 |
| 5.1249 | 1.6904 | 12500 | 4.8293 |
| 5.0992 | 1.7580 | 13000 | 4.8066 |
| 5.0853 | 1.8256 | 13500 | 4.7814 |
| 5.0433 | 1.8932 | 14000 | 4.7564 |
| 5.0274 | 1.9608 | 14500 | 4.7360 |
| 4.9209 | 2.0284 | 15000 | 4.7142 |
| 4.9173 | 2.0960 | 15500 | 4.6988 |
| 4.9018 | 2.1636 | 16000 | 4.6819 |
| 4.8945 | 2.2312 | 16500 | 4.6697 |
| 4.8814 | 2.2989 | 17000 | 4.6501 |
| 4.8839 | 2.3665 | 17500 | 4.6353 |
| 4.865 | 2.4341 | 18000 | 4.6216 |
| 4.8498 | 2.5017 | 18500 | 4.6099 |
| 4.8424 | 2.5693 | 19000 | 4.5958 |
| 4.8327 | 2.6369 | 19500 | 4.5840 |
| 4.82 | 2.7046 | 20000 | 4.5716 |
| 4.8123 | 2.7722 | 20500 | 4.5587 |
| 4.8072 | 2.8398 | 21000 | 4.5486 |
| 4.8027 | 2.9074 | 21500 | 4.5397 |
| 4.7798 | 2.9750 | 22000 | 4.5269 |
| 4.6741 | 3.0426 | 22500 | 4.5236 |
| 4.684 | 3.1102 | 23000 | 4.5160 |
| 4.6748 | 3.1778 | 23500 | 4.5064 |
| 4.6718 | 3.2454 | 24000 | 4.4996 |
| 4.6814 | 3.3131 | 24500 | 4.4942 |
| 4.6675 | 3.3807 | 25000 | 4.4853 |
| 4.6712 | 3.4483 | 25500 | 4.4761 |
| 4.6707 | 3.5159 | 26000 | 4.4693 |
| 4.6592 | 3.5835 | 26500 | 4.4625 |
| 4.657 | 3.6511 | 27000 | 4.4544 |
| 4.6564 | 3.7188 | 27500 | 4.4477 |
| 4.6467 | 3.7864 | 28000 | 4.4433 |
| 4.6392 | 3.8540 | 28500 | 4.4365 |
| 4.6454 | 3.9216 | 29000 | 4.4275 |
| 4.6348 | 3.9892 | 29500 | 4.4216 |
| 4.5243 | 4.0568 | 30000 | 4.4208 |
| 4.5291 | 4.1244 | 30500 | 4.4183 |
| 4.536 | 4.1920 | 31000 | 4.4142 |
| 4.5324 | 4.2596 | 31500 | 4.4101 |
| 4.5404 | 4.3273 | 32000 | 4.4041 |
| 4.5357 | 4.3949 | 32500 | 4.4000 |
| 4.5464 | 4.4625 | 33000 | 4.3940 |
| 4.5331 | 4.5301 | 33500 | 4.3901 |
| 4.53 | 4.5977 | 34000 | 4.3856 |
| 4.5346 | 4.6653 | 34500 | 4.3782 |
| 4.5343 | 4.7330 | 35000 | 4.3735 |
| 4.5292 | 4.8006 | 35500 | 4.3692 |
| 4.5316 | 4.8682 | 36000 | 4.3656 |
| 4.5242 | 4.9358 | 36500 | 4.3626 |
| 4.5093 | 5.0034 | 37000 | 4.3583 |
| 4.4211 | 5.0710 | 37500 | 4.3590 |
| 4.424 | 5.1386 | 38000 | 4.3583 |
| 4.4287 | 5.2062 | 38500 | 4.3555 |
| 4.4338 | 5.2738 | 39000 | 4.3510 |
| 4.4322 | 5.3415 | 39500 | 4.3491 |
| 4.4351 | 5.4091 | 40000 | 4.3448 |
| 4.4396 | 5.4767 | 40500 | 4.3419 |
| 4.4378 | 5.5443 | 41000 | 4.3379 |
| 4.4311 | 5.6119 | 41500 | 4.3344 |
| 4.4389 | 5.6795 | 42000 | 4.3307 |
| 4.436 | 5.7472 | 42500 | 4.3249 |
| 4.436 | 5.8148 | 43000 | 4.3222 |
| 4.4421 | 5.8824 | 43500 | 4.3183 |
| 4.4344 | 5.9500 | 44000 | 4.3166 |
| 4.4157 | 6.0176 | 44500 | 4.3179 |
| 4.3316 | 6.0852 | 45000 | 4.3199 |
| 4.3371 | 6.1528 | 45500 | 4.3172 |
| 4.3419 | 6.2204 | 46000 | 4.3161 |
| 4.3457 | 6.2880 | 46500 | 4.3153 |
| 4.3476 | 6.3557 | 47000 | 4.3114 |
| 4.3527 | 6.4233 | 47500 | 4.3085 |
| 4.3541 | 6.4909 | 48000 | 4.3049 |
| 4.3394 | 6.5585 | 48500 | 4.3029 |
| 4.3612 | 6.6261 | 49000 | 4.2986 |
| 4.3557 | 6.6937 | 49500 | 4.2969 |
| 4.3522 | 6.7614 | 50000 | 4.2946 |
| 4.359 | 6.8290 | 50500 | 4.2904 |
| 4.3576 | 6.8966 | 51000 | 4.2877 |
| 4.3567 | 6.9642 | 51500 | 4.2849 |
| 4.2488 | 7.0318 | 52000 | 4.2899 |
| 4.268 | 7.0994 | 52500 | 4.2919 |
| 4.2704 | 7.1670 | 53000 | 4.2901 |
| 4.2749 | 7.2346 | 53500 | 4.2886 |
| 4.2732 | 7.3022 | 54000 | 4.2873 |
| 4.2811 | 7.3699 | 54500 | 4.2841 |
| 4.286 | 7.4375 | 55000 | 4.2824 |
| 4.2827 | 7.5051 | 55500 | 4.2801 |
| 4.2873 | 7.5727 | 56000 | 4.2789 |
| 4.2855 | 7.6403 | 56500 | 4.2766 |
| 4.284 | 7.7079 | 57000 | 4.2738 |
| 4.2803 | 7.7756 | 57500 | 4.2710 |
| 4.2853 | 7.8432 | 58000 | 4.2699 |
| 4.2821 | 7.9108 | 58500 | 4.2666 |
| 4.2852 | 7.9784 | 59000 | 4.2652 |
| 4.2 | 8.0460 | 59500 | 4.2704 |
| 4.2042 | 8.1136 | 60000 | 4.2724 |
| 4.207 | 8.1812 | 60500 | 4.2705 |
| 4.2166 | 8.2488 | 61000 | 4.2695 |
| 4.2104 | 8.3164 | 61500 | 4.2689 |
| 4.2195 | 8.3841 | 62000 | 4.2679 |
| 4.2242 | 8.4517 | 62500 | 4.2663 |
| 4.2157 | 8.5193 | 63000 | 4.2637 |
| 4.2197 | 8.5869 | 63500 | 4.2624 |
| 4.2222 | 8.6545 | 64000 | 4.2612 |
| 4.2236 | 8.7221 | 64500 | 4.2591 |
| 4.2196 | 8.7897 | 65000 | 4.2575 |
| 4.2177 | 8.8574 | 65500 | 4.2561 |
| 4.2216 | 8.9250 | 66000 | 4.2541 |
| 4.2241 | 8.9926 | 66500 | 4.2524 |
| 4.1643 | 9.0602 | 67000 | 4.2578 |
| 4.1584 | 9.1278 | 67500 | 4.2577 |
| 4.1627 | 9.1954 | 68000 | 4.2572 |
| 4.1618 | 9.2630 | 68500 | 4.2566 |
| 4.1674 | 9.3306 | 69000 | 4.2558 |
| 4.1681 | 9.3983 | 69500 | 4.2548 |
| 4.1668 | 9.4659 | 70000 | 4.2537 |
| 4.1703 | 9.5335 | 70500 | 4.2535 |
| 4.1643 | 9.6011 | 71000 | 4.2523 |
| 4.1639 | 9.6687 | 71500 | 4.2513 |
| 4.1687 | 9.7363 | 72000 | 4.2509 |
| 4.1641 | 9.8039 | 72500 | 4.2499 |
| 4.1648 | 9.8716 | 73000 | 4.2496 |
| 4.1634 | 9.9392 | 73500 | 4.2493 |
### Framework versions
- Transformers 4.57.1
- PyTorch 2.7.1+cu118
- Datasets 3.6.0
- Tokenizers 0.22.1