This repository is publicly accessible, but you must accept its conditions to access the files and content.

got2moe_het

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.2492

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 7394
  • training_steps: 73945
  • mixed_precision_training: Native AMP
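The numbers above fit together: the effective batch size is train_batch_size × gradient_accumulation_steps, and a `linear` scheduler ramps the learning rate up over the warmup steps and then decays it linearly to zero at the final training step. A minimal sketch of that schedule, assuming the standard linear-warmup/linear-decay behavior of the `linear` scheduler in 🤗 Transformers:

```python
LEARNING_RATE = 1e-4
WARMUP_STEPS = 7394
TRAINING_STEPS = 73945

def linear_schedule_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup
    followed by linear decay to zero (an assumption matching the
    `linear` lr_scheduler_type above)."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    return LEARNING_RATE * (TRAINING_STEPS - step) / (TRAINING_STEPS - WARMUP_STEPS)

# Effective batch size: per-device batch x gradient accumulation steps
effective_batch = 8 * 4  # = 32, matching total_train_batch_size
```

For example, the learning rate peaks at exactly 1e-4 at step 7394 and reaches zero at step 73945.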

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 9.906         | 0.0676 | 500   | 9.0388          |
| 8.2268        | 0.1352 | 1000  | 7.8233          |
| 7.7262        | 0.2028 | 1500  | 7.3070          |
| 7.2574        | 0.2705 | 2000  | 6.9448          |
| 7.0245        | 0.3381 | 2500  | 6.6707          |
| 6.7548        | 0.4057 | 3000  | 6.4530          |
| 6.5834        | 0.4733 | 3500  | 6.2479          |
| 6.3731        | 0.5409 | 4000  | 6.0745          |
| 6.2479        | 0.6085 | 4500  | 5.9235          |
| 6.0907        | 0.6762 | 5000  | 5.7935          |
| 5.9807        | 0.7438 | 5500  | 5.6789          |
| 5.8692        | 0.8114 | 6000  | 5.5788          |
| 5.7971        | 0.8790 | 6500  | 5.4889          |
| 5.7022        | 0.9466 | 7000  | 5.4077          |
| 5.6425        | 1.0142 | 7500  | 5.3308          |
| 5.525         | 1.0818 | 8000  | 5.2571          |
| 5.4807        | 1.1494 | 8500  | 5.1977          |
| 5.4081        | 1.2170 | 9000  | 5.1273          |
| 5.3655        | 1.2847 | 9500  | 5.0732          |
| 5.3077        | 1.3523 | 10000 | 5.0248          |
| 5.2738        | 1.4199 | 10500 | 4.9784          |
| 5.2261        | 1.4875 | 11000 | 4.9376          |
| 5.1893        | 1.5551 | 11500 | 4.8985          |
| 5.1529        | 1.6227 | 12000 | 4.8649          |
| 5.1249        | 1.6904 | 12500 | 4.8293          |
| 5.0992        | 1.7580 | 13000 | 4.8066          |
| 5.0853        | 1.8256 | 13500 | 4.7814          |
| 5.0433        | 1.8932 | 14000 | 4.7564          |
| 5.0274        | 1.9608 | 14500 | 4.7360          |
| 4.9209        | 2.0284 | 15000 | 4.7142          |
| 4.9173        | 2.0960 | 15500 | 4.6988          |
| 4.9018        | 2.1636 | 16000 | 4.6819          |
| 4.8945        | 2.2312 | 16500 | 4.6697          |
| 4.8814        | 2.2989 | 17000 | 4.6501          |
| 4.8839        | 2.3665 | 17500 | 4.6353          |
| 4.865         | 2.4341 | 18000 | 4.6216          |
| 4.8498        | 2.5017 | 18500 | 4.6099          |
| 4.8424        | 2.5693 | 19000 | 4.5958          |
| 4.8327        | 2.6369 | 19500 | 4.5840          |
| 4.82          | 2.7046 | 20000 | 4.5716          |
| 4.8123        | 2.7722 | 20500 | 4.5587          |
| 4.8072        | 2.8398 | 21000 | 4.5486          |
| 4.8027        | 2.9074 | 21500 | 4.5397          |
| 4.7798        | 2.9750 | 22000 | 4.5269          |
| 4.6741        | 3.0426 | 22500 | 4.5236          |
| 4.684         | 3.1102 | 23000 | 4.5160          |
| 4.6748        | 3.1778 | 23500 | 4.5064          |
| 4.6718        | 3.2454 | 24000 | 4.4996          |
| 4.6814        | 3.3131 | 24500 | 4.4942          |
| 4.6675        | 3.3807 | 25000 | 4.4853          |
| 4.6712        | 3.4483 | 25500 | 4.4761          |
| 4.6707        | 3.5159 | 26000 | 4.4693          |
| 4.6592        | 3.5835 | 26500 | 4.4625          |
| 4.657         | 3.6511 | 27000 | 4.4544          |
| 4.6564        | 3.7188 | 27500 | 4.4477          |
| 4.6467        | 3.7864 | 28000 | 4.4433          |
| 4.6392        | 3.8540 | 28500 | 4.4365          |
| 4.6454        | 3.9216 | 29000 | 4.4275          |
| 4.6348        | 3.9892 | 29500 | 4.4216          |
| 4.5243        | 4.0568 | 30000 | 4.4208          |
| 4.5291        | 4.1244 | 30500 | 4.4183          |
| 4.536         | 4.1920 | 31000 | 4.4142          |
| 4.5324        | 4.2596 | 31500 | 4.4101          |
| 4.5404        | 4.3273 | 32000 | 4.4041          |
| 4.5357        | 4.3949 | 32500 | 4.4000          |
| 4.5464        | 4.4625 | 33000 | 4.3940          |
| 4.5331        | 4.5301 | 33500 | 4.3901          |
| 4.53          | 4.5977 | 34000 | 4.3856          |
| 4.5346        | 4.6653 | 34500 | 4.3782          |
| 4.5343        | 4.7330 | 35000 | 4.3735          |
| 4.5292        | 4.8006 | 35500 | 4.3692          |
| 4.5316        | 4.8682 | 36000 | 4.3656          |
| 4.5242        | 4.9358 | 36500 | 4.3626          |
| 4.5093        | 5.0034 | 37000 | 4.3583          |
| 4.4211        | 5.0710 | 37500 | 4.3590          |
| 4.424         | 5.1386 | 38000 | 4.3583          |
| 4.4287        | 5.2062 | 38500 | 4.3555          |
| 4.4338        | 5.2738 | 39000 | 4.3510          |
| 4.4322        | 5.3415 | 39500 | 4.3491          |
| 4.4351        | 5.4091 | 40000 | 4.3448          |
| 4.4396        | 5.4767 | 40500 | 4.3419          |
| 4.4378        | 5.5443 | 41000 | 4.3379          |
| 4.4311        | 5.6119 | 41500 | 4.3344          |
| 4.4389        | 5.6795 | 42000 | 4.3307          |
| 4.436         | 5.7472 | 42500 | 4.3249          |
| 4.436         | 5.8148 | 43000 | 4.3222          |
| 4.4421        | 5.8824 | 43500 | 4.3183          |
| 4.4344        | 5.9500 | 44000 | 4.3166          |
| 4.4157        | 6.0176 | 44500 | 4.3179          |
| 4.3316        | 6.0852 | 45000 | 4.3199          |
| 4.3371        | 6.1528 | 45500 | 4.3172          |
| 4.3419        | 6.2204 | 46000 | 4.3161          |
| 4.3457        | 6.2880 | 46500 | 4.3153          |
| 4.3476        | 6.3557 | 47000 | 4.3114          |
| 4.3527        | 6.4233 | 47500 | 4.3085          |
| 4.3541        | 6.4909 | 48000 | 4.3049          |
| 4.3394        | 6.5585 | 48500 | 4.3029          |
| 4.3612        | 6.6261 | 49000 | 4.2986          |
| 4.3557        | 6.6937 | 49500 | 4.2969          |
| 4.3522        | 6.7614 | 50000 | 4.2946          |
| 4.359         | 6.8290 | 50500 | 4.2904          |
| 4.3576        | 6.8966 | 51000 | 4.2877          |
| 4.3567        | 6.9642 | 51500 | 4.2849          |
| 4.2488        | 7.0318 | 52000 | 4.2899          |
| 4.268         | 7.0994 | 52500 | 4.2919          |
| 4.2704        | 7.1670 | 53000 | 4.2901          |
| 4.2749        | 7.2346 | 53500 | 4.2886          |
| 4.2732        | 7.3022 | 54000 | 4.2873          |
| 4.2811        | 7.3699 | 54500 | 4.2841          |
| 4.286         | 7.4375 | 55000 | 4.2824          |
| 4.2827        | 7.5051 | 55500 | 4.2801          |
| 4.2873        | 7.5727 | 56000 | 4.2789          |
| 4.2855        | 7.6403 | 56500 | 4.2766          |
| 4.284         | 7.7079 | 57000 | 4.2738          |
| 4.2803        | 7.7756 | 57500 | 4.2710          |
| 4.2853        | 7.8432 | 58000 | 4.2699          |
| 4.2821        | 7.9108 | 58500 | 4.2666          |
| 4.2852        | 7.9784 | 59000 | 4.2652          |
| 4.2           | 8.0460 | 59500 | 4.2704          |
| 4.2042        | 8.1136 | 60000 | 4.2724          |
| 4.207         | 8.1812 | 60500 | 4.2705          |
| 4.2166        | 8.2488 | 61000 | 4.2695          |
| 4.2104        | 8.3164 | 61500 | 4.2689          |
| 4.2195        | 8.3841 | 62000 | 4.2679          |
| 4.2242        | 8.4517 | 62500 | 4.2663          |
| 4.2157        | 8.5193 | 63000 | 4.2637          |
| 4.2197        | 8.5869 | 63500 | 4.2624          |
| 4.2222        | 8.6545 | 64000 | 4.2612          |
| 4.2236        | 8.7221 | 64500 | 4.2591          |
| 4.2196        | 8.7897 | 65000 | 4.2575          |
| 4.2177        | 8.8574 | 65500 | 4.2561          |
| 4.2216        | 8.9250 | 66000 | 4.2541          |
| 4.2241        | 8.9926 | 66500 | 4.2524          |
| 4.1643        | 9.0602 | 67000 | 4.2578          |
| 4.1584        | 9.1278 | 67500 | 4.2577          |
| 4.1627        | 9.1954 | 68000 | 4.2572          |
| 4.1618        | 9.2630 | 68500 | 4.2566          |
| 4.1674        | 9.3306 | 69000 | 4.2558          |
| 4.1681        | 9.3983 | 69500 | 4.2548          |
| 4.1668        | 9.4659 | 70000 | 4.2537          |
| 4.1703        | 9.5335 | 70500 | 4.2535          |
| 4.1643        | 9.6011 | 71000 | 4.2523          |
| 4.1639        | 9.6687 | 71500 | 4.2513          |
| 4.1687        | 9.7363 | 72000 | 4.2509          |
| 4.1641        | 9.8039 | 72500 | 4.2499          |
| 4.1648        | 9.8716 | 73000 | 4.2496          |
| 4.1634        | 9.9392 | 73500 | 4.2493          |
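Since these are cross-entropy losses in nats, the final validation loss can be read as a perplexity via exp(loss). This is a standard conversion, not a figure reported by the training run itself:

```python
import math

eval_loss = 4.2492          # final validation loss from the table above
perplexity = math.exp(eval_loss)  # roughly 70
```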

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.7.1+cu118
  • Datasets 3.6.0
  • Tokenizers 0.22.1
Model size: 0.2B parameters (Safetensors, F32)