# dense_swe_100m_mult_reseg_ba8_ep20
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:

- Loss: 6.7378
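If the reported loss is mean token-level cross-entropy (an assumption; the card does not state the loss type), perplexity can be recovered as its exponential:

```python
import math

# Final evaluation loss from this card.
eval_loss = 6.7378

# Perplexity = exp(cross-entropy), assuming the loss is mean token-level NLL.
perplexity = math.exp(eval_loss)
print(round(perplexity, 1))
```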
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 5324
- training_steps: 53247
- mixed_precision_training: Native AMP
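Two of the listed values follow from the others: the total train batch size is the per-device batch size times the gradient accumulation steps, and the warmup length is roughly 10% of the total training steps. A quick check:

```python
# Values from the hyperparameter list above.
train_batch_size = 2
gradient_accumulation_steps = 4
warmup_steps = 5324
training_steps = 53247

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# Fraction of training spent warming up the learning rate.
warmup_ratio = warmup_steps / training_steps

print(total_train_batch_size)        # 8, matching total_train_batch_size above
print(round(warmup_ratio, 3))        # ~0.1, i.e. a 10% warmup
```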
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 9.9015 | 0.1878 | 500 | 9.1175 |
| 8.689 | 0.3756 | 1000 | 8.6642 |
| 8.4411 | 0.5634 | 1500 | 8.2257 |
| 7.8529 | 0.7512 | 2000 | 7.7371 |
| 7.3951 | 0.9390 | 2500 | 7.1652 |
| 6.7961 | 1.1266 | 3000 | 6.7200 |
| 6.511 | 1.3144 | 3500 | 6.3980 |
| 6.1935 | 1.5022 | 4000 | 6.1379 |
| 6.0226 | 1.6900 | 4500 | 5.9297 |
| 5.8056 | 1.8777 | 5000 | 5.7542 |
| 5.6273 | 2.0654 | 5500 | 5.6068 |
| 5.4522 | 2.2531 | 6000 | 5.4897 |
| 5.3698 | 2.4409 | 6500 | 5.3893 |
| 5.2664 | 2.6287 | 7000 | 5.3017 |
| 5.2165 | 2.8165 | 7500 | 5.2307 |
| 5.1348 | 3.0041 | 8000 | 5.1766 |
| 4.9022 | 3.1919 | 8500 | 5.1287 |
| 4.8949 | 3.3797 | 9000 | 5.0852 |
| 4.8797 | 3.5675 | 9500 | 5.0489 |
| 4.8429 | 3.7553 | 10000 | 5.0115 |
| 4.8384 | 3.9431 | 10500 | 4.9803 |
| 4.5472 | 4.1307 | 11000 | 4.9713 |
| 4.5482 | 4.3185 | 11500 | 4.9541 |
| 4.5674 | 4.5063 | 12000 | 4.9355 |
| 4.5669 | 4.6941 | 12500 | 4.9120 |
| 4.5547 | 4.8819 | 13000 | 4.8936 |
| 4.4089 | 5.0695 | 13500 | 4.9055 |
| 4.2435 | 5.2573 | 14000 | 4.9101 |
| 4.275 | 5.4451 | 14500 | 4.9016 |
| 4.3108 | 5.6329 | 15000 | 4.8855 |
| 4.3139 | 5.8207 | 15500 | 4.8740 |
| 4.2741 | 6.0083 | 16000 | 4.8914 |
| 3.9597 | 6.1961 | 16500 | 4.9230 |
| 4.0031 | 6.3838 | 17000 | 4.9186 |
| 4.037 | 6.5716 | 17500 | 4.9167 |
| 4.0509 | 6.7594 | 18000 | 4.9096 |
| 4.0851 | 6.9472 | 18500 | 4.8967 |
| 3.6601 | 7.1348 | 19000 | 4.9751 |
| 3.7189 | 7.3226 | 19500 | 4.9886 |
| 3.7693 | 7.5104 | 20000 | 4.9891 |
| 3.8124 | 7.6982 | 20500 | 4.9833 |
| 3.8182 | 7.8860 | 21000 | 4.9778 |
| 3.6147 | 8.0736 | 21500 | 5.0529 |
| 3.4415 | 8.2614 | 22000 | 5.0853 |
| 3.482 | 8.4492 | 22500 | 5.0975 |
| 3.5447 | 8.6370 | 23000 | 5.0973 |
| 3.5833 | 8.8248 | 23500 | 5.0977 |
| 3.5299 | 9.0124 | 24000 | 5.1448 |
| 3.1458 | 9.2002 | 24500 | 5.2084 |
| 3.2443 | 9.3880 | 25000 | 5.2302 |
| 3.2832 | 9.5758 | 25500 | 5.2377 |
| 3.3276 | 9.7636 | 26000 | 5.2412 |
| 3.3593 | 9.9514 | 26500 | 5.2287 |
| 2.8907 | 10.1390 | 27000 | 5.3481 |
| 2.9459 | 10.3268 | 27500 | 5.3877 |
| 3.0302 | 10.5146 | 28000 | 5.3893 |
| 3.076 | 10.7023 | 28500 | 5.3970 |
| 3.1203 | 10.8901 | 29000 | 5.3907 |
| 2.8622 | 11.0777 | 29500 | 5.5043 |
| 2.7156 | 11.2655 | 30000 | 5.5422 |
| 2.7877 | 11.4533 | 30500 | 5.5638 |
| 2.8444 | 11.6411 | 31000 | 5.5683 |
| 2.8711 | 11.8289 | 31500 | 5.5713 |
| 2.8076 | 12.0165 | 32000 | 5.6334 |
| 2.4561 | 12.2043 | 32500 | 5.7110 |
| 2.5524 | 12.3921 | 33000 | 5.7359 |
| 2.6079 | 12.5799 | 33500 | 5.7447 |
| 2.6534 | 12.7677 | 34000 | 5.7560 |
| 2.6703 | 12.9555 | 34500 | 5.7619 |
| 2.2325 | 13.1431 | 35000 | 5.8732 |
| 2.3188 | 13.3309 | 35500 | 5.9043 |
| 2.3853 | 13.5187 | 36000 | 5.9254 |
| 2.4307 | 13.7065 | 36500 | 5.9430 |
| 2.4778 | 13.8943 | 37000 | 5.9402 |
| 2.2173 | 14.0819 | 37500 | 6.0347 |
| 2.1066 | 14.2697 | 38000 | 6.0776 |
| 2.1563 | 14.4575 | 38500 | 6.1014 |
| 2.2142 | 14.6453 | 39000 | 6.1070 |
| 2.2424 | 14.8331 | 39500 | 6.1274 |
| 2.1699 | 15.0207 | 40000 | 6.1792 |
| 1.902 | 15.2085 | 40500 | 6.2408 |
| 1.9817 | 15.3962 | 41000 | 6.2595 |
| 2.0208 | 15.5840 | 41500 | 6.2846 |
| 2.0606 | 15.7718 | 42000 | 6.2955 |
| 2.0805 | 15.9596 | 42500 | 6.3052 |
| 1.7666 | 16.1472 | 43000 | 6.3871 |
| 1.8034 | 16.3350 | 43500 | 6.4139 |
| 1.8489 | 16.5228 | 44000 | 6.4350 |
| 1.8718 | 16.7106 | 44500 | 6.4465 |
| 1.906 | 16.8984 | 45000 | 6.4588 |
| 1.7083 | 17.0860 | 45500 | 6.5191 |
| 1.6495 | 17.2738 | 46000 | 6.5477 |
| 1.6909 | 17.4616 | 46500 | 6.5635 |
| 1.722 | 17.6494 | 47000 | 6.5767 |
| 1.7361 | 17.8372 | 47500 | 6.5894 |
| 1.6829 | 18.0248 | 48000 | 6.6235 |
| 1.5369 | 18.2126 | 48500 | 6.6490 |
| 1.5603 | 18.4004 | 49000 | 6.6642 |
| 1.5763 | 18.5882 | 49500 | 6.6764 |
| 1.5949 | 18.7760 | 50000 | 6.6845 |
| 1.6067 | 18.9638 | 50500 | 6.6907 |
| 1.4507 | 19.1514 | 51000 | 6.7172 |
| 1.4578 | 19.3392 | 51500 | 6.7298 |
| 1.4774 | 19.5269 | 52000 | 6.7333 |
| 1.4718 | 19.7147 | 52500 | 6.7363 |
| 1.4904 | 19.9025 | 53000 | 6.7375 |
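Validation loss bottoms out at 4.8740 around step 15500 (epoch ~5.8) and climbs steadily afterwards while training loss keeps falling, which suggests the later epochs overfit. Selecting the best checkpoint by validation loss can be sketched with the values around the minimum (copied from the table above):

```python
# (step -> validation loss) pairs from the region where validation loss bottoms out.
val_losses = {
    13000: 4.8936,
    13500: 4.9055,
    14500: 4.9016,
    15000: 4.8855,
    15500: 4.8740,
    16000: 4.8914,
}

# Checkpoint with the lowest validation loss.
best_step = min(val_losses, key=val_losses.get)
print(best_step, val_losses[best_step])
```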
### Framework versions
- Transformers 4.57.1
- PyTorch 2.9.0+cu128
- Datasets 3.6.0
- Tokenizers 0.22.1