
dense_swe_100m_mult_reseg_ba8_lr_div2

This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:

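  • Loss: 5.6392 (end of training; the lowest validation loss logged in the training results table is 4.9810, at step 21000)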
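The card does not name the training objective. If this is a language model trained with mean per-token cross-entropy in nats (an assumption; PyTorch's cross-entropy uses the natural log), the loss converts to perplexity as exp(loss). The minimal sketch below applies that conversion to the reported value.

```python
import math


def perplexity(mean_ce_loss_nats: float) -> float:
    """Perplexity from a mean per-token cross-entropy loss measured in nats.

    Assumption: the reported loss is a per-token average; the card does not say so.
    """
    return math.exp(mean_ce_loss_nats)


final_eval_loss = 5.6392  # evaluation loss reported on the card
print(f"perplexity ~ {perplexity(final_eval_loss):.1f}")
```

Under that assumption, the end-of-run loss corresponds to a perplexity of roughly 281. The best logged validation loss, 4.9810, corresponds to roughly 146.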

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-06, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 5324
  • training_steps: 53247
  • mixed_precision_training: Native AMP
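With `lr_scheduler_type: linear`, the learning rate should ramp linearly from 0 to the peak value over the warmup steps. It then decays linearly to 0 at the final training step. The 5324 warmup steps are about 10% of the 53247 total.

The plain-Python sketch below reproduces that shape without importing Transformers. It also shows where `total_train_batch_size: 8` comes from. `NUM_DEVICES = 1` is an assumption: the card lists no device count, but 2 × 4 = 8 only works out on a single device.

```python
# Values copied from the hyperparameter list above.
PEAK_LR = 5e-05
WARMUP_STEPS = 5324
TRAINING_STEPS = 53247
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1  # assumption: not stated on the card, inferred from 2 * 4 = 8


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of sequences contributing to one optimizer step."""
    return per_device * accum * devices


def linear_warmup_lr(step: int, peak_lr: float = PEAK_LR,
                     warmup: int = WARMUP_STEPS, total: int = TRAINING_STEPS) -> float:
    """Linear ramp 0 -> peak_lr over `warmup` steps, then linear decay to 0 at `total`."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 8
for step in (0, 2662, WARMUP_STEPS, 21000, TRAINING_STEPS):
    print(step, f"{linear_warmup_lr(step):.2e}")
```

The peak rate is reached exactly at step 5324. After that, the rate falls by the same amount every step until it hits zero at step 53247.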

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 10.3108 | 0.1878 | 500 | 9.6576 |
| 8.876 | 0.3756 | 1000 | 8.8046 |
| 8.671 | 0.5634 | 1500 | 8.6356 |
| 8.3155 | 0.7512 | 2000 | 8.2424 |
| 8.0584 | 0.9390 | 2500 | 7.9400 |
| 7.6587 | 1.1266 | 3000 | 7.5757 |
| 7.3445 | 1.3144 | 3500 | 7.2038 |
| 6.9477 | 1.5022 | 4000 | 6.8792 |
| 6.7221 | 1.6900 | 4500 | 6.6014 |
| 6.4333 | 1.8777 | 5000 | 6.3675 |
| 6.2379 | 2.0654 | 5500 | 6.1714 |
| 6.0225 | 2.2531 | 6000 | 6.0099 |
| 5.9099 | 2.4409 | 6500 | 5.8736 |
| 5.7626 | 2.6287 | 7000 | 5.7540 |
| 5.6867 | 2.8165 | 7500 | 5.6580 |
| 5.5791 | 3.0041 | 8000 | 5.5782 |
| 5.4179 | 3.1919 | 8500 | 5.5063 |
| 5.3663 | 3.3797 | 9000 | 5.4465 |
| 5.3326 | 3.5675 | 9500 | 5.3918 |
| 5.2693 | 3.7553 | 10000 | 5.3413 |
| 5.247 | 3.9431 | 10500 | 5.2974 |
| 5.0571 | 4.1307 | 11000 | 5.2597 |
| 5.0367 | 4.3185 | 11500 | 5.2283 |
| 5.0219 | 4.5063 | 12000 | 5.1997 |
| 5.0011 | 4.6941 | 12500 | 5.1673 |
| 4.9622 | 4.8819 | 13000 | 5.1382 |
| 4.8772 | 5.0695 | 13500 | 5.1211 |
| 4.7658 | 5.2573 | 14000 | 5.1076 |
| 4.7722 | 5.4451 | 14500 | 5.0866 |
| 4.7809 | 5.6329 | 15000 | 5.0653 |
| 4.7681 | 5.8207 | 15500 | 5.0478 |
| 4.7258 | 6.0083 | 16000 | 5.0400 |
| 4.5571 | 6.1961 | 16500 | 5.0402 |
| 4.5611 | 6.3838 | 17000 | 5.0295 |
| 4.5731 | 6.5716 | 17500 | 5.0165 |
| 4.5628 | 6.7594 | 18000 | 5.0040 |
| 4.5874 | 6.9472 | 18500 | 4.9877 |
| 4.3377 | 7.1348 | 19000 | 5.0118 |
| 4.3672 | 7.3226 | 19500 | 5.0082 |
| 4.3744 | 7.5104 | 20000 | 4.9995 |
| 4.4003 | 7.6982 | 20500 | 4.9910 |
| 4.3809 | 7.8860 | 21000 | 4.9810 |
| 4.2687 | 8.0736 | 21500 | 5.0073 |
| 4.1688 | 8.2614 | 22000 | 5.0130 |
| 4.176 | 8.4492 | 22500 | 5.0138 |
| 4.2063 | 8.6370 | 23000 | 5.0081 |
| 4.2288 | 8.8248 | 23500 | 4.9968 |
| 4.1866 | 9.0124 | 24000 | 5.0193 |
| 3.9729 | 9.2002 | 24500 | 5.0468 |
| 4.02 | 9.3880 | 25000 | 5.0476 |
| 4.0319 | 9.5758 | 25500 | 5.0429 |
| 4.0501 | 9.7636 | 26000 | 5.0403 |
| 4.059 | 9.9514 | 26500 | 5.0346 |
| 3.8031 | 10.1390 | 27000 | 5.0843 |
| 3.8185 | 10.3268 | 27500 | 5.0923 |
| 3.8652 | 10.5146 | 28000 | 5.0940 |
| 3.8827 | 10.7023 | 28500 | 5.0947 |
| 3.8972 | 10.8901 | 29000 | 5.0906 |
| 3.7602 | 11.0777 | 29500 | 5.1355 |
| 3.667 | 11.2655 | 30000 | 5.1487 |
| 3.714 | 11.4533 | 30500 | 5.1524 |
| 3.7273 | 11.6411 | 31000 | 5.1542 |
| 3.7351 | 11.8289 | 31500 | 5.1563 |
| 3.694 | 12.0165 | 32000 | 5.1883 |
| 3.5017 | 12.2043 | 32500 | 5.2126 |
| 3.5424 | 12.3921 | 33000 | 5.2197 |
| 3.5831 | 12.5799 | 33500 | 5.2283 |
| 3.5965 | 12.7677 | 34000 | 5.2333 |
| 3.5926 | 12.9555 | 34500 | 5.2274 |
| 3.3441 | 13.1431 | 35000 | 5.2869 |
| 3.3956 | 13.3309 | 35500 | 5.2952 |
| 3.427 | 13.5187 | 36000 | 5.3014 |
| 3.4498 | 13.7065 | 36500 | 5.3011 |
| 3.4713 | 13.8943 | 37000 | 5.3030 |
| 3.3222 | 14.0819 | 37500 | 5.3476 |
| 3.2462 | 14.2697 | 38000 | 5.3662 |
| 3.2717 | 14.4575 | 38500 | 5.3752 |
| 3.3003 | 14.6453 | 39000 | 5.3790 |
| 3.3137 | 14.8331 | 39500 | 5.3820 |
| 3.2762 | 15.0207 | 40000 | 5.4096 |
| 3.1215 | 15.2085 | 40500 | 5.4361 |
| 3.1593 | 15.3962 | 41000 | 5.4451 |
| 3.1839 | 15.5840 | 41500 | 5.4506 |
| 3.2038 | 15.7718 | 42000 | 5.4512 |
| 3.2034 | 15.9596 | 42500 | 5.4499 |
| 3.0387 | 16.1472 | 43000 | 5.4969 |
| 3.0566 | 16.3350 | 43500 | 5.5084 |
| 3.0704 | 16.5228 | 44000 | 5.5107 |
| 3.0794 | 16.7106 | 44500 | 5.5171 |
| 3.1037 | 16.8984 | 45000 | 5.5212 |
| 2.972 | 17.0860 | 45500 | 5.5507 |
| 2.9404 | 17.2738 | 46000 | 5.5619 |
| 2.9774 | 17.4616 | 46500 | 5.5654 |
| 2.9855 | 17.6494 | 47000 | 5.5731 |
| 2.9956 | 17.8372 | 47500 | 5.5743 |
| 2.976 | 18.0248 | 48000 | 5.5943 |
| 2.8822 | 18.2126 | 48500 | 5.6037 |
| 2.8909 | 18.4004 | 49000 | 5.6109 |
| 2.8984 | 18.5882 | 49500 | 5.6126 |
| 2.9039 | 18.7760 | 50000 | 5.6159 |
| 2.9204 | 18.9638 | 50500 | 5.6169 |
| 2.8238 | 19.1514 | 51000 | 5.6313 |
| 2.8244 | 19.3392 | 51500 | 5.6359 |
| 2.8394 | 19.5269 | 52000 | 5.6363 |
| 2.8288 | 19.7147 | 52500 | 5.6384 |
| 2.8497 | 19.9025 | 53000 | 5.6389 |
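In the table above, validation loss bottoms out at 4.9810 at step 21000 (epoch 7.886) and then rises for the rest of the run, while training loss keeps falling. That pattern suggests overfitting, so the reported 5.6392 is the end-of-run value rather than the best one.

The sketch below finds the best row in a hand-copied excerpt of the table. Only five rows are included, chosen for illustration. The sketch also backs out the number of optimizer steps per epoch from the first logged row.

```python
# (step, training_loss, validation_loss): an excerpt copied from the table above.
ROWS = [
    (18500, 4.5874, 4.9877),
    (20500, 4.4003, 4.9910),
    (21000, 4.3809, 4.9810),
    (21500, 4.2687, 5.0073),
    (53000, 2.8497, 5.6389),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


step, train_loss, val_loss = best_checkpoint(ROWS)
final_step, final_train, final_val = ROWS[-1]
print(f"best validation loss {val_loss} at step {step}")
print(f"gap at step {final_step} (val - train): {final_val - final_train:.4f}")

# The Epoch column is fractional: epoch = step / steps_per_epoch.
steps_per_epoch = 500 / 0.1878
print(round(steps_per_epoch))
```

At roughly 2,662 steps per epoch, the 53247 training steps come to about 20 epochs, consistent with the last Epoch value of 19.9025.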

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.1
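To check whether a local environment matches the versions above, the standard library's `importlib.metadata` can compare installed versions against these pins. Two choices in this sketch are assumptions:

  • The card's names are mapped to PyPI distribution names (`torch` for PyTorch).
  • The `+cu128` suffix on the PyTorch version is treated as a build tag and ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from "Framework versions" above, keyed by PyPI distribution name (assumed mapping).
PINNED = {
    "transformers": "4.57.1",
    "torch": "2.9.0",  # card lists 2.9.0+cu128; the +cu128 local tag is ignored below
    "datasets": "3.6.0",
    "tokenizers": "0.22.1",
}


def compare_versions(pinned=PINNED):
    """Map each package to (pinned, installed); installed is None if not installed."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions().items():
    # Strip any local build tag such as "+cu128" before comparing.
    matches = installed is not None and installed.split("+")[0] == wanted
    print(f"{name}: pinned {wanted}, installed {installed} ({'ok' if matches else 'differs'})")
```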
Safetensors

  • Model size: 0.2B params
  • Tensor type: F32
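The listed size and tensor type give a rough estimate of the memory needed just to hold the weights, since F32 stores each parameter in 4 bytes. This sketch uses the card's rounded 0.2B figure because the exact parameter count is not given, so the result is approximate. It also excludes activations and, for training, optimizer state.

```python
# Bytes per parameter for common safetensors dtypes.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(num_params: float, dtype: str = "F32") -> float:
    """Memory needed to store the weights alone, in bytes."""
    return num_params * BYTES_PER_PARAM[dtype]


params = 0.2e9  # rounded figure from the card; the exact count is not stated
print(f"{weight_bytes(params) / 1e9:.1f} GB of F32 weights")
print(f"{weight_bytes(params, 'BF16') / 1e9:.1f} GB if cast to BF16")
```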