dense_swe_100m_mult_reseg_lr_div2

This model is a fine-tuned version of a base model (not named in this card) on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 5.1980
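
For a language model, an evaluation loss is often easier to interpret as a perplexity. The sketch below assumes the reported loss is a mean token-level cross-entropy in nats, which the card does not state explicitly. Under that assumption the perplexity is simply the exponential of the loss.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 5.1980

# Assumption: the loss is mean per-token cross-entropy measured in nats,
# so perplexity = exp(loss). If the loss were in bits, use 2 ** loss instead.
perplexity = math.exp(eval_loss)

print(f"eval loss {eval_loss:.4f} -> perplexity of about {perplexity:.1f}")
```

Under this assumption the loss corresponds to a perplexity of about 181.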

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1331
  • training_steps: 13311
  • mixed_precision_training: Native AMP
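
The hyperparameters above fix both the effective batch size and the shape of the learning-rate curve. The following is a minimal sketch, assuming the `linear` scheduler means linear warmup from zero to the peak rate followed by linear decay to zero at the last step, and assuming a single device (implied by 8 × 4 = 32, but not stated in the card). The function name `linear_schedule_lr` and the variable `num_devices` are illustrative, not part of any library.

```python
def linear_schedule_lr(step, peak_lr=5e-05, warmup_steps=1331, total_steps=13311):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay."""
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Effective batch size per optimizer update.
train_batch_size = 8
gradient_accumulation_steps = 4
num_devices = 1  # assumption: not stated in the card, implied by 8 * 4 = 32
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

print("total_train_batch_size:", total_train_batch_size)
for step in (0, 1331, 7321, 13311):
    print(f"step {step:>5}: lr = {linear_schedule_lr(step):.3e}")
```

With these values the warmup covers roughly the first 10% of training (1331 of 13311 steps), and the rate falls to half its peak at step 7321, the midpoint of the decay phase.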

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|---------------|---------|-------|-----------------|
| 8.9334        | 0.7510  | 500   | 8.6299          |
| 7.6049        | 1.5017  | 1000  | 7.3508          |
| 6.8311        | 2.2523  | 1500  | 6.5959          |
| 6.2779        | 3.0030  | 2000  | 6.2161          |
| 6.0348        | 3.7540  | 2500  | 5.9760          |
| 5.7717        | 4.5047  | 3000  | 5.8110          |
| 5.6460        | 5.2554  | 3500  | 5.6896          |
| 5.5173        | 6.0060  | 4000  | 5.6005          |
| 5.4179        | 6.7570  | 4500  | 5.5314          |
| 5.3151        | 7.5077  | 5000  | 5.4852          |
| 5.2745        | 8.2584  | 5500  | 5.4518          |
| 5.2335        | 9.0090  | 6000  | 5.4272          |
| 5.1822        | 9.7600  | 6500  | 5.4152          |
| 5.1971        | 10.5107 | 7000  | 5.4112          |
| 5.1597        | 11.2614 | 7500  | 5.3670          |
| 5.0918        | 12.0120 | 8000  | 5.3265          |
| 5.0162        | 12.7630 | 8500  | 5.2962          |
| 4.9366        | 13.5137 | 9000  | 5.2719          |
| 4.9049        | 14.2644 | 9500  | 5.2539          |
| 4.8697        | 15.0150 | 10000 | 5.2348          |
| 4.8049        | 15.7661 | 10500 | 5.2208          |
| 4.7515        | 16.5167 | 11000 | 5.2151          |
| 4.7336        | 17.2674 | 11500 | 5.2097          |
| 4.7154        | 18.0180 | 12000 | 5.2038          |
| 4.6696        | 18.7691 | 12500 | 5.2007          |
| 4.6500        | 19.5197 | 13000 | 5.1996          |
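
The logged rows above also imply a few quantities the card does not state directly. The sketch below derives them from the first and last rows only. It assumes that "Step" counts optimizer updates and that each update consumes `total_train_batch_size` examples. The gap it prints compares a logged training loss with a validation loss, so it is a rough indicator rather than a like-for-like measurement.

```python
# (training loss, epoch, step, validation loss), copied from the results table.
first_row = (8.9334, 0.7510, 500, 8.6299)
last_row = (4.65, 19.5197, 13000, 5.1996)

total_train_batch_size = 32  # from the hyperparameter list
training_steps = 13311       # from the hyperparameter list

# Optimizer steps needed to complete one epoch, estimated from the last row.
steps_per_epoch = last_row[2] / last_row[1]

# Assumption: every optimizer step consumes one full effective batch.
examples_per_epoch = steps_per_epoch * total_train_batch_size

# Number of epochs covered by the full run.
total_epochs = training_steps / steps_per_epoch

# Difference between validation and training loss at the last logged step.
gap = last_row[3] - last_row[0]

print(f"steps per epoch:    {steps_per_epoch:.1f}")
print(f"examples per epoch: {examples_per_epoch:.0f}")
print(f"total epochs:       {total_epochs:.2f}")
print(f"val - train loss:   {gap:.4f}")
```

On these numbers the run covers close to 20 epochs at roughly 666 steps per epoch, and the validation loss ends about 0.55 above the logged training loss.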

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.1
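
To reproduce results it can help to check the local environment against the versions listed above. The following is a small sketch using only the standard library. The distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the usual PyPI names, and `compare_versions` is an illustrative helper, not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by the assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.57.1",
    "torch": "2.9.0+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.22.1",
}


def compare_versions(expected=EXPECTED):
    """Return {name: (expected, installed or None, exact_match)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in compare_versions().items():
    status = "ok" if ok else "differs"
    print(f"{name:<12} expected {wanted:<12} installed {installed!s:<12} {status}")
```

An exact string match is deliberately strict. A different CUDA build suffix on the PyTorch version, for example, is reported as a difference even though it may not affect results.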

Safetensors

  • Model size: 0.2B params
  • Tensor type: F32