
dense_swe_100m_mult_reseg_ep20_gemma

This model is a fine-tuned version of an unspecified base model, trained on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 5.6446
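Assuming this loss is the mean per-token cross-entropy in nats (the usual convention for causal language-model training with Transformers, an assumption here rather than something stated in the card), it corresponds to a validation perplexity of roughly exp(5.6446) ≈ 283:

```python
import math

# Assumption: the reported eval loss is mean cross-entropy per token, in nats.
eval_loss = 5.6446
perplexity = math.exp(eval_loss)
print(f"Validation perplexity ≈ {perplexity:.1f}")  # ≈ 282.7
```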

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto a Trainer configuration follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-06, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1331
  • training_steps: 13311
  • mixed_precision_training: Native AMP
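
As a rough illustration only (the original training script is not published), these values map onto a Hugging Face `TrainingArguments` configuration along the following lines; the output directory, the choice of fp16 for "Native AMP", and the 500-step evaluation cadence are assumptions inferred from the card:

```python
from transformers import TrainingArguments

# A minimal sketch of the hyperparameters above as a Trainer configuration.
# output_dir is hypothetical; fp16 vs. bf16 for "Native AMP" is not stated in the card.
training_args = TrainingArguments(
    output_dir="dense_swe_100m_mult_reseg_ep20_gemma",
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=4,   # effective train batch size: 8 * 4 = 32
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=1331,
    max_steps=13311,
    fp16=True,                       # "Native AMP" mixed precision (assumed fp16)
    eval_strategy="steps",
    eval_steps=500,                  # matches the 500-step cadence in the results table
)
```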

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 8.8597        | 0.7510  | 500   | 8.6242          |
| 7.4611        | 1.5017  | 1000  | 7.1015          |
| 6.3916        | 2.2523  | 1500  | 6.0770          |
| 5.6818        | 3.0030  | 2000  | 5.6144          |
| 5.3722        | 3.7540  | 2500  | 5.3655          |
| 5.0433        | 4.5047  | 3000  | 5.2152          |
| 4.9022        | 5.2554  | 3500  | 5.1209          |
| 4.7554        | 6.0060  | 4000  | 5.0575          |
| 4.5531        | 6.7570  | 4500  | 5.0187          |
| 4.3492        | 7.5077  | 5000  | 5.0189          |
| 4.2835        | 8.2584  | 5500  | 5.0368          |
| 4.2032        | 9.0090  | 6000  | 5.0475          |
| 4.0044        | 9.7600  | 6500  | 5.0640          |
| 3.8276        | 10.5107 | 7000  | 5.1151          |
| 3.7918        | 11.2614 | 7500  | 5.1775          |
| 3.7224        | 12.0120 | 8000  | 5.2051          |
| 3.5537        | 12.7630 | 8500  | 5.2526          |
| 3.4031        | 13.5137 | 9000  | 5.3180          |
| 3.3722        | 14.2644 | 9500  | 5.3804          |
| 3.3267        | 15.0150 | 10000 | 5.4193          |
| 3.1827        | 15.7661 | 10500 | 5.4642          |
| 3.07          | 16.5167 | 11000 | 5.5229          |
| 3.0515        | 17.2674 | 11500 | 5.5717          |
| 3.011         | 18.0180 | 12000 | 5.5973          |
| 2.9155        | 18.7691 | 12500 | 5.6225          |
| 2.8593        | 19.5197 | 13000 | 5.6441          |
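
Note that the validation loss reaches its minimum around step 4500 (≈5.02) and rises steadily afterwards, so an earlier checkpoint may generalize better than the final one. The Trainer normally records these log points in a trainer_state.json file alongside each checkpoint; if such a file is available (an assumption, it is not guaranteed to be in this repository), the curve above can be replotted with something like:

```python
import json
import matplotlib.pyplot as plt

# Assumes a trainer_state.json (written by transformers.Trainer) is present locally.
with open("trainer_state.json") as f:
    state = json.load(f)

eval_entries = [e for e in state["log_history"] if "eval_loss" in e]
steps = [e["step"] for e in eval_entries]
eval_loss = [e["eval_loss"] for e in eval_entries]

plt.plot(steps, eval_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("dense_swe_100m_mult_reseg_ep20_gemma validation loss")
plt.show()
```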

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.1+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.1
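
To sanity-check a local environment against these versions, a simple comparison (an illustration only, not a pinned requirements file from the authors) could look like:

```python
# Compare installed library versions against those listed in the card.
import transformers, torch, datasets, tokenizers

expected = {
    "transformers": "4.57.1",
    "torch": "2.9.1+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.22.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    print(f"{name}: installed {installed[name]}, card lists {want}")
```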