
dense_swe_100m_mult_retok

This model is a fine-tuned version of an unnamed base model (the base-model field is absent from this card) on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 5.0811
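
Assuming this loss is the mean per-token cross-entropy in nats (the usual convention for the transformers Trainer on language modeling), it corresponds to a perplexity of roughly 161:

```python
import math

# Hedged conversion: valid only if the reported eval loss is the
# mean per-token cross-entropy in nats.
eval_loss = 5.0811
print(f"perplexity = {math.exp(eval_loss):.2f}")  # ~160.96
```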

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 665
  • training_steps: 6655
  • mixed_precision_training: Native AMP
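
As a reproducibility aid, here is a minimal sketch of how these settings map onto transformers TrainingArguments, assuming the standard Trainer on a single device (so total_train_batch_size = 8 × 4 = 32); the output_dir is hypothetical:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dense_swe_100m_mult_retok",  # hypothetical output directory
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=4,  # 8 * 4 = 32 effective batch on one device
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=665,
    max_steps=6655,
    fp16=True,  # "Native AMP" mixed-precision training
)
```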

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 8.7051        | 0.7510 | 500  | 8.1513          |
| 6.7413        | 1.5017 | 1000 | 6.5119          |
| 6.0516        | 2.2523 | 1500 | 5.8710          |
| 5.5523        | 3.0030 | 2000 | 5.5322          |
| 5.2899        | 3.7540 | 2500 | 5.3381          |
| 5.0084        | 4.5047 | 3000 | 5.2254          |
| 4.8915        | 5.2554 | 3500 | 5.1517          |
| 4.7619        | 6.0060 | 4000 | 5.1070          |
| 4.6009        | 6.7570 | 4500 | 5.0784          |
| 4.4513        | 7.5077 | 5000 | 5.0740          |
| 4.4063        | 8.2584 | 5500 | 5.0808          |
| 4.3455        | 9.0090 | 6000 | 5.0806          |
| 4.249         | 9.7600 | 6500 | 5.0824          |
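
The validation loss reaches its minimum (5.0740) at step 5000 and plateaus slightly above it afterward, which is worth noting when choosing a checkpoint. A quick check over the values transcribed from the table above:

```python
# Validation losses from the table above, keyed by training step.
val_loss = {
    500: 8.1513, 1000: 6.5119, 1500: 5.8710, 2000: 5.5322,
    2500: 5.3381, 3000: 5.2254, 3500: 5.1517, 4000: 5.1070,
    4500: 5.0784, 5000: 5.0740, 5500: 5.0808, 6000: 5.0806,
    6500: 5.0824,
}
best_step = min(val_loss, key=val_loss.get)
print(best_step, val_loss[best_step])  # 5000 5.074
```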

Framework versions

  • Transformers 4.51.0
  • PyTorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1
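
A minimal loading sketch under the pinned versions above; the repository namespace and the causal-LM head are assumptions, since the card does not state the architecture, and the repository is gated, so you must accept its access conditions and authenticate first:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id: replace "your-namespace" with the actual owner.
repo_id = "your-namespace/dense_swe_100m_mult_retok"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# Assumes a causal language model; swap the Auto class if the
# architecture differs.
model = AutoModelForCausalLM.from_pretrained(repo_id)
```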

Model details

  • Format: Safetensors
  • Model size: 0.2B params
  • Tensor type: F32