# dense_swe_100m_mult_reseg_lr_div8_ep20
This model is a fine-tuned version of an unspecified base model (the base checkpoint is not recorded in this card) on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 6.2194
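
Since the base architecture is not documented in this card, the snippet below is only one plausible way to load the checkpoint with the Auto classes; `AutoModelForCausalLM` and the checkpoint path are assumptions, not documented facts.

```python
# A minimal loading sketch. AutoModelForCausalLM is an assumption about
# the task; the checkpoint path below is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "dense_swe_100m_mult_reseg_lr_div8_ep20"  # hypothetical local path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
```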
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 1.25e-05
- train_batch_size: 8
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (`adamw_torch_fused`) with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1331
- training_steps: 13311
- mixed_precision_training: Native AMP
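
A minimal sketch of how these values map onto `transformers.TrainingArguments`. The model, tokenizer, and dataset are not documented in this card, so the `Trainer` wiring is omitted; `fp16=True` is an assumption read off the "Native AMP" note (it could equally have been `bf16`), and the `output_dir` is illustrative.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dense_swe_100m_mult_reseg_lr_div8_ep20",
    learning_rate=1.25e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=4,   # effective train batch size: 8 * 4 = 32
    optim="adamw_torch_fused",       # fused AdamW implementation from PyTorch
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-06,
    lr_scheduler_type="linear",
    warmup_steps=1331,
    max_steps=13311,
    fp16=True,                       # assumption: "Native AMP" taken as fp16
)
```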
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 10.2002 | 0.7510 | 500 | 9.5759 |
| 8.808 | 1.5017 | 1000 | 8.7371 |
| 8.6213 | 2.2523 | 1500 | 8.4501 |
| 8.1045 | 3.0030 | 2000 | 8.0456 |
| 7.8934 | 3.7540 | 2500 | 7.7922 |
| 7.6054 | 4.5047 | 3000 | 7.5665 |
| 7.438 | 5.2554 | 3500 | 7.3679 |
| 7.2274 | 6.0060 | 4000 | 7.2004 |
| 7.1059 | 6.7570 | 4500 | 7.0567 |
| 6.9481 | 7.5077 | 5000 | 6.9363 |
| 6.8554 | 8.2584 | 5500 | 6.8310 |
| 6.7432 | 9.0090 | 6000 | 6.7358 |
| 6.6628 | 9.7600 | 6500 | 6.6570 |
| 6.5742 | 10.5107 | 7000 | 6.5873 |
| 6.5268 | 11.2614 | 7500 | 6.5264 |
| 6.4527 | 12.0120 | 8000 | 6.4696 |
| 6.4082 | 12.7630 | 8500 | 6.4255 |
| 6.3547 | 13.5137 | 9000 | 6.3804 |
| 6.3225 | 14.2644 | 9500 | 6.3445 |
| 6.2913 | 15.0150 | 10000 | 6.3164 |
| 6.26 | 15.7661 | 10500 | 6.2921 |
| 6.2316 | 16.5167 | 11000 | 6.2687 |
| 6.2107 | 17.2674 | 11500 | 6.2513 |
| 6.2006 | 18.0180 | 12000 | 6.2365 |
| 6.181 | 18.7691 | 12500 | 6.2266 |
| 6.1723 | 19.5197 | 13000 | 6.2210 |
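
If the validation loss is mean token-level cross-entropy (the `Trainer` default for language-modeling objectives, though this card does not confirm the training objective), the final value translates into a perplexity as sketched below.

```python
# Sketch under the assumption that the eval loss is mean cross-entropy
# per token, so perplexity = exp(loss).
import math

final_eval_loss = 6.2194
print(f"perplexity ≈ {math.exp(final_eval_loss):.1f}")  # ≈ 502.4
```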
### Framework versions
- Transformers 4.57.1
- Pytorch 2.9.0+cu128
- Datasets 3.6.0
- Tokenizers 0.22.1
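
To check a local environment against these pins before attempting to reproduce results, a small standard-library sketch:

```python
# Sketch: compare installed package versions with the pins listed above.
from importlib.metadata import version

pins = {"transformers": "4.57.1", "torch": "2.9.0+cu128",
        "datasets": "3.6.0", "tokenizers": "0.22.1"}
for pkg, pinned in pins.items():
    print(f"{pkg}: installed {version(pkg)}, card used {pinned}")
```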