dense_swe_100m_mult_reseg_ba8_lr_div2

This model is a fine-tuned version of a base model (not specified in this card) on the arrow dataset. It achieves the following results on the evaluation set:

Loss: 5.6392

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 5324
training_steps: 53247
mixed_precision_training: Native AMP
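
The relationship between these values can be checked with a short sketch. It shows how total_train_batch_size follows from train_batch_size and gradient_accumulation_steps, and what a linear schedule with warmup looks like for the listed step counts. The device count of 1 is an assumption (the card does not state it; it is inferred from 2 x 4 = 8), and the function names are hypothetical. The schedule shape (linear ramp from 0 to the peak, then linear decay to 0) is the usual meaning of a linear scheduler with warmup, not something the card confirms in detail.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 5324
TRAINING_STEPS = 53247

# Assumption: a single device. The card does not state the device count.
NUM_DEVICES = 1


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device * accum_steps * devices


def linear_lr(step: int,
              base_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS,
              total: int = TRAINING_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * (step / warmup)
    remaining = max(0, total - step)
    return base_lr * (remaining / (total - warmup))


print("effective batch size:",
      effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for s in (0, WARMUP_STEPS // 2, WARMUP_STEPS, 21000, TRAINING_STEPS):
    print(f"step {s:>5}: lr = {linear_lr(s):.3e}")
```

Under these assumptions the learning rate peaks at step 5324 and reaches zero at step 53247.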

Training results

Training Loss	Epoch	Step	Validation Loss
10.3108	0.1878	500	9.6576
8.876	0.3756	1000	8.8046
8.671	0.5634	1500	8.6356
8.3155	0.7512	2000	8.2424
8.0584	0.9390	2500	7.9400
7.6587	1.1266	3000	7.5757
7.3445	1.3144	3500	7.2038
6.9477	1.5022	4000	6.8792
6.7221	1.6900	4500	6.6014
6.4333	1.8777	5000	6.3675
6.2379	2.0654	5500	6.1714
6.0225	2.2531	6000	6.0099
5.9099	2.4409	6500	5.8736
5.7626	2.6287	7000	5.7540
5.6867	2.8165	7500	5.6580
5.5791	3.0041	8000	5.5782
5.4179	3.1919	8500	5.5063
5.3663	3.3797	9000	5.4465
5.3326	3.5675	9500	5.3918
5.2693	3.7553	10000	5.3413
5.247	3.9431	10500	5.2974
5.0571	4.1307	11000	5.2597
5.0367	4.3185	11500	5.2283
5.0219	4.5063	12000	5.1997
5.0011	4.6941	12500	5.1673
4.9622	4.8819	13000	5.1382
4.8772	5.0695	13500	5.1211
4.7658	5.2573	14000	5.1076
4.7722	5.4451	14500	5.0866
4.7809	5.6329	15000	5.0653
4.7681	5.8207	15500	5.0478
4.7258	6.0083	16000	5.0400
4.5571	6.1961	16500	5.0402
4.5611	6.3838	17000	5.0295
4.5731	6.5716	17500	5.0165
4.5628	6.7594	18000	5.0040
4.5874	6.9472	18500	4.9877
4.3377	7.1348	19000	5.0118
4.3672	7.3226	19500	5.0082
4.3744	7.5104	20000	4.9995
4.4003	7.6982	20500	4.9910
4.3809	7.8860	21000	4.9810
4.2687	8.0736	21500	5.0073
4.1688	8.2614	22000	5.0130
4.176	8.4492	22500	5.0138
4.2063	8.6370	23000	5.0081
4.2288	8.8248	23500	4.9968
4.1866	9.0124	24000	5.0193
3.9729	9.2002	24500	5.0468
4.02	9.3880	25000	5.0476
4.0319	9.5758	25500	5.0429
4.0501	9.7636	26000	5.0403
4.059	9.9514	26500	5.0346
3.8031	10.1390	27000	5.0843
3.8185	10.3268	27500	5.0923
3.8652	10.5146	28000	5.0940
3.8827	10.7023	28500	5.0947
3.8972	10.8901	29000	5.0906
3.7602	11.0777	29500	5.1355
3.667	11.2655	30000	5.1487
3.714	11.4533	30500	5.1524
3.7273	11.6411	31000	5.1542
3.7351	11.8289	31500	5.1563
3.694	12.0165	32000	5.1883
3.5017	12.2043	32500	5.2126
3.5424	12.3921	33000	5.2197
3.5831	12.5799	33500	5.2283
3.5965	12.7677	34000	5.2333
3.5926	12.9555	34500	5.2274
3.3441	13.1431	35000	5.2869
3.3956	13.3309	35500	5.2952
3.427	13.5187	36000	5.3014
3.4498	13.7065	36500	5.3011
3.4713	13.8943	37000	5.3030
3.3222	14.0819	37500	5.3476
3.2462	14.2697	38000	5.3662
3.2717	14.4575	38500	5.3752
3.3003	14.6453	39000	5.3790
3.3137	14.8331	39500	5.3820
3.2762	15.0207	40000	5.4096
3.1215	15.2085	40500	5.4361
3.1593	15.3962	41000	5.4451
3.1839	15.5840	41500	5.4506
3.2038	15.7718	42000	5.4512
3.2034	15.9596	42500	5.4499
3.0387	16.1472	43000	5.4969
3.0566	16.3350	43500	5.5084
3.0704	16.5228	44000	5.5107
3.0794	16.7106	44500	5.5171
3.1037	16.8984	45000	5.5212
2.972	17.0860	45500	5.5507
2.9404	17.2738	46000	5.5619
2.9774	17.4616	46500	5.5654
2.9855	17.6494	47000	5.5731
2.9956	17.8372	47500	5.5743
2.976	18.0248	48000	5.5943
2.8822	18.2126	48500	5.6037
2.8909	18.4004	49000	5.6109
2.8984	18.5882	49500	5.6126
2.9039	18.7760	50000	5.6159
2.9204	18.9638	50500	5.6169
2.8238	19.1514	51000	5.6313
2.8244	19.3392	51500	5.6359
2.8394	19.5269	52000	5.6363
2.8288	19.7147	52500	5.6384
2.8497	19.9025	53000	5.6389
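
The table above can be scanned for the evaluation step with the lowest validation loss. The sketch below is a minimal illustration using a handful of rows copied from the table (not the full log); the variable and function names are hypothetical. In the full table, validation loss is lowest at step 21000 (4.9810) and rises afterwards while training loss keeps falling.

```python
# (step, training_loss, validation_loss): a subset of rows copied from the table.
ROWS = [
    (500, 10.3108, 9.6576),
    (5000, 6.4333, 6.3675),
    (10000, 5.2693, 5.3413),
    (15000, 4.7809, 5.0653),
    (18500, 4.5874, 4.9877),
    (21000, 4.3809, 4.9810),
    (23500, 4.2288, 4.9968),
    (30000, 3.6670, 5.1487),
    (40000, 3.2762, 5.4096),
    (53000, 2.8497, 5.6389),
]


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r[2])


def generalization_gap(row):
    """Validation loss minus training loss for one logged row."""
    _, train_loss, val_loss = row
    return val_loss - train_loss


best = best_row(ROWS)
print(f"lowest validation loss: {best[2]} at step {best[0]}")
for row in ROWS:
    print(f"step {row[0]:>5}: gap = {generalization_gap(row):.4f}")
```

A widening gap between validation and training loss after the minimum is a common sign of overfitting, so a checkpoint near the minimum may be preferable to the final one if such checkpoints were saved; the card does not say whether they were.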

Framework versions

Transformers 4.57.1
PyTorch 2.9.0+cu128
Datasets 3.6.0
Tokenizers 0.22.1

Safetensors

Model size: 0.2B params
Tensor type: F32