exceptions_exp2_swap_0.3_resemble_to_push_5039

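This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set (the values match the step-30000 row of the training results table):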

Loss: 3.5796
Accuracy: 0.3661
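
For readers who prefer perplexity to raw loss, the conversion can be sketched as below. This assumes the reported loss is a mean cross-entropy in nats per token, which is the usual convention for language-model training with the Hugging Face Trainer but is not stated in this card. Under that assumption the loss above corresponds to a perplexity of about 35.9.

```python
import math

# Reported evaluation loss from this card.
eval_loss = 3.5796

# Assumption: the loss is mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity: {perplexity:.2f}")
```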

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0006
train_batch_size: 16
eval_batch_size: 16
seed: 5039
gradient_accumulation_steps: 5
total_train_batch_size: 80
optimizer: OptimizerNames.ADAMW_TORCH_FUSED (fused AdamW) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 50.0
mixed_precision_training: Native AMP
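
The values above can be collected into keyword arguments for `transformers.TrainingArguments`, as in the hedged sketch below. Several points are assumptions rather than facts from this card: that training ran on a single device (consistent with 16 × 5 = 80), that "Native AMP" means `fp16` rather than `bf16`, and that the total step count is roughly 171,500 (50 epochs at about 3,430 steps per epoch, estimated from the results table). The helper names `effective_batch_size` and `linear_lr` are hypothetical; `linear_lr` mirrors the usual linear warmup and decay shape rather than calling the library scheduler.

```python
# Hypothetical mapping of the listed hyperparameters onto
# `transformers.TrainingArguments` keyword names.
training_kwargs = {
    "learning_rate": 6e-4,
    "per_device_train_batch_size": 16,  # assumption: single device
    "per_device_eval_batch_size": 16,
    "seed": 5039,
    "gradient_accumulation_steps": 5,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.98,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 100,
    "num_train_epochs": 50.0,
    "fp16": True,  # assumption: "Native AMP" could also mean bf16
}


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Sequences consumed per optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: total steps estimated from the results table, not reported.
ESTIMATED_TOTAL_STEPS = 171_500

print(effective_batch_size(training_kwargs))
for step in (0, 100, 85_000, ESTIMATED_TOTAL_STEPS):
    lr = linear_lr(step, training_kwargs["learning_rate"],
                   training_kwargs["warmup_steps"], ESTIMATED_TOTAL_STEPS)
    print(step, f"{lr:.6f}")
```

To reproduce the setup, the dictionary could be passed as `TrainingArguments(output_dir=..., **training_kwargs)`, with the output directory and the precision flag adjusted to the actual environment.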

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
4.8407	0.2915	1000	4.7604	0.2539
4.3428	0.5830	2000	4.2849	0.2994
4.1582	0.8745	3000	4.1005	0.3152
3.9948	1.1659	4000	3.9920	0.3247
3.9375	1.4574	5000	3.9171	0.3315
3.8864	1.7488	6000	3.8588	0.3366
3.7541	2.0402	7000	3.8152	0.3410
3.7604	2.3317	8000	3.7851	0.3438
3.7541	2.6232	9000	3.7561	0.3467
3.7263	2.9147	10000	3.7297	0.3492
3.6438	3.2061	11000	3.7211	0.3507
3.6444	3.4976	12000	3.7008	0.3524
3.6453	3.7891	13000	3.6806	0.3543
3.5308	4.0805	14000	3.6751	0.3553
3.5773	4.3719	15000	3.6636	0.3565
3.5793	4.6634	16000	3.6500	0.3576
3.5809	4.9549	17000	3.6363	0.3588
3.5032	5.2463	18000	3.6420	0.3591
3.5236	5.5378	19000	3.6300	0.3602
3.5232	5.8293	20000	3.6181	0.3614
3.4527	6.1207	21000	3.6232	0.3616
3.4753	6.4122	22000	3.6143	0.3621
3.4868	6.7037	23000	3.6033	0.3629
3.4969	6.9952	24000	3.5945	0.3641
3.4301	7.2865	25000	3.6059	0.3638
3.4554	7.5780	26000	3.5934	0.3644
3.4578	7.8695	27000	3.5854	0.3650
3.3815	8.1609	28000	3.5924	0.3653
3.4138	8.4524	29000	3.5863	0.3658
3.4356	8.7439	30000	3.5796	0.3661
3.3197	9.0353	31000	3.5844	0.3662
3.3752	9.3268	32000	3.5846	0.3666
3.3888	9.6183	33000	3.5767	0.3672
3.4087	9.9098	34000	3.5647	0.3677
3.3348	10.2011	35000	3.5785	0.3673
3.3819	10.4926	36000	3.5721	0.3677
3.3905	10.7841	37000	3.5625	0.3684
3.3048	11.0755	38000	3.5734	0.3680
3.3544	11.3670	39000	3.5720	0.3682
3.3746	11.6585	40000	3.5638	0.3689
3.3725	11.9500	41000	3.5545	0.3695
3.3097	12.2414	42000	3.5686	0.3691
3.3247	12.5329	43000	3.5609	0.3693
3.3603	12.8243	44000	3.5532	0.3697
3.2745	13.1157	45000	3.5669	0.3693
3.3209	13.4072	46000	3.5593	0.3694
3.3377	13.6987	47000	3.5553	0.3701
3.3390	13.9902	48000	3.5435	0.3706
3.2842	14.2816	49000	3.5628	0.3699
3.3205	14.5731	50000	3.5538	0.3705
3.3182	14.8646	51000	3.5455	0.3711
3.2521	15.1559	52000	3.5626	0.3703
3.2937	15.4474	53000	3.5526	0.3708
3.3077	15.7389	54000	3.5481	0.3710
3.1995	16.0303	55000	3.5599	0.3708
3.2493	16.3218	56000	3.5563	0.3709
3.2898	16.6133	57000	3.5495	0.3713
3.2945	16.9048	58000	3.5426	0.3721
3.2335	17.1962	59000	3.5598	0.3710
3.2572	17.4877	60000	3.5550	0.3713
3.2780	17.7792	61000	3.5456	0.3718
3.1972	18.0705	62000	3.5601	0.3712
3.2396	18.3620	63000	3.5561	0.3714
3.2669	18.6535	64000	3.5449	0.3721
3.2735	18.9450	65000	3.5384	0.3724
3.2194	19.2364	66000	3.5523	0.3720
3.2460	19.5279	67000	3.5501	0.3719
3.2610	19.8194	68000	3.5412	0.3724
3.1834	20.1108	69000	3.5578	0.3716
3.2114	20.4023	70000	3.5524	0.3720
3.2371	20.6938	71000	3.5429	0.3722
3.2579	20.9853	72000	3.5401	0.3729
3.2088	21.2766	73000	3.5578	0.3720
3.2299	21.5681	74000	3.5465	0.3725
3.2417	21.8596	75000	3.5391	0.3731
3.1715	22.1510	76000	3.5565	0.3724
3.2053	22.4425	77000	3.5499	0.3726
3.2265	22.7340	78000	3.5446	0.3728
3.1477	23.0254	79000	3.5603	0.3723
3.1758	23.3169	80000	3.5557	0.3727
3.2162	23.6083	81000	3.5456	0.3729
3.2257	23.8998	82000	3.5428	0.3733
3.1685	24.1912	83000	3.5614	0.3725
3.2014	24.4827	84000	3.5499	0.3728
3.2056	24.7742	85000	3.5430	0.3735
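
The table above is tab-separated, so it can be parsed with a few lines of Python to pick out the checkpoint with the lowest validation loss or to estimate the number of optimizer steps per epoch. The sketch below embeds only four rows copied from the table to stay short; the names `EvalRow`, `parse_rows`, and `steps_per_epoch` are hypothetical helpers, not part of any library. Note that the log shown here ends near epoch 24.8 even though `num_epochs` is listed as 50.0.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    accuracy: float


# Four rows copied from the results table (tab-separated).
SAMPLE_ROWS = (
    "4.8407\t0.2915\t1000\t4.7604\t0.2539\n"
    "3.4356\t8.7439\t30000\t3.5796\t0.3661\n"
    "3.2735\t18.9450\t65000\t3.5384\t0.3724\n"
    "3.2056\t24.7742\t85000\t3.5430\t0.3735\n"
)


def parse_rows(text: str) -> List[EvalRow]:
    """Parse whitespace- or tab-separated rows into EvalRow records."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        train_loss, epoch, step, val_loss, accuracy = fields
        rows.append(EvalRow(float(train_loss), float(epoch), int(step),
                            float(val_loss), float(accuracy)))
    return rows


def steps_per_epoch(rows: List[EvalRow]) -> float:
    """Estimate optimizer steps per epoch from the last logged row."""
    last = rows[-1]
    return last.step / last.epoch


rows = parse_rows(SAMPLE_ROWS)
best = min(rows, key=lambda r: r.val_loss)
print(f"lowest validation loss in sample: {best.val_loss} at step {best.step}")
print(f"estimated steps per epoch: {steps_per_epoch(rows):.0f}")
```

Pasting the full table into `SAMPLE_ROWS` gives the same analysis over every logged evaluation step.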

Framework versions

Transformers 4.55.2
Pytorch 2.8.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Safetensors

Model size: 0.1B params
Tensor type: F32