exceptions_exp2_swap_0.3_last_to_hit_5039

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.5667
  • Accuracy: 0.3689
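
Assuming the reported loss is the standard per-token cross-entropy in nats (the usual convention for Trainer-generated language-model cards), it corresponds to a perplexity of roughly 35.4:

```python
import math

eval_loss = 3.5667  # reported evaluation loss, assumed to be nats per token
print(f"perplexity = {math.exp(eval_loss):.2f}")  # perplexity = 35.40
```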

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 5039
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: adamw_torch_fused (AdamW, fused PyTorch implementation) with betas=(0.9, 0.98), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
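
For reference, here is a minimal sketch of how these values map onto transformers.TrainingArguments. Only the values in the list above are recorded in this card; the output_dir and the evaluation/save cadence are assumptions.

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the recorded hyperparameters. output_dir and
# eval/save settings are assumptions, not taken from this card.
args = TrainingArguments(
    output_dir="exceptions_exp2_swap_0.3_last_to_hit_5039",  # assumed
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=5039,
    gradient_accumulation_steps=5,  # 16 * 5 = 80 effective train batch size
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,  # "Native AMP"; whether fp16 or bf16 was used is not recorded
)
```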

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 4.8411 | 0.2915 | 1000 | 4.7605 | 0.2537 |
| 4.3485 | 0.5830 | 2000 | 4.2930 | 0.2982 |
| 4.1544 | 0.8745 | 3000 | 4.1129 | 0.3138 |
| 4.0019 | 1.1659 | 4000 | 3.9994 | 0.3237 |
| 3.9387 | 1.4574 | 5000 | 3.9249 | 0.3304 |
| 3.883 | 1.7488 | 6000 | 3.8638 | 0.3359 |
| 3.7477 | 2.0402 | 7000 | 3.8237 | 0.3401 |
| 3.766 | 2.3317 | 8000 | 3.7911 | 0.3432 |
| 3.7441 | 2.6232 | 9000 | 3.7624 | 0.3457 |
| 3.7301 | 2.9147 | 10000 | 3.7354 | 0.3484 |
| 3.6353 | 3.2061 | 11000 | 3.7236 | 0.3504 |
| 3.6539 | 3.4976 | 12000 | 3.7038 | 0.3522 |
| 3.6494 | 3.7891 | 13000 | 3.6862 | 0.3535 |
| 3.5518 | 4.0805 | 14000 | 3.6803 | 0.3548 |
| 3.5734 | 4.3719 | 15000 | 3.6683 | 0.3562 |
| 3.5844 | 4.6634 | 16000 | 3.6556 | 0.3572 |
| 3.585 | 4.9549 | 17000 | 3.6423 | 0.3583 |
| 3.5148 | 5.2463 | 18000 | 3.6450 | 0.3591 |
| 3.5198 | 5.5378 | 19000 | 3.6345 | 0.3599 |
| 3.5489 | 5.8293 | 20000 | 3.6225 | 0.3606 |
| 3.4586 | 6.1207 | 21000 | 3.6258 | 0.3613 |
| 3.4951 | 6.4122 | 22000 | 3.6199 | 0.3619 |
| 3.5015 | 6.7037 | 23000 | 3.6116 | 0.3625 |
| 3.4954 | 6.9952 | 24000 | 3.5990 | 0.3634 |
| 3.4427 | 7.2865 | 25000 | 3.6094 | 0.3631 |
| 3.4551 | 7.5780 | 26000 | 3.5977 | 0.3641 |
| 3.4653 | 7.8695 | 27000 | 3.5883 | 0.3651 |
| 3.3994 | 8.1609 | 28000 | 3.5992 | 0.3646 |
| 3.4363 | 8.4524 | 29000 | 3.5957 | 0.3649 |
| 3.4403 | 8.7439 | 30000 | 3.5842 | 0.3658 |
| 3.3423 | 9.0353 | 31000 | 3.5899 | 0.3657 |
| 3.3944 | 9.3268 | 32000 | 3.5885 | 0.3660 |
| 3.4041 | 9.6183 | 33000 | 3.5800 | 0.3668 |
| 3.4225 | 9.9098 | 34000 | 3.5704 | 0.3674 |
| 3.3502 | 10.2011 | 35000 | 3.5833 | 0.3669 |
| 3.3802 | 10.4926 | 36000 | 3.5733 | 0.3673 |
| 3.4018 | 10.7841 | 37000 | 3.5692 | 0.3679 |
| 3.3104 | 11.0755 | 38000 | 3.5794 | 0.3678 |
| 3.3479 | 11.3670 | 39000 | 3.5733 | 0.3681 |
| 3.3639 | 11.6585 | 40000 | 3.5667 | 0.3689 |
| 3.3892 | 11.9500 | 41000 | 3.5601 | 0.3690 |
| 3.3242 | 12.2414 | 42000 | 3.5760 | 0.3684 |
| 3.348 | 12.5329 | 43000 | 3.5661 | 0.3690 |
| 3.3595 | 12.8243 | 44000 | 3.5569 | 0.3696 |
| 3.2858 | 13.1157 | 45000 | 3.5700 | 0.3691 |
| 3.3249 | 13.4072 | 46000 | 3.5646 | 0.3696 |
| 3.3351 | 13.6987 | 47000 | 3.5552 | 0.3699 |
| 3.3514 | 13.9902 | 48000 | 3.5497 | 0.3704 |
| 3.2807 | 14.2816 | 49000 | 3.5690 | 0.3696 |
| 3.3104 | 14.5731 | 50000 | 3.5584 | 0.3701 |
| 3.3457 | 14.8646 | 51000 | 3.5513 | 0.3703 |
| 3.2624 | 15.1559 | 52000 | 3.5662 | 0.3703 |
| 3.2797 | 15.4474 | 53000 | 3.5608 | 0.3705 |
| 3.3148 | 15.7389 | 54000 | 3.5486 | 0.3712 |
| 3.2037 | 16.0303 | 55000 | 3.5595 | 0.3705 |
| 3.2682 | 16.3218 | 56000 | 3.5577 | 0.3707 |
| 3.2804 | 16.6133 | 57000 | 3.5518 | 0.3711 |
| 3.3058 | 16.9048 | 58000 | 3.5435 | 0.3716 |
| 3.2326 | 17.1962 | 59000 | 3.5608 | 0.3708 |
| 3.2678 | 17.4877 | 60000 | 3.5520 | 0.3714 |
| 3.2766 | 17.7792 | 61000 | 3.5457 | 0.3721 |
| 3.1912 | 18.0705 | 62000 | 3.5604 | 0.3712 |
| 3.24 | 18.3620 | 63000 | 3.5570 | 0.3714 |
| 3.2614 | 18.6535 | 64000 | 3.5475 | 0.3720 |
| 3.2817 | 18.9450 | 65000 | 3.5375 | 0.3725 |
| 3.2077 | 19.2364 | 66000 | 3.5576 | 0.3719 |
| 3.2443 | 19.5279 | 67000 | 3.5522 | 0.3719 |
| 3.2538 | 19.8194 | 68000 | 3.5432 | 0.3725 |
| 3.1984 | 20.1108 | 69000 | 3.5583 | 0.3717 |
| 3.2268 | 20.4023 | 70000 | 3.5540 | 0.3719 |
| 3.2424 | 20.6938 | 71000 | 3.5512 | 0.3722 |
| 3.2674 | 20.9853 | 72000 | 3.5389 | 0.3731 |
| 3.2186 | 21.2766 | 73000 | 3.5576 | 0.3718 |
| 3.2176 | 21.5681 | 74000 | 3.5500 | 0.3724 |
| 3.2438 | 21.8596 | 75000 | 3.5432 | 0.3729 |
| 3.1712 | 22.1510 | 76000 | 3.5604 | 0.3722 |
| 3.213 | 22.4425 | 77000 | 3.5536 | 0.3727 |
| 3.234 | 22.7340 | 78000 | 3.5455 | 0.3731 |
| 3.144 | 23.0254 | 79000 | 3.5583 | 0.3724 |
| 3.1866 | 23.3169 | 80000 | 3.5562 | 0.3727 |
| 3.2165 | 23.6083 | 81000 | 3.5472 | 0.3729 |
| 3.2333 | 23.8998 | 82000 | 3.5398 | 0.3737 |
| 3.1759 | 24.1912 | 83000 | 3.5600 | 0.3725 |
| 3.1844 | 24.4827 | 84000 | 3.5542 | 0.3729 |
| 3.2155 | 24.7742 | 85000 | 3.5427 | 0.3734 |

Framework versions

  • Transformers 4.55.2
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
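
With these versions pinned, loading the checkpoint might look like the sketch below. The repository id is hypothetical (the card does not record a namespace), and the causal-LM head is an assumption based on the loss/accuracy-style evaluation above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id: replace with the actual namespace/model path.
repo_id = "your-username/exceptions_exp2_swap_0.3_last_to_hit_5039"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # assumed causal LM
```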

Model details

  • Model size: 0.1B params
  • Tensor type: F32
  • Format: Safetensors