
exceptions_exp2_swap_0.3_last_to_push_2128

This model is a fine-tuned version of an unspecified base model (not recorded in this card) on an unknown dataset. It achieves the following results on the evaluation set (a derived perplexity sketch follows the list):

  • Loss: 3.5821
  • Accuracy: 0.3658
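The task is not documented, but the loss/accuracy pair is consistent with token-level language modeling. Under the assumption that the reported loss is mean cross-entropy per token (in nats), the corresponding perplexity is just its exponential; a minimal sketch:

```python
import math

# Assumption: eval loss is mean token-level cross-entropy in nats,
# as reported by the Hugging Face Trainer for language modeling.
eval_loss = 3.5821
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 35.95
```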

Model description

More information needed

Intended uses & limitations

More information needed
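No usage guidance is documented. As a minimal sketch, assuming this is a causal language model (consistent with the metrics above, though the task is not stated) and that it is published under some namespace (the repo id below is a placeholder, not given in this card), it could be loaded like this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id: the owning namespace is not documented in this card.
repo_id = "<namespace>/exceptions_exp2_swap_0.3_last_to_push_2128"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```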

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction as TrainingArguments follows the list):

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 2128
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: AdamW (ADAMW_TORCH_FUSED) with betas=(0.9, 0.98), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
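These settings map onto transformers.TrainingArguments as sketched below. This is a reconstruction from the list above, not the original training script; output_dir is a placeholder, and the model and dataset are unknown:

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameter list above (a sketch, not the
# original script); output_dir is a hypothetical placeholder.
args = TrainingArguments(
    output_dir="exceptions_exp2_swap_0.3_last_to_push_2128",
    learning_rate=6e-4,
    per_device_train_batch_size=16,  # effective batch: 16 × 5 accumulation = 80
    per_device_eval_batch_size=16,
    seed=2128,
    gradient_accumulation_steps=5,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,  # "Native AMP" mixed precision
)
```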

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 4.8588        | 0.2915  | 1000  | 4.7821          | 0.2504   |
| 4.3524        | 0.5830  | 2000  | 4.2914          | 0.2982   |
| 4.1537        | 0.8745  | 3000  | 4.1033          | 0.3146   |
| 4.0042        | 1.1659  | 4000  | 3.9983          | 0.3241   |
| 3.941         | 1.4574  | 5000  | 3.9233          | 0.3308   |
| 3.8859        | 1.7488  | 6000  | 3.8658          | 0.3356   |
| 3.7592        | 2.0402  | 7000  | 3.8221          | 0.3402   |
| 3.7489        | 2.3317  | 8000  | 3.7929          | 0.3433   |
| 3.7413        | 2.6232  | 9000  | 3.7607          | 0.3458   |
| 3.7283        | 2.9147  | 10000 | 3.7353          | 0.3485   |
| 3.6364        | 3.2061  | 11000 | 3.7234          | 0.3504   |
| 3.6527        | 3.4976  | 12000 | 3.7046          | 0.3520   |
| 3.6576        | 3.7891  | 13000 | 3.6862          | 0.3538   |
| 3.5507        | 4.0805  | 14000 | 3.6797          | 0.3548   |
| 3.578         | 4.3719  | 15000 | 3.6678          | 0.3560   |
| 3.5787        | 4.6634  | 16000 | 3.6541          | 0.3571   |
| 3.5864        | 4.9549  | 17000 | 3.6399          | 0.3586   |
| 3.5129        | 5.2463  | 18000 | 3.6461          | 0.3591   |
| 3.537         | 5.5378  | 19000 | 3.6333          | 0.3598   |
| 3.541         | 5.8293  | 20000 | 3.6206          | 0.3608   |
| 3.4375        | 6.1207  | 21000 | 3.6236          | 0.3611   |
| 3.4858        | 6.4122  | 22000 | 3.6153          | 0.3621   |
| 3.5062        | 6.7037  | 23000 | 3.6064          | 0.3626   |
| 3.503         | 6.9952  | 24000 | 3.5975          | 0.3632   |
| 3.4373        | 7.2865  | 25000 | 3.6069          | 0.3635   |
| 3.45          | 7.5780  | 26000 | 3.6000          | 0.3642   |
| 3.4773        | 7.8695  | 27000 | 3.5882          | 0.3648   |
| 3.3902        | 8.1609  | 28000 | 3.5965          | 0.3648   |
| 3.4059        | 8.4524  | 29000 | 3.5917          | 0.3653   |
| 3.4442        | 8.7439  | 30000 | 3.5821          | 0.3658   |
| 3.3265        | 9.0353  | 31000 | 3.5846          | 0.3661   |
| 3.3821        | 9.3268  | 32000 | 3.5832          | 0.3664   |
| 3.4159        | 9.6183  | 33000 | 3.5777          | 0.3667   |
| 3.4213        | 9.9098  | 34000 | 3.5697          | 0.3674   |
| 3.3492        | 10.2011 | 35000 | 3.5847          | 0.3673   |
| 3.3768        | 10.4926 | 36000 | 3.5725          | 0.3678   |
| 3.3895        | 10.7841 | 37000 | 3.5632          | 0.3683   |
| 3.2961        | 11.0755 | 38000 | 3.5779          | 0.3676   |
| 3.3395        | 11.3670 | 39000 | 3.5743          | 0.3679   |
| 3.367         | 11.6585 | 40000 | 3.5637          | 0.3687   |
| 3.3932        | 11.9500 | 41000 | 3.5578          | 0.3692   |
| 3.3017        | 12.2414 | 42000 | 3.5702          | 0.3687   |
| 3.3403        | 12.5329 | 43000 | 3.5649          | 0.3691   |
| 3.3552        | 12.8243 | 44000 | 3.5531          | 0.3696   |
| 3.2792        | 13.1157 | 45000 | 3.5676          | 0.3692   |
| 3.3086        | 13.4072 | 46000 | 3.5632          | 0.3695   |
| 3.328         | 13.6987 | 47000 | 3.5539          | 0.3701   |
| 3.3539        | 13.9902 | 48000 | 3.5479          | 0.3708   |
| 3.2844        | 14.2816 | 49000 | 3.5611          | 0.3698   |
| 3.3172        | 14.5731 | 50000 | 3.5567          | 0.3705   |
| 3.3234        | 14.8646 | 51000 | 3.5469          | 0.3709   |
| 3.2409        | 15.1559 | 52000 | 3.5648          | 0.3704   |
| 3.2961        | 15.4474 | 53000 | 3.5565          | 0.3703   |
| 3.3103        | 15.7389 | 54000 | 3.5495          | 0.3713   |
| 3.2078        | 16.0303 | 55000 | 3.5581          | 0.3708   |
| 3.2671        | 16.3218 | 56000 | 3.5591          | 0.3706   |
| 3.2937        | 16.6133 | 57000 | 3.5486          | 0.3713   |
| 3.3089        | 16.9048 | 58000 | 3.5435          | 0.3717   |
| 3.2384        | 17.1962 | 59000 | 3.5570          | 0.3713   |
| 3.2773        | 17.4877 | 60000 | 3.5538          | 0.3713   |
| 3.2846        | 17.7792 | 61000 | 3.5431          | 0.3721   |
| 3.2045        | 18.0705 | 62000 | 3.5580          | 0.3715   |
| 3.2353        | 18.3620 | 63000 | 3.5550          | 0.3716   |
| 3.268         | 18.6535 | 64000 | 3.5452          | 0.3720   |
| 3.2766        | 18.9450 | 65000 | 3.5379          | 0.3725   |
| 3.2226        | 19.2364 | 66000 | 3.5558          | 0.3717   |
| 3.2391        | 19.5279 | 67000 | 3.5464          | 0.3723   |
| 3.2614        | 19.8194 | 68000 | 3.5382          | 0.3727   |
| 3.1841        | 20.1108 | 69000 | 3.5576          | 0.3719   |
| 3.2365        | 20.4023 | 70000 | 3.5541          | 0.3721   |
| 3.2479        | 20.6938 | 71000 | 3.5418          | 0.3728   |
| 3.2626        | 20.9853 | 72000 | 3.5342          | 0.3733   |
| 3.1935        | 21.2766 | 73000 | 3.5542          | 0.3723   |
| 3.2399        | 21.5681 | 74000 | 3.5452          | 0.3727   |
| 3.2333        | 21.8596 | 75000 | 3.5373          | 0.3733   |
| 3.1836        | 22.1510 | 76000 | 3.5537          | 0.3724   |
| 3.2066        | 22.4425 | 77000 | 3.5468          | 0.3728   |
| 3.2369        | 22.7340 | 78000 | 3.5392          | 0.3733   |
| 3.1468        | 23.0254 | 79000 | 3.5541          | 0.3727   |
| 3.1908        | 23.3169 | 80000 | 3.5542          | 0.3724   |
| 3.2033        | 23.6083 | 81000 | 3.5444          | 0.3730   |
| 3.2303        | 23.8998 | 82000 | 3.5344          | 0.3738   |
| 3.163         | 24.1912 | 83000 | 3.5533          | 0.3728   |
| 3.1928        | 24.4827 | 84000 | 3.5471          | 0.3732   |
| 3.21          | 24.7742 | 85000 | 3.5383          | 0.3736   |
| 3.1293        | 25.0656 | 86000 | 3.5520          | 0.3731   |
| 3.1791        | 25.3571 | 87000 | 3.5519          | 0.3731   |
| 3.1959        | 25.6486 | 88000 | 3.5453          | 0.3735   |
| 3.2181        | 25.9401 | 89000 | 3.5350          | 0.3743   |
| 3.1563        | 26.2314 | 90000 | 3.5543          | 0.3731   |
| 3.1801        | 26.5229 | 91000 | 3.5477          | 0.3736   |
| 3.2025        | 26.8144 | 92000 | 3.5389          | 0.3741   |

Framework versions

  • Transformers 4.55.2
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
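
To reproduce the environment, the versions above can be pinned in a requirements file; a minimal sketch (note that the +cu128 build of torch additionally requires installing from the PyTorch CUDA wheel index):

```
transformers==4.55.2
torch==2.8.0
datasets==4.0.0
tokenizers==0.21.4
```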
Model files

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32