exceptions_exp2_swap_0.3_last_to_push_3591

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 3.5612
Accuracy: 0.3691
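
The loss above can be read as a perplexity if it is a mean per-token cross-entropy in nats. The card does not state this, so the conversion below is a sketch under that assumption, not a reported metric.

```python
import math

# Evaluation metrics copied from this card.
eval_loss = 3.5612
eval_accuracy = 0.3691

# Assumption (not confirmed by the card): the loss is the mean per-token
# cross-entropy in nats, in which case perplexity is exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval loss      : {eval_loss:.4f}")
print(f"eval accuracy  : {eval_accuracy:.4f}")
print(f"perplexity (assumed nats): {perplexity:.2f}")
```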

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0006
train_batch_size: 16
eval_batch_size: 16
seed: 3591
gradient_accumulation_steps: 5
total_train_batch_size: 80
optimizer: OptimizerNames.ADAMW_TORCH_FUSED (fused PyTorch AdamW) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 50.0
mixed_precision_training: Native AMP
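
The hyperparameters above imply two pieces of arithmetic: how the total batch size of 80 arises, and how the learning rate changes under a linear schedule with 100 warmup steps. The sketch below reproduces both in plain Python. The device count and the total number of optimizer steps are assumptions: `NUM_DEVICES = 1` is inferred from 16 x 5 = 80, and `TOTAL_STEPS` is estimated as 50 epochs at roughly 3431 steps per epoch (the Step/Epoch ratio in the results table). The schedule follows the usual warmup-then-linear-decay rule and is not taken from the training code itself.

```python
# Values copied from the hyperparameter list.
LEARNING_RATE = 0.0006
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 5
WARMUP_STEPS = 100

# Assumptions, not stated in the card.
NUM_DEVICES = 1          # inferred from 16 * 5 * 1 == 80
TOTAL_STEPS = 171_550    # estimate: 50 epochs * ~3431 steps per epoch


def total_train_batch_size(per_device, accumulation, devices):
    """Sequences consumed per optimizer step."""
    return per_device * accumulation * devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    remaining = max(0.0, (total - step) / max(1, total - warmup))
    return base_lr * remaining


effective_batch = total_train_batch_size(
    TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES
)
print("total_train_batch_size:", effective_batch)

for step in (0, 50, 100, 85_000, TOTAL_STEPS):
    print(f"step {step:>7}: lr = {linear_lr(step):.6f}")
```

If the actual run used a different number of total steps, only the decay slope changes; the warmup portion depends solely on `lr_scheduler_warmup_steps`.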

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
4.8349	0.2915	1000	4.7556	0.2544
4.3423	0.5830	2000	4.2891	0.2986
4.1453	0.8745	3000	4.1022	0.3142
4.0008	1.1659	4000	3.9905	0.3247
3.9355	1.4574	5000	3.9172	0.3314
3.893	1.7488	6000	3.8603	0.3364
3.7468	2.0402	7000	3.8165	0.3402
3.7689	2.3317	8000	3.7873	0.3438
3.7326	2.6232	9000	3.7566	0.3464
3.7187	2.9147	10000	3.7328	0.3485
3.647	3.2061	11000	3.7167	0.3512
3.6572	3.4976	12000	3.6983	0.3522
3.6396	3.7891	13000	3.6799	0.3543
3.5552	4.0805	14000	3.6737	0.3554
3.5741	4.3719	15000	3.6636	0.3564
3.5836	4.6634	16000	3.6511	0.3578
3.5849	4.9549	17000	3.6394	0.3589
3.5067	5.2463	18000	3.6389	0.3596
3.5198	5.5378	19000	3.6279	0.3603
3.542	5.8293	20000	3.6177	0.3613
3.4358	6.1207	21000	3.6214	0.3613
3.4732	6.4122	22000	3.6151	0.3621
3.4959	6.7037	23000	3.6028	0.3629
3.4992	6.9952	24000	3.5965	0.3637
3.43	7.2865	25000	3.6025	0.3638
3.4542	7.5780	26000	3.5949	0.3644
3.4592	7.8695	27000	3.5857	0.3651
3.3761	8.1609	28000	3.5956	0.3650
3.4118	8.4524	29000	3.5885	0.3652
3.4251	8.7439	30000	3.5792	0.3662
3.3261	9.0353	31000	3.5858	0.3661
3.3842	9.3268	32000	3.5834	0.3665
3.4044	9.6183	33000	3.5759	0.3668
3.4281	9.9098	34000	3.5656	0.3675
3.3328	10.2011	35000	3.5799	0.3672
3.3691	10.4926	36000	3.5708	0.3677
3.3935	10.7841	37000	3.5630	0.3685
3.2855	11.0755	38000	3.5747	0.3682
3.3386	11.3670	39000	3.5682	0.3683
3.3678	11.6585	40000	3.5612	0.3691
3.3761	11.9500	41000	3.5578	0.3690
3.3202	12.2414	42000	3.5683	0.3687
3.3355	12.5329	43000	3.5576	0.3694
3.3555	12.8243	44000	3.5534	0.3699
3.2868	13.1157	45000	3.5688	0.3691
3.3047	13.4072	46000	3.5593	0.3694
3.3336	13.6987	47000	3.5533	0.3702
3.3468	13.9902	48000	3.5451	0.3705
3.2932	14.2816	49000	3.5651	0.3698
3.3082	14.5731	50000	3.5567	0.3703
3.3245	14.8646	51000	3.5471	0.3708
3.2461	15.1559	52000	3.5623	0.3702
3.2811	15.4474	53000	3.5571	0.3706
3.2966	15.7389	54000	3.5477	0.3712
3.2185	16.0303	55000	3.5608	0.3704
3.2578	16.3218	56000	3.5600	0.3708
3.2815	16.6133	57000	3.5508	0.3712
3.3002	16.9048	58000	3.5423	0.3718
3.2305	17.1962	59000	3.5623	0.3709
3.2584	17.4877	60000	3.5526	0.3712
3.2785	17.7792	61000	3.5463	0.3715
3.2009	18.0705	62000	3.5599	0.3712
3.2388	18.3620	63000	3.5529	0.3716
3.2704	18.6535	64000	3.5493	0.3717
3.2754	18.9450	65000	3.5376	0.3726
3.2164	19.2364	66000	3.5528	0.3716
3.2509	19.5279	67000	3.5480	0.3719
3.2664	19.8194	68000	3.5414	0.3724
3.1739	20.1108	69000	3.5566	0.3717
3.2244	20.4023	70000	3.5543	0.3721
3.2413	20.6938	71000	3.5454	0.3725
3.2372	20.9853	72000	3.5379	0.3730
3.2002	21.2766	73000	3.5547	0.3717
3.2379	21.5681	74000	3.5499	0.3725
3.2427	21.8596	75000	3.5394	0.3730
3.1711	22.1510	76000	3.5550	0.3724
3.2046	22.4425	77000	3.5524	0.3724
3.2299	22.7340	78000	3.5418	0.3729
3.1306	23.0254	79000	3.5571	0.3722
3.1783	23.3169	80000	3.5575	0.3725
3.2113	23.6083	81000	3.5456	0.3726
3.2303	23.8998	82000	3.5404	0.3733
3.154	24.1912	83000	3.5557	0.3728
3.1822	24.4827	84000	3.5487	0.3733
3.2113	24.7742	85000	3.5431	0.3733
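
A table like the one above is easier to inspect programmatically. The following is a minimal sketch that parses a handful of rows copied from the table (not the full log), picks the row with the lowest validation loss among them, and estimates steps per epoch from the Step/Epoch ratio. The helper name `parse_rows` and the per-step batch size of 80 used for the examples-per-epoch estimate are illustrative assumptions.

```python
# A few rows copied verbatim from the table above (tab-separated):
# Training Loss, Epoch, Step, Validation Loss, Accuracy
ROWS = """\
4.8349\t0.2915\t1000\t4.7556\t0.2544
3.3678\t11.6585\t40000\t3.5612\t0.3691
3.2754\t18.9450\t65000\t3.5376\t0.3726
3.2372\t20.9853\t72000\t3.5379\t0.3730
3.2113\t24.7742\t85000\t3.5431\t0.3733
"""


def parse_rows(text):
    """Turn tab-separated result rows into a list of dicts."""
    records = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, accuracy = line.split("\t")
        records.append(
            {
                "train_loss": float(train_loss),
                "epoch": float(epoch),
                "step": int(step),
                "val_loss": float(val_loss),
                "accuracy": float(accuracy),
            }
        )
    return records


records = parse_rows(ROWS)

# Row with the lowest validation loss among the sampled rows.
best = min(records, key=lambda r: r["val_loss"])

# Steps per epoch estimated from the last sampled row.
last = records[-1]
steps_per_epoch = last["step"] / last["epoch"]

# Assumes 80 sequences per optimizer step (total_train_batch_size).
examples_per_epoch = steps_per_epoch * 80

print("best sampled step:", best["step"], "val_loss:", best["val_loss"])
print(f"steps per epoch ~ {steps_per_epoch:.0f}")
print(f"examples per epoch ~ {examples_per_epoch:.0f}")
```

Among these rows, the headline evaluation numbers (loss 3.5612, accuracy 0.3691) coincide with the entry at step 40000, while the lowest validation loss appears later in the table.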

Framework versions

Transformers 4.55.2
Pytorch 2.8.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Safetensors

Model size: 0.1B params
Tensor type: F32