exceptions_exp2_swap_0.3_resemble_to_hit_1032

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 3.5652
Accuracy: 0.3686
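
The card does not say how the loss is defined. As a rough aid to reading the number, the sketch below converts a loss into a perplexity, assuming (this is not stated in the card) that the loss is a mean per-token cross-entropy measured in nats. The function name `perplexity` is illustrative only.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean cross-entropy (in nats) to perplexity.

    Assumption: the reported loss is a per-token average using the
    natural logarithm. The model card does not confirm this.
    """
    return math.exp(mean_cross_entropy)


# Evaluation loss reported above.
print(f"perplexity ~ {perplexity(3.5652):.2f}")
```

If the loss uses a different base or averaging scheme, the conversion does not apply.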

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0006
train_batch_size: 16
eval_batch_size: 16
seed: 1032
gradient_accumulation_steps: 5
total_train_batch_size: 80
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9,0.98) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 50.0
mixed_precision_training: Native AMP
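
Two of the values above can be checked by hand. The sketch below shows how total_train_batch_size follows from train_batch_size and gradient_accumulation_steps, and what a linear schedule with 100 warmup steps typically looks like. It is a plain-Python approximation, not the training code: the single-device assumption is inferred from 16 x 5 = 80, and `total_steps` is a hypothetical parameter because the card does not state the planned number of optimizer steps.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Sequences consumed per optimizer step.

    num_devices=1 is an assumption; it is the value that makes
    16 * 5 match the reported total_train_batch_size of 80.
    """
    return per_device * grad_accum * num_devices


def linear_lr(step: int, total_steps: int,
              base_lr: float = 0.0006, warmup_steps: int = 100) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0.

    total_steps is hypothetical here: the card does not give it.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size(16, 5))  # train_batch_size * gradient_accumulation_steps

# Illustrative schedule over a made-up horizon of 1000 steps.
for step in (0, 50, 100, 550, 1000):
    print(step, linear_lr(step, total_steps=1000))
```

With this shape the learning rate peaks at 0.0006 once warmup ends and falls steadily afterwards, so later rows of a results table are trained at progressively smaller step sizes.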

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
4.8337	0.2915	1000	4.7596	0.2541
4.3461	0.5830	2000	4.2894	0.2982
4.1574	0.8745	3000	4.1024	0.3146
3.9878	1.1659	4000	4.0020	0.3234
3.9423	1.4574	5000	3.9223	0.3304
3.8968	1.7488	6000	3.8649	0.3357
3.7577	2.0402	7000	3.8210	0.3403
3.7532	2.3317	8000	3.7929	0.3430
3.7539	2.6232	9000	3.7618	0.3457
3.7316	2.9147	10000	3.7364	0.3487
3.6419	3.2061	11000	3.7224	0.3502
3.6553	3.4976	12000	3.7043	0.3521
3.6548	3.7891	13000	3.6873	0.3537
3.5477	4.0805	14000	3.6807	0.3547
3.5719	4.3719	15000	3.6691	0.3561
3.5759	4.6634	16000	3.6561	0.3571
3.5815	4.9549	17000	3.6422	0.3582
3.528	5.2463	18000	3.6437	0.3588
3.5332	5.5378	19000	3.6315	0.3598
3.5395	5.8293	20000	3.6211	0.3608
3.4464	6.1207	21000	3.6264	0.3614
3.478	6.4122	22000	3.6213	0.3617
3.4894	6.7037	23000	3.6094	0.3623
3.4967	6.9952	24000	3.6003	0.3634
3.4394	7.2865	25000	3.6097	0.3631
3.4716	7.5780	26000	3.6013	0.3640
3.4623	7.8695	27000	3.5933	0.3644
3.3819	8.1609	28000	3.5981	0.3644
3.4159	8.4524	29000	3.5919	0.3653
3.4337	8.7439	30000	3.5822	0.3656
3.3372	9.0353	31000	3.5887	0.3658
3.3866	9.3268	32000	3.5879	0.3661
3.402	9.6183	33000	3.5801	0.3663
3.4262	9.9098	34000	3.5700	0.3672
3.3374	10.2011	35000	3.5836	0.3666
3.3779	10.4926	36000	3.5764	0.3674
3.3979	10.7841	37000	3.5686	0.3679
3.2948	11.0755	38000	3.5798	0.3677
3.357	11.3670	39000	3.5796	0.3675
3.3737	11.6585	40000	3.5652	0.3686
3.3738	11.9500	41000	3.5594	0.3689
3.3039	12.2414	42000	3.5740	0.3684
3.3309	12.5329	43000	3.5665	0.3689
3.3656	12.8243	44000	3.5574	0.3695
3.259	13.1157	45000	3.5718	0.3687
3.3135	13.4072	46000	3.5674	0.3691
3.3412	13.6987	47000	3.5597	0.3699
3.3522	13.9902	48000	3.5482	0.3700
3.2958	14.2816	49000	3.5664	0.3696
3.3067	14.5731	50000	3.5592	0.3698
3.3295	14.8646	51000	3.5506	0.3703
3.2761	15.1559	52000	3.5646	0.3698
3.2866	15.4474	53000	3.5615	0.3700
3.3111	15.7389	54000	3.5517	0.3710
3.2108	16.0303	55000	3.5631	0.3704
3.2631	16.3218	56000	3.5603	0.3704
3.2903	16.6133	57000	3.5520	0.3708
3.3082	16.9048	58000	3.5458	0.3715
3.2211	17.1962	59000	3.5635	0.3709
3.2702	17.4877	60000	3.5564	0.3712
3.2788	17.7792	61000	3.5486	0.3715
3.2108	18.0705	62000	3.5620	0.3712
3.2456	18.3620	63000	3.5587	0.3713
3.2667	18.6535	64000	3.5520	0.3716
3.2886	18.9450	65000	3.5419	0.3720
3.2234	19.2364	66000	3.5611	0.3714
3.2478	19.5279	67000	3.5549	0.3718
3.2757	19.8194	68000	3.5445	0.3722
3.2054	20.1108	69000	3.5599	0.3713
3.2244	20.4023	70000	3.5549	0.3718
3.2411	20.6938	71000	3.5489	0.3722
3.2563	20.9853	72000	3.5420	0.3725
3.1956	21.2766	73000	3.5604	0.3718
3.2404	21.5681	74000	3.5521	0.3721
3.243	21.8596	75000	3.5429	0.3728
3.1764	22.1510	76000	3.5602	0.3720
3.2124	22.4425	77000	3.5564	0.3720
3.2347	22.7340	78000	3.5458	0.3726
3.1452	23.0254	79000	3.5557	0.3726
3.1942	23.3169	80000	3.5558	0.3722
3.2183	23.6083	81000	3.5484	0.3730
3.2317	23.8998	82000	3.5432	0.3731
3.1687	24.1912	83000	3.5593	0.3722
3.1888	24.4827	84000	3.5538	0.3727
3.2178	24.7742	85000	3.5479	0.3730
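
The table can be read programmatically. The sketch below parses a handful of rows copied from it, estimates the number of optimizer steps per epoch from the Step and Epoch columns, and picks the row with the lowest validation loss. The helper names `parse_rows` and `steps_per_epoch` are illustrative and not part of any library; only four rows are included to keep the example short.

```python
# Four rows copied from the table above. Columns:
# training loss, epoch, step, validation loss, accuracy.
SAMPLE_ROWS = """
4.8337 0.2915 1000 4.7596 0.2541
3.3737 11.6585 40000 3.5652 0.3686
3.2886 18.9450 65000 3.5419 0.3720
3.2178 24.7742 85000 3.5479 0.3730
"""


def parse_rows(text):
    """Turn whitespace-separated table rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, accuracy = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "accuracy": float(accuracy),
        })
    return rows


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the latest logged row."""
    last = max(rows, key=lambda r: r["step"])
    return last["step"] / last["epoch"]


rows = parse_rows(SAMPLE_ROWS)
best = min(rows, key=lambda r: r["val_loss"])
print("steps per epoch ~", round(steps_per_epoch(rows)))
print("lowest validation loss in sample:", best["val_loss"], "at step", best["step"])
```

Run on the full table, the same logic gives the lowest validation loss as 3.5419 at step 65000, while the headline figures at the top of the card (loss 3.5652, accuracy 0.3686) coincide with the row at step 40000. The card does not explain which checkpoint the headline figures refer to.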

Framework versions

Transformers 4.55.2
Pytorch 2.8.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Safetensors

Model size: 0.1B params
Tensor type: F32