exceptions_exp2_swap_0.3_cost_to_drop_2128

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set (a perplexity conversion is sketched after the list):

  • Loss: 3.5645
  • Accuracy: 0.3686
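
If the reported loss is a mean token-level cross-entropy (typical for language models, though the card does not say), perplexity follows directly from it. A minimal sketch:

```python
import math

# Assuming the evaluation loss is mean token-level cross-entropy
# (an assumption; the card does not state the loss type),
# perplexity is simply its exponential.
eval_loss = 3.5645
perplexity = math.exp(eval_loss)  # ≈ 35.3
print(f"perplexity: {perplexity:.1f}")
```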

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 2128
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
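
These settings map directly onto Hugging Face TrainingArguments. A minimal reconstruction sketch follows; the output directory is a placeholder, and fp16=True assumes "Native AMP" means float16 (the card does not specify fp16 vs. bf16):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="exceptions_exp2_swap_0.3_cost_to_drop_2128",  # placeholder, not from the card
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=2128,
    gradient_accumulation_steps=5,  # 16 x 5 = 80 effective train batch size
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,  # "Native AMP" mixed precision; bf16 is also possible
)
```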

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 4.8412 | 0.2916 | 1000 | 4.7675 | 0.2525 |
| 4.3629 | 0.5831 | 2000 | 4.2992 | 0.2974 |
| 4.1467 | 0.8747 | 3000 | 4.1125 | 0.3142 |
| 4.0091 | 1.1662 | 4000 | 3.9979 | 0.3240 |
| 3.9325 | 1.4578 | 5000 | 3.9250 | 0.3302 |
| 3.8863 | 1.7493 | 6000 | 3.8661 | 0.3356 |
| 3.7587 | 2.0408 | 7000 | 3.8232 | 0.3400 |
| 3.7568 | 2.3324 | 8000 | 3.7916 | 0.3431 |
| 3.7504 | 2.6239 | 9000 | 3.7629 | 0.3457 |
| 3.7423 | 2.9155 | 10000 | 3.7371 | 0.3482 |
| 3.6398 | 3.2070 | 11000 | 3.7222 | 0.3501 |
| 3.6404 | 3.4986 | 12000 | 3.7053 | 0.3518 |
| 3.6436 | 3.7901 | 13000 | 3.6884 | 0.3535 |
| 3.5458 | 4.0816 | 14000 | 3.6811 | 0.3546 |
| 3.5789 | 4.3732 | 15000 | 3.6694 | 0.3560 |
| 3.5978 | 4.6648 | 16000 | 3.6576 | 0.3570 |
| 3.5815 | 4.9563 | 17000 | 3.6433 | 0.3581 |
| 3.5227 | 5.2478 | 18000 | 3.6438 | 0.3590 |
| 3.5329 | 5.5394 | 19000 | 3.6334 | 0.3598 |
| 3.5403 | 5.8310 | 20000 | 3.6214 | 0.3607 |
| 3.4624 | 6.1225 | 21000 | 3.6298 | 0.3607 |
| 3.4801 | 6.4140 | 22000 | 3.6188 | 0.3618 |
| 3.4990 | 6.7056 | 23000 | 3.6087 | 0.3624 |
| 3.5005 | 6.9971 | 24000 | 3.5999 | 0.3635 |
| 3.4326 | 7.2886 | 25000 | 3.6095 | 0.3631 |
| 3.4619 | 7.5802 | 26000 | 3.5995 | 0.3637 |
| 3.4767 | 7.8718 | 27000 | 3.5887 | 0.3647 |
| 3.3875 | 8.1633 | 28000 | 3.5997 | 0.3644 |
| 3.4172 | 8.4548 | 29000 | 3.5920 | 0.3650 |
| 3.4420 | 8.7464 | 30000 | 3.5813 | 0.3657 |
| 3.3494 | 9.0379 | 31000 | 3.5922 | 0.3659 |
| 3.3838 | 9.3295 | 32000 | 3.5890 | 0.3661 |
| 3.4142 | 9.6210 | 33000 | 3.5799 | 0.3665 |
| 3.4202 | 9.9126 | 34000 | 3.5699 | 0.3671 |
| 3.3572 | 10.2041 | 35000 | 3.5847 | 0.3668 |
| 3.3677 | 10.4957 | 36000 | 3.5785 | 0.3669 |
| 3.3858 | 10.7872 | 37000 | 3.5684 | 0.3675 |
| 3.3138 | 11.0787 | 38000 | 3.5779 | 0.3678 |
| 3.3435 | 11.3703 | 39000 | 3.5729 | 0.3681 |
| 3.3632 | 11.6618 | 40000 | 3.5645 | 0.3686 |
| 3.3812 | 11.9534 | 41000 | 3.5576 | 0.3690 |
| 3.2993 | 12.2449 | 42000 | 3.5710 | 0.3682 |
| 3.3399 | 12.5365 | 43000 | 3.5659 | 0.3689 |
| 3.3545 | 12.8280 | 44000 | 3.5561 | 0.3695 |
| 3.2804 | 13.1195 | 45000 | 3.5699 | 0.3687 |
| 3.3232 | 13.4111 | 46000 | 3.5670 | 0.3689 |
| 3.3411 | 13.7027 | 47000 | 3.5578 | 0.3697 |
| 3.3427 | 13.9942 | 48000 | 3.5520 | 0.3701 |
| 3.2652 | 14.2857 | 49000 | 3.5674 | 0.3695 |
| 3.3215 | 14.5773 | 50000 | 3.5607 | 0.3700 |
| 3.3388 | 14.8689 | 51000 | 3.5499 | 0.3706 |
| 3.2630 | 15.1604 | 52000 | 3.5668 | 0.3698 |
| 3.2958 | 15.4519 | 53000 | 3.5581 | 0.3703 |
| 3.3097 | 15.7435 | 54000 | 3.5523 | 0.3708 |
| 3.2196 | 16.0350 | 55000 | 3.5626 | 0.3702 |
| 3.2662 | 16.3265 | 56000 | 3.5624 | 0.3704 |
| 3.2873 | 16.6181 | 57000 | 3.5548 | 0.3705 |
| 3.3024 | 16.9097 | 58000 | 3.5470 | 0.3713 |
| 3.2379 | 17.2012 | 59000 | 3.5632 | 0.3707 |
| 3.2753 | 17.4927 | 60000 | 3.5568 | 0.3709 |
| 3.2848 | 17.7843 | 61000 | 3.5484 | 0.3714 |
| 3.2023 | 18.0758 | 62000 | 3.5607 | 0.3713 |
| 3.2515 | 18.3674 | 63000 | 3.5598 | 0.3710 |
| 3.2652 | 18.6589 | 64000 | 3.5506 | 0.3717 |
| 3.2867 | 18.9505 | 65000 | 3.5431 | 0.3719 |
| 3.2242 | 19.2420 | 66000 | 3.5621 | 0.3713 |
| 3.2556 | 19.5336 | 67000 | 3.5534 | 0.3716 |
| 3.2738 | 19.8251 | 68000 | 3.5475 | 0.3721 |
| 3.2021 | 20.1166 | 69000 | 3.5616 | 0.3713 |
| 3.2299 | 20.4082 | 70000 | 3.5592 | 0.3716 |
| 3.2573 | 20.6997 | 71000 | 3.5481 | 0.3723 |
| 3.2701 | 20.9913 | 72000 | 3.5409 | 0.3725 |
| 3.2204 | 21.2828 | 73000 | 3.5582 | 0.3716 |
| 3.2229 | 21.5744 | 74000 | 3.5522 | 0.3720 |
| 3.2339 | 21.8659 | 75000 | 3.5419 | 0.3727 |
| 3.1707 | 22.1574 | 76000 | 3.5588 | 0.3722 |
| 3.2119 | 22.4490 | 77000 | 3.5525 | 0.3723 |
| 3.2174 | 22.7406 | 78000 | 3.5479 | 0.3725 |
| 3.1437 | 23.0321 | 79000 | 3.5616 | 0.3721 |
| 3.1970 | 23.3236 | 80000 | 3.5559 | 0.3722 |
| 3.2125 | 23.6152 | 81000 | 3.5491 | 0.3728 |
| 3.2511 | 23.9068 | 82000 | 3.5405 | 0.3731 |
| 3.1620 | 24.1983 | 83000 | 3.5605 | 0.3720 |
| 3.1887 | 24.4898 | 84000 | 3.5551 | 0.3724 |
| 3.2115 | 24.7814 | 85000 | 3.5455 | 0.3729 |
| 3.1402 | 25.0729 | 86000 | 3.5610 | 0.3727 |
| 3.1789 | 25.3645 | 87000 | 3.5575 | 0.3722 |
| 3.1993 | 25.6560 | 88000 | 3.5498 | 0.3731 |
| 3.2180 | 25.9476 | 89000 | 3.5447 | 0.3734 |
| 3.1469 | 26.2391 | 90000 | 3.5599 | 0.3725 |
| 3.1894 | 26.5306 | 91000 | 3.5501 | 0.3733 |
| 3.1931 | 26.8222 | 92000 | 3.5464 | 0.3733 |
| 3.1294 | 27.1137 | 93000 | 3.5635 | 0.3724 |
| 3.1452 | 27.4053 | 94000 | 3.5562 | 0.3731 |
| 3.1861 | 27.6968 | 95000 | 3.5468 | 0.3732 |
| 3.1921 | 27.9884 | 96000 | 3.5444 | 0.3735 |
| 3.1493 | 28.2799 | 97000 | 3.5572 | 0.3730 |
| 3.1537 | 28.5715 | 98000 | 3.5511 | 0.3730 |
| 3.1804 | 28.8630 | 99000 | 3.5434 | 0.3739 |
| 3.1242 | 29.1545 | 100000 | 3.5606 | 0.3727 |
| 3.1423 | 29.4461 | 101000 | 3.5562 | 0.3731 |
| 3.1702 | 29.7377 | 102000 | 3.5504 | 0.3735 |
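
Validation loss falls steeply over the first ~12 epochs and then plateaus around 3.54-3.56; note the log ends near epoch 29.7 although num_epochs was set to 50, which may indicate early stopping or a truncated run. A minimal sketch for eyeballing the trend, with a handful of rows transcribed from the table (matplotlib assumed available):

```python
import matplotlib.pyplot as plt

# A few (step, validation loss) pairs transcribed from the table above.
steps = [1000, 10000, 20000, 40000, 60000, 80000, 102000]
val_loss = [4.7675, 3.7371, 3.6214, 3.5645, 3.5568, 3.5559, 3.5504]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Training step")
plt.ylabel("Validation loss")
plt.title("Validation loss vs. step")
plt.tight_layout()
plt.show()
```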

Framework versions

  • Transformers 4.55.2
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model details

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32
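
The card does not state the model architecture or where the checkpoint is hosted. Assuming a causal language model under a placeholder repository id, loading might look like the sketch below:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id; the card does not say where the checkpoint
# lives. A causal LM head is an assumption, not confirmed by the card.
repo_id = "your-username/exceptions_exp2_swap_0.3_cost_to_drop_2128"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
```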