Visualize in Weights & Biases

exceptions_exp2_swap_take_to_hit_40817

This model is a fine-tuned version of an unspecified base model on an unknown dataset (the auto-generated card leaves the base-model field blank). It achieves the following results on the evaluation set (see the perplexity note after the list):

  • Loss: 3.5563
  • Accuracy: 0.3698
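
Assuming the reported loss is the usual mean per-token cross-entropy of a language model (the card does not state this, so it is an assumption), it corresponds to a perplexity of exp(3.5563) ≈ 35.0, and the accuracy would be top-1 next-token accuracy:

```python
import math

eval_loss = 3.5563                 # eval loss reported above
perplexity = math.exp(eval_loss)   # perplexity = exp(cross-entropy), if that is the metric
print(f"perplexity = {perplexity:.1f}")  # ≈ 35.0
```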

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (mirrored in the TrainingArguments sketch after this list):

  • learning_rate: 0.0006
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 40817
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 80
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50.0
  • mixed_precision_training: Native AMP
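
These settings translate roughly into a transformers TrainingArguments configuration like the one below. This is a minimal sketch for orientation, not the original training script: the output directory, the eval cadence, and the single-device assumption (16 per device × 5 accumulation steps = 80 effective batch) are inferences, not from the card.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above. Output dir and eval cadence
# are assumptions; everything else mirrors the card.
training_args = TrainingArguments(
    output_dir="exceptions_exp2_swap_take_to_hit_40817",  # assumed, matches the model name
    learning_rate=6e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=40817,
    gradient_accumulation_steps=5,   # 16 x 5 = 80 effective batch on one device
    optim="adamw_torch_fused",       # OptimizerNames.ADAMW_TORCH_FUSED
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=50.0,
    fp16=True,                       # "Native AMP" mixed precision
    eval_strategy="steps",           # the table below reports eval every 1000 steps
    eval_steps=1000,
)
```

Note that the results table below ends near epoch 32, well short of the configured 50 epochs; the card does not say whether training was interrupted or stopped by a callback.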

Training results

Training Loss Epoch Step Accuracy Validation Loss
4.8218 0.2911 1000 0.2569 4.7375
4.3397 0.5822 2000 0.2994 4.2782
4.1422 0.8733 3000 0.3153 4.0941
3.9996 1.1642 4000 0.3254 3.9856
3.9344 1.4553 5000 0.3324 3.9103
3.882 1.7464 6000 0.3379 3.8516
3.7542 2.0373 7000 0.3418 3.8108
3.742 2.3284 8000 0.3447 3.7814
3.7403 2.6195 9000 0.3478 3.7501
3.7259 2.9106 10000 0.3498 3.7286
3.6355 3.2014 11000 0.3515 3.7126
3.6457 3.4925 12000 0.3535 3.6932
3.6452 3.7837 13000 0.3553 3.6762
3.5385 4.0745 14000 0.3564 3.6688
3.5501 4.3656 15000 0.3574 3.6595
3.5798 4.6567 16000 0.3589 3.6463
3.577 4.9478 17000 0.3597 3.6327
3.496 5.2387 18000 0.3603 3.6337
3.5062 5.5298 19000 0.3612 3.6250
3.5315 5.8209 20000 0.3622 3.6133
3.4436 6.1121 21000 0.3619 3.6228
3.4761 6.4032 22000 0.3629 3.6130
3.4915 6.6943 23000 0.3637 3.6011
3.495 6.9854 24000 0.3647 3.5896
3.4366 7.2763 25000 0.3647 3.5982
3.4459 7.5674 26000 0.3653 3.5904
3.4511 7.8585 27000 0.3661 3.5812
3.3969 8.1493 28000 0.3659 3.5879
3.3997 8.4404 29000 0.3661 3.5841
3.4263 8.7315 30000 0.3670 3.5767
3.3182 9.0224 31000 0.3671 3.5813
3.3885 9.3135 32000 0.3673 3.5788
3.3966 9.6046 33000 0.3680 3.5699
3.411 9.8957 34000 0.3683 3.5635
3.3324 10.1866 35000 0.3681 3.5730
3.3616 10.4777 36000 0.3685 3.5662
3.378 10.7688 37000 0.3693 3.5603
3.2935 11.0597 38000 0.3688 3.5703
3.3295 11.3508 39000 0.3694 3.5664
3.3711 11.6419 40000 0.3698 3.5563
3.3647 11.9330 41000 0.3705 3.5481
3.3018 12.2239 42000 0.3700 3.5656
3.3376 12.5150 43000 0.3704 3.5557
3.3459 12.8061 44000 0.3710 3.5488
3.2828 13.0969 45000 0.3700 3.5641
3.2981 13.3880 46000 0.3703 3.5571
3.3099 13.6791 47000 0.3713 3.5484
3.3467 13.9702 48000 0.3717 3.5411
3.2748 14.2611 49000 0.3708 3.5546
3.2992 14.5522 50000 0.3716 3.5499
3.3182 14.8433 51000 0.3722 3.5429
3.2353 15.1342 52000 0.3713 3.5572
3.2799 15.4253 53000 0.3717 3.5491
3.2976 15.7164 54000 0.3723 3.5429
3.2501 16.0073 55000 0.3720 3.5515
3.2391 16.2984 56000 0.3722 3.5506
3.2764 16.5895 57000 0.3723 3.5449
3.2825 16.8806 58000 0.3730 3.5352
3.2269 17.1715 59000 0.3720 3.5538
3.2552 17.4626 60000 0.3727 3.5474
3.27 17.7537 61000 0.3730 3.5386
3.1842 18.0445 62000 0.3722 3.5523
3.2279 18.3356 63000 0.3726 3.5488
3.2601 18.6267 64000 0.3732 3.5399
3.2762 18.9179 65000 0.3737 3.5333
3.2033 19.2087 66000 0.3726 3.5528
3.231 19.4998 67000 0.3733 3.5431
3.2475 19.7909 68000 0.3736 3.5378
3.1706 20.0818 69000 0.3730 3.5497
3.2155 20.3729 70000 0.3732 3.5467
3.2264 20.6640 71000 0.3740 3.5369
3.2513 20.9551 72000 0.3744 3.5291
3.1817 21.2460 73000 0.3733 3.5490
3.2079 21.5371 74000 0.3738 3.5395
3.2259 21.8282 75000 0.3742 3.5334
3.1668 22.1191 76000 0.3734 3.5500
3.1939 22.4102 77000 0.3737 3.5464
3.2137 22.7013 78000 0.3740 3.5351
3.2233 22.9924 79000 0.3748 3.5296
3.1855 23.2832 80000 0.3739 3.5459
3.2038 23.5743 81000 0.3745 3.5393
3.2297 23.8655 82000 0.3745 3.5326
3.1395 24.1563 83000 0.3738 3.5506
3.1805 24.4474 84000 0.3745 3.5424
3.2038 24.7385 85000 0.3749 3.5342
3.1085 25.0294 86000 0.3743 3.5467
3.168 25.3205 87000 0.3743 3.5448
3.1849 25.6116 88000 0.3747 3.5394
3.2031 25.9027 89000 0.3750 3.5307
3.1408 26.1936 90000 0.3742 3.5550
3.1587 26.4844 91000 0.3741 3.5487
3.17 26.7755 92000 0.3745 3.5416
3.1262 27.0667 93000 0.3744 3.5523
3.1526 27.3578 94000 0.3743 3.5503
3.1821 27.6489 95000 0.3750 3.5391
3.1762 27.9400 96000 0.3754 3.5318
3.127 28.2308 97000 0.3745 3.5477
3.1601 28.5219 98000 0.3750 3.5417
3.1733 28.8131 99000 0.3755 3.5346
3.0859 29.1039 100000 0.3748 3.5478
3.1297 29.3950 101000 0.3746 3.5470
3.1593 29.6861 102000 0.3753 3.5382
3.1782 29.9772 103000 0.3755 3.5315
3.1073 30.2681 104000 0.3749 3.5477
3.1342 30.5592 105000 0.3756 3.5382
3.1468 30.8503 106000 0.3756 3.5354
3.0918 31.1412 107000 0.3749 3.5516
3.1192 31.4323 108000 0.3753 3.5437
3.1459 31.7234 109000 0.3756 3.5381
3.0711 32.0143 110000 0.3752 3.5476
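
For convenience, the best checkpoint by validation loss can be picked out programmatically. A sketch, assuming the table above has been saved as a CSV with the same five column headers (the filename is hypothetical):

```python
import pandas as pd

# Hypothetical CSV dump of the table above, same five columns.
df = pd.read_csv("training_results.csv")
best = df.loc[df["Validation Loss"].idxmin()]
print(best)  # step 72000, epoch ~20.96: validation loss 3.5291, accuracy 0.3744
```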

Framework versions

  • Transformers 4.55.2
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
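
The card does not name the architecture. Assuming it is a causal language model saved with save_pretrained (an assumption, as is the placeholder repo id below), it could be loaded with the pinned Transformers version and sanity-checked against the reported eval loss:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: substitute the actual Hub repo id or local checkpoint dir.
model_id = "path/to/exceptions_exp2_swap_take_to_hit_40817"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# Per-token cross-entropy on a sample text, comparable in kind to the eval loss above.
inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"loss = {loss.item():.4f}")
```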