100M_high_100_6910

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 3.3078
Accuracy: 0.3941
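The loss is easier to compare with other language models when it is converted to perplexity. The sketch below assumes this is a language-modelling checkpoint and that "Loss" is the mean per-token cross-entropy in nats (the eval_loss the Transformers Trainer reports). Under that assumption, perplexity is exp(loss), which is how the Transformers example language-modelling scripts compute it. The helper name `perplexity` is illustrative.

```python
import math


def perplexity(mean_ce_loss_nats: float) -> float:
    """Perplexity implied by a mean per-token cross-entropy loss (natural log)."""
    return math.exp(mean_ce_loss_nats)


eval_loss = 3.3078  # "Loss" from the evaluation results above
print(f"perplexity: {perplexity(eval_loss):.2f}")  # prints "perplexity: 27.32"
```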

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0006
train_batch_size: 32
eval_batch_size: 16
seed: 6910
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.98), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 10
mixed_precision_training: Native AMP
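For readers trying to reproduce the run, the list above maps fairly directly onto `transformers.TrainingArguments`. The following is a hypothetical reconstruction, not the authors' training script. It assumes a single device with no gradient accumulation, because the card lists no device count or accumulation setting. It also reads "Native AMP" as fp16=True (bf16 is not ruled out) and uses a made-up `output_dir`. `TRAINING_KWARGS` and `build_training_args` are illustrative names.

```python
# Hypothetical reconstruction of the hyperparameters above as keyword
# arguments for transformers.TrainingArguments (not the authors' script).
TRAINING_KWARGS = {
    "output_dir": "100M_high_100_6910",  # hypothetical output path
    "learning_rate": 6e-4,
    "per_device_train_batch_size": 32,   # assumes one device, no gradient accumulation
    "per_device_eval_batch_size": 16,
    "seed": 6910,
    "optim": "adamw_torch",              # OptimizerNames.ADAMW_TORCH
    "adam_beta1": 0.9,
    "adam_beta2": 0.98,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 100,
    "num_train_epochs": 10,
    "fp16": True,                        # "Native AMP"; assumed fp16 rather than bf16
}


def build_training_args(**overrides):
    """Build TrainingArguments from the dict above plus any overrides.

    Requires transformers to be installed; with fp16=True it also needs a
    supported accelerator (e.g. a CUDA GPU), otherwise validation fails.
    """
    from transformers import TrainingArguments  # imported lazily on purpose

    return TrainingArguments(**{**TRAINING_KWARGS, **overrides})
```

On a CPU-only machine, `build_training_args(fp16=False)` avoids the mixed-precision check.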

Training results

Training Loss	Epoch	Step	Validation Loss	Validation Accuracy
5.1033	0.1076	1000	5.0180	0.2274
4.5988	0.2153	2000	4.5230	0.2689
4.3355	0.3229	3000	4.2498	0.2975
4.1754	0.4305	4000	4.1035	0.3111
4.0607	0.5382	5000	3.9984	0.3212
4.0022	0.6458	6000	3.9244	0.3273
3.9275	0.7534	7000	3.8683	0.3325
3.8874	0.8610	8000	3.8202	0.3373
3.8623	0.9687	9000	3.7831	0.3403
3.7478	1.0763	10000	3.7542	0.3438
3.7484	1.1839	11000	3.7274	0.3462
3.731	1.2916	12000	3.7030	0.3489
3.7217	1.3992	13000	3.6806	0.3509
3.7223	1.5068	14000	3.6594	0.3529
3.6733	1.6145	15000	3.6423	0.3546
3.6659	1.7221	16000	3.6230	0.3564
3.6659	1.8297	17000	3.6099	0.3582
3.6405	1.9374	18000	3.5942	0.3595
3.5552	2.0450	19000	3.5863	0.3607
3.5517	2.1526	20000	3.5759	0.3619
3.5622	2.2603	21000	3.5652	0.3634
3.5521	2.3679	22000	3.5536	0.3646
3.5272	2.4755	23000	3.5438	0.3653
3.5408	2.5831	24000	3.5349	0.3664
3.5384	2.6908	25000	3.5265	0.3676
3.5453	2.7984	26000	3.5167	0.3680
3.5254	2.9060	27000	3.5096	0.3690
3.4289	3.0137	28000	3.5023	0.3698
3.4573	3.1213	29000	3.5012	0.3706
3.4612	3.2289	30000	3.4919	0.3712
3.4596	3.3366	31000	3.4854	0.3723
3.4761	3.4442	32000	3.4807	0.3726
3.4718	3.5518	33000	3.4740	0.3732
3.4625	3.6595	34000	3.4664	0.3741
3.4482	3.7671	35000	3.4618	0.3749
3.4544	3.8747	36000	3.4568	0.3747
3.4332	3.9823	37000	3.4505	0.3760
3.3729	4.0900	38000	3.4507	0.3762
3.3957	4.1976	39000	3.4467	0.3766
3.4135	4.3052	40000	3.4434	0.3771
3.4065	4.4129	41000	3.4378	0.3779
3.3837	4.5205	42000	3.4326	0.3780
3.3988	4.6281	43000	3.4265	0.3786
3.4003	4.7358	44000	3.4218	0.3791
3.3718	4.8434	45000	3.4193	0.3795
3.3876	4.9510	46000	3.4116	0.3805
3.3072	5.0587	47000	3.4152	0.3806
3.3312	5.1663	48000	3.4141	0.3808
3.33	5.2739	49000	3.4092	0.3810
3.3217	5.3816	50000	3.4062	0.3815
3.3232	5.4892	51000	3.4009	0.3818
3.3414	5.5968	52000	3.3978	0.3823
3.3284	5.7044	53000	3.3928	0.3824
3.3423	5.8121	54000	3.3889	0.3831
3.3375	5.9197	55000	3.3830	0.3838
3.2457	6.0273	56000	3.3864	0.3836
3.2743	6.1350	57000	3.3878	0.3837
3.278	6.2426	58000	3.3834	0.3845
3.2893	6.3502	59000	3.3806	0.3846
3.2858	6.4579	60000	3.3779	0.3851
3.2612	6.5655	61000	3.3731	0.3855
3.2874	6.6731	62000	3.3683	0.3860
3.2913	6.7808	63000	3.3637	0.3862
3.2874	6.8884	64000	3.3623	0.3865
3.3006	6.9960	65000	3.3566	0.3871
3.2113	7.1036	66000	3.3637	0.3868
3.2365	7.2113	67000	3.3613	0.3871
3.2464	7.3189	68000	3.3587	0.3875
3.2246	7.4265	69000	3.3527	0.3879
3.2492	7.5342	70000	3.3493	0.3885
3.2229	7.6418	71000	3.3436	0.3890
3.2475	7.7494	72000	3.3427	0.3891
3.2372	7.8571	73000	3.3380	0.3897
3.256	7.9647	74000	3.3354	0.3900
3.1567	8.0723	75000	3.3419	0.3896
3.1763	8.1800	76000	3.3383	0.3903
3.1703	8.2876	77000	3.3362	0.3905
3.1765	8.3952	78000	3.3342	0.3905
3.1938	8.5029	79000	3.3285	0.3911
3.167	8.6105	80000	3.3272	0.3913
3.208	8.7181	81000	3.3216	0.3917
3.1722	8.8257	82000	3.3193	0.3921
3.1905	8.9334	83000	3.3167	0.3924
3.1372	9.0410	84000	3.3181	0.3925
3.1427	9.1486	85000	3.3179	0.3927
3.1571	9.2563	86000	3.3175	0.3929
3.1387	9.3639	87000	3.3142	0.3933
3.1296	9.4715	88000	3.3115	0.3935
3.1449	9.5792	89000	3.3079	0.3939
3.126	9.6868	90000	3.3078	0.3941
3.1172	9.7944	91000	3.3053	0.3943
3.1328	9.9021	92000	3.3035	0.3944
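The card does not state the length of an epoch, but the Epoch and Step columns are enough to recover it. The sketch below assumes the Epoch column is simply step / steps_per_epoch rounded to four decimals, which is consistent with the rows sampled here. Under that assumption only 9,291 optimizer steps per epoch reproduces the logged values, which gives about 92,910 steps over the 10 epochs. The sketch then evaluates a linear warmup plus linear decay schedule using the listed learning_rate and lr_scheduler_warmup_steps. The formula follows the shape of Transformers' `get_linear_schedule_with_warmup`. `infer_steps_per_epoch`, `linear_warmup_decay_lr`, and the derived total are illustrative and are not taken from the authors' code.

```python
# Sketch: infer steps per epoch from the table, then evaluate a linear
# warmup + linear decay learning-rate schedule at given optimizer steps.

# (step, epoch) pairs copied from rows of the training-results table.
LOGGED = [(1000, 0.1076), (10000, 1.0763), (50000, 5.3816), (92000, 9.9021)]


def infer_steps_per_epoch(logged, candidates=range(9000, 9600)):
    """Return every integer steps-per-epoch value that reproduces all logged
    epochs, assuming epoch = step / steps_per_epoch rounded to 4 decimals."""
    return [
        n for n in candidates
        if all(round(step / n, 4) == epoch for step, epoch in logged)
    ]


def linear_warmup_decay_lr(step, peak_lr=6e-4, warmup_steps=100, total_steps=92910):
    """Linear ramp from 0 to peak_lr over warmup_steps, then linear decay
    to 0 at total_steps (total_steps defaults to the inferred 92,910)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


steps_per_epoch = infer_steps_per_epoch(LOGGED)[0]  # 9291 under the stated assumption
total_steps = 10 * steps_per_epoch                  # num_epochs: 10
for step in (50, 100, 46000, 92000):
    lr = linear_warmup_decay_lr(step, total_steps=total_steps)
    print(f"step {step:>6}: lr = {lr:.3e}")
```

If the run used gradient accumulation or several devices, the inferred step count is still valid, but each step would then cover more than 32 examples.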

Framework versions

Transformers 4.47.0.dev0
PyTorch 2.5.0+cu124
Datasets 3.0.2
Tokenizers 0.20.1
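A small standard-library sketch can check whether a local environment matches the versions above. It assumes the PyPI distribution names transformers, torch, datasets, and tokenizers. A `.dev0` Transformers version usually indicates an install from source, so a regular release will typically not match exactly. The `+cu124` suffix is the local version label of the CUDA 12.4 PyTorch build. `CARD_VERSIONS` and `compare_versions` are illustrative names.

```python
from importlib import metadata

# Versions listed under "Framework versions" (PyPI distribution names assumed).
CARD_VERSIONS = {
    "transformers": "4.47.0.dev0",
    "torch": "2.5.0+cu124",
    "datasets": "3.0.2",
    "tokenizers": "0.20.1",
}


def compare_versions(expected=CARD_VERSIONS):
    """Map each package to (card version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions().items():
    status = "match" if installed == wanted else "differs"
    print(f"{name}: card={wanted} installed={installed} ({status})")
```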


Safetensors

Model size: 0.1B params
Tensor type: F32
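The listed size and tensor type give a rough memory budget for the weights alone, because F32 stores 4 bytes per parameter. "0.1B" is a rounded figure, so the estimate below is approximate. It excludes activations, optimizer state, and framework overhead. `weight_bytes` and the dtype table are a hypothetical helper.

```python
# Bytes per parameter for a few common safetensors dtypes.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(n_params: float, dtype: str = "F32") -> float:
    """Approximate storage for the weights alone (no activations or optimizer state)."""
    return n_params * BYTES_PER_PARAM[dtype]


n_params = 0.1e9  # "0.1B params" (rounded on the model page)
print(f"F32: ~{weight_bytes(n_params, 'F32') / 1e9:.1f} GB")
print(f"F16 (if cast): ~{weight_bytes(n_params, 'F16') / 1e9:.1f} GB")
```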

Collection including craa/100M_high_100_6910: exceptions

Data and models for "Manipulating language models’ training data to study syntactic constraint learning: the case of English passivization" (49 items, updated Nov 29, 2025)