ltg_norbert3-small

This model is a fine-tuned version of ltg/norbert3-small. The training dataset is not specified. No separate evaluation metrics are listed; the training log below reports a validation loss of 0.0226 at the final logged step (5100).

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 16
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10
mixed_precision_training: Native AMP
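
The settings above imply a few derived quantities, which the following plain-Python sketch reproduces. It rests on two assumptions: the figure of 512 optimizer steps per epoch is inferred from the training log (step 100 corresponds to epoch 0.1953), and the schedule is taken to be the common linear warmup followed by a single half-cosine decay to zero. The function name `lr_at` is hypothetical and not part of any library.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 5e-4
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.1
NUM_EPOCHS = 10

# Assumption: inferred from the training log (100 steps / 0.1953 epochs).
STEPS_PER_EPOCH = 512

effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS
warmup_steps = round(WARMUP_RATIO * total_steps)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (hypothetical helper).

    Linear warmup from 0 to the peak rate, then cosine decay to 0.
    """
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    print("total steps:", total_steps, "warmup steps:", warmup_steps)
    for step in (0, 256, 512, 2816, 5120):
        print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at 0.0005 when warmup ends and falls to zero at the last optimizer step. The exact step counts in the original run may differ slightly depending on how the final partial batch of each epoch was handled.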

Training Loss	Epoch	Step	Validation Loss
10.4429	0.1953	100	4.6021
3.1755	0.3906	200	1.4491
1.6651	0.5859	300	0.7882
1.2191	0.7812	400	0.5250
0.877	0.9766	500	0.4177
0.732	1.1719	600	0.3365
0.6434	1.3672	700	0.2867
0.5027	1.5625	800	0.2396
0.4817	1.7578	900	0.2127
0.3954	1.9531	1000	0.1937
0.4317	2.1484	1100	0.1797
0.3684	2.3438	1200	0.1609
0.3167	2.5391	1300	0.1501
0.3312	2.7344	1400	0.1376
0.2785	2.9297	1500	0.1256
0.2512	3.125	1600	0.1191
0.2148	3.3203	1700	0.1126
0.2264	3.5156	1800	0.1028
0.2091	3.7109	1900	0.0963
0.1851	3.9062	2000	0.0894
0.1742	4.1016	2100	0.0844
0.1432	4.2969	2200	0.0805
0.1869	4.4922	2300	0.0765
0.1539	4.6875	2400	0.0688
0.1412	4.8828	2500	0.0682
0.1231	5.0781	2600	0.0633
0.1723	5.2734	2700	0.0570
0.1437	5.4688	2800	0.0560
0.1252	5.6641	2900	0.0499
0.1052	5.8594	3000	0.0455
0.1058	6.0547	3100	0.0482
0.0692	6.25	3200	0.0413
0.1163	6.4453	3300	0.0398
0.0755	6.6406	3400	0.0358
0.0922	6.8359	3500	0.0335
0.0877	7.0312	3600	0.0329
0.0744	7.2266	3700	0.0315
0.0604	7.4219	3800	0.0280
0.068	7.6172	3900	0.0285
0.0663	7.8125	4000	0.0260
0.0597	8.0078	4100	0.0264
0.0566	8.2031	4200	0.0258
0.0658	8.3984	4300	0.0237
0.052	8.5938	4400	0.0232
0.048	8.7891	4500	0.0247
0.0588	8.9844	4600	0.0222
0.0406	9.1797	4700	0.0231
0.0527	9.375	4800	0.0211
0.0516	9.5703	4900	0.0207
0.0378	9.7656	5000	0.0213
0.0506	9.9609	5100	0.0226
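
The lowest validation loss in the table is 0.0207 at step 4900, slightly below the 0.0226 logged at the final step. As a minimal sketch of how such a log can be inspected, the snippet below parses the last six rows (copied from the table) and picks the row with the lowest validation loss. The names `LogRow` and `parse_log` are hypothetical helpers introduced for illustration.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


# Last six rows of the table above:
# training loss, epoch, step, validation loss.
LOG_TAIL = """
0.0588 8.9844 4600 0.0222
0.0406 9.1797 4700 0.0231
0.0527 9.375 4800 0.0211
0.0516 9.5703 4900 0.0207
0.0378 9.7656 5000 0.0213
0.0506 9.9609 5100 0.0226
"""


def parse_log(text: str) -> List[LogRow]:
    """Parse whitespace-separated log rows, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        rows.append(
            LogRow(float(parts[0]), float(parts[1]), int(parts[2]), float(parts[3]))
        )
    return rows


rows = parse_log(LOG_TAIL)
best = min(rows, key=lambda r: r.val_loss)
last = rows[-1]

if __name__ == "__main__":
    print("best step:", best.step, "validation loss:", best.val_loss)
    print("last step:", last.step, "validation loss:", last.val_loss)
```

Whether the published weights correspond to the final step or to an earlier checkpoint is not stated in this card.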

Safetensors

Model size: 60M params
Tensor type: F32
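
As a rough guide to the size of the weights, the arithmetic below multiplies the parameter count by the width of an F32 value. It is an estimate only: 60M is the rounded figure reported above, and the result covers the raw weights, not activations, optimizer state, or file-format overhead.

```python
# Rounded parameter count from the card; the exact count is not given.
PARAM_COUNT = 60_000_000
# F32 stores each parameter in 4 bytes.
BYTES_PER_PARAM = 4

weight_bytes = PARAM_COUNT * BYTES_PER_PARAM
weight_mib = weight_bytes / (1024 ** 2)

if __name__ == "__main__":
    print(f"approximate weight size: {weight_bytes / 1e6:.0f} MB ({weight_mib:.1f} MiB)")
```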
