modernbert-CGEdit-AAE_cg_final

This model is a fine-tuned version of on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 0.1
num_epochs: 40
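The arithmetic implied by these settings can be sketched in Python. This is a minimal sketch, not the training code: it assumes a single device (which is what total_train_batch_size = 8 × 4 = 32 implies), it takes 183 optimizer steps per epoch from the Step column of the results table, and it assumes the warmup value 0.1 acts as a fraction of the total optimizer steps. The schedule follows the same linear-warmup, half-cosine-decay shape as `get_cosine_schedule_with_warmup` in Hugging Face `transformers`; the names `cosine_lr` and `effective_batch_size` are hypothetical helpers introduced here.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-5
TRAIN_BATCH_SIZE = 8            # per device
GRAD_ACCUM_STEPS = 4
NUM_EPOCHS = 40
STEPS_PER_EPOCH = 183           # from the Step column of the results table

# Assumption: one device, matching total_train_batch_size = 8 * 4 = 32.
NUM_DEVICES = 1
# Assumption: the warmup value 0.1 is a fraction of total optimizer steps.
WARMUP_FRACTION = 0.1


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def cosine_lr(step: int, total_steps: int, warmup_steps: int) -> float:
    """Linear warmup from 0 to the peak LR, then half-cosine decay to 0."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS               # planned steps for 40 epochs
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_FRACTION)  # warmup length under the assumption above

if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    print("total steps:", TOTAL_STEPS, "warmup steps:", WARMUP_STEPS)
    for step in (0, WARMUP_STEPS, 2379, 3294, TOTAL_STEPS):
        print(step, f"{cosine_lr(step, TOTAL_STEPS, WARMUP_STEPS):.3e}")
```

Under these assumptions the schedule is planned over 183 × 40 = 7320 steps with 732 warmup steps, so the peak rate of 3e-05 is reached at the end of epoch 4 and halves at step 4026. Because the results table stops at epoch 18, the later, low-rate part of this planned schedule would not have been reached.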

Training Loss	Epoch	Step	Validation Loss
3.6163	1.0	183	0.9028
3.5821	2.0	366	0.8998
3.5796	3.0	549	0.8986
3.6004	4.0	732	0.8979
3.5550	5.0	915	0.8972
3.5406	6.0	1098	0.8960
3.5234	7.0	1281	0.8950
3.5427	8.0	1464	0.8945
3.5038	9.0	1647	0.8941
3.3334	10.0	1830	0.8946
3.5211	11.0	2013	0.8949
3.5114	12.0	2196	0.8940
3.5164	13.0	2379	0.8937
3.5044	14.0	2562	0.8945
3.4915	15.0	2745	0.8938
3.4984	16.0	2928	0.8939
3.4953	17.0	3111	0.8940
3.5025	18.0	3294	0.8943
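The table above can be checked programmatically. The sketch below copies its rows verbatim and reports the epoch with the lowest validation loss, along with the step increment per epoch. Note that the card does not say which checkpoint was kept as the final model, so the "best" row here is only the minimum of the logged validation loss; `best_epoch` and `steps_per_epoch` are hypothetical helper names.

```python
# Rows copied from the training-results table: (epoch, step, train_loss, val_loss).
RESULTS = [
    (1, 183, 3.6163, 0.9028), (2, 366, 3.5821, 0.8998),
    (3, 549, 3.5796, 0.8986), (4, 732, 3.6004, 0.8979),
    (5, 915, 3.5550, 0.8972), (6, 1098, 3.5406, 0.8960),
    (7, 1281, 3.5234, 0.8950), (8, 1464, 3.5427, 0.8945),
    (9, 1647, 3.5038, 0.8941), (10, 1830, 3.3334, 0.8946),
    (11, 2013, 3.5211, 0.8949), (12, 2196, 3.5114, 0.8940),
    (13, 2379, 3.5164, 0.8937), (14, 2562, 3.5044, 0.8945),
    (15, 2745, 3.4915, 0.8938), (16, 2928, 3.4984, 0.8939),
    (17, 3111, 3.4953, 0.8940), (18, 3294, 3.5025, 0.8943),
]


def best_epoch(rows):
    """Return the row with the lowest validation loss (the earliest row wins ties)."""
    return min(rows, key=lambda row: row[3])


def steps_per_epoch(rows):
    """Return the step increment per epoch, checking that it is constant."""
    deltas = {later[1] - earlier[1] for earlier, later in zip(rows, rows[1:])}
    if len(deltas) != 1:
        raise ValueError(f"uneven step increments: {sorted(deltas)}")
    return deltas.pop()


if __name__ == "__main__":
    epoch, step, train_loss, val_loss = best_epoch(RESULTS)
    print(f"lowest validation loss {val_loss} at epoch {epoch} (step {step})")
    print("optimizer steps per epoch:", steps_per_epoch(RESULTS))
```

For the logged rows, the lowest validation loss is 0.8937 at epoch 13 (step 2379), and every epoch advances the step counter by 183.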

Weights (Safetensors)

Model size: 0.4B params
Tensor type: F32
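A rough estimate of the memory needed just to hold these weights follows from the parameter count and tensor type. This is a back-of-the-envelope sketch: "0.4B params" is a rounded figure, so the result is approximate, and it excludes activations, optimizer state, and framework overhead. The helper names are hypothetical.

```python
# Rough weight-memory estimate from the metadata above.
# Assumption: "0.4B params" is rounded, so results are approximate.
PARAMS = 0.4e9
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(n_params: float, dtype: str) -> float:
    """Bytes needed for the raw weights only (no activations or optimizer state)."""
    return n_params * BYTES_PER_PARAM[dtype]


def to_gib(n_bytes: float) -> float:
    """Convert bytes to gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    for dtype in ("F32", "F16"):
        size = weight_bytes(PARAMS, dtype)
        print(f"{dtype}: {size:.2e} bytes, about {to_gib(size):.2f} GiB")
```

At F32 (4 bytes per parameter) this comes to roughly 1.6e9 bytes, or about 1.5 GiB.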