cpr-modernBERT-B

This model is a fine-tuned version of answerdotai/ModernBERT-base. The training dataset is not specified. On the evaluation set, it reaches a validation loss of 0.8815 at the end of training (epoch 3.0, step 31464).

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 64
optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 0.1
num_epochs: 3
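
The relationship between these values can be illustrated with a short, hand-written sketch. It shows how a per-device batch size of 16 and 4 gradient accumulation steps give the total batch size of 64, and how a cosine schedule with linear warmup would move the learning rate. Several things here are assumptions rather than facts from the card: a single training device, a total of 31464 optimizer steps (the last step logged in the results), and reading the warmup value 0.1 as a fraction of the total steps. The function names are hypothetical, and the schedule is a generic formulation, not necessarily the exact one the training run used.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 4

# Assumptions (not stated in the card): one training device, and the total
# number of optimizer steps equals the last step in the results table.
NUM_DEVICES = 1
TOTAL_STEPS = 31464
# Assumption: the warmup value 0.1 is a fraction of the total steps.
WARMUP_FRACTION = 0.1


def effective_batch_size(per_device, accumulation_steps, num_devices=NUM_DEVICES):
    """Number of examples contributing to one optimizer step."""
    return per_device * accumulation_steps * num_devices


def cosine_lr(step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS,
              warmup_fraction=WARMUP_FRACTION):
    """Linear warmup from 0 to base_lr, then cosine decay down to 0."""
    warmup_steps = math.ceil(total_steps * warmup_fraction)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    for step in (0, 1000, 5000, 15000, 31464):
        print(step, cosine_lr(step))
```

Under these assumptions the learning rate peaks at 5e-05 once warmup ends and falls to zero at the final step.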

Training Loss	Epoch	Step	Validation Loss
1.0688	0.0477	500	1.0619
1.0418	0.0953	1000	1.0414
1.0315	0.1430	1500	1.0300
1.0189	0.1907	2000	1.0190
1.0195	0.2384	2500	1.0079
1.0010	0.2860	3000	1.0016
1.0009	0.3337	3500	0.9927
0.9853	0.3814	4000	0.9825
0.9861	0.4291	4500	0.9786
0.9783	0.4767	5000	0.9724
0.9628	0.5244	5500	0.9705
0.9623	0.5721	6000	0.9626
0.9552	0.6198	6500	0.9585
0.9527	0.6674	7000	0.9556
0.9566	0.7151	7500	0.9489
0.9527	0.7628	8000	0.9492
0.9488	0.8105	8500	0.9450
0.9489	0.8581	9000	0.9395
0.9355	0.9058	9500	0.9349
0.9336	0.9535	10000	0.9323
0.9388	1.0011	10500	0.9304
0.9243	1.0488	11000	0.9312
0.9246	1.0965	11500	0.9274
0.9183	1.1442	12000	0.9242
0.9167	1.1918	12500	0.9229
0.9184	1.2395	13000	0.9193
0.9181	1.2872	13500	0.9189
0.9142	1.3349	14000	0.9137
0.9120	1.3825	14500	0.9146
0.9137	1.4302	15000	0.9107
0.9075	1.4779	15500	0.9099
0.9020	1.5256	16000	0.9047
0.9021	1.5732	16500	0.9040
0.9017	1.6209	17000	0.9029
0.8984	1.6686	17500	0.9029
0.8944	1.7163	18000	0.9009
0.8982	1.7639	18500	0.8976
0.8957	1.8116	19000	0.8958
0.8901	1.8593	19500	0.8961
0.8867	1.9070	20000	0.8944
0.8929	1.9546	20500	0.8933
0.8941	2.0023	21000	0.8920
0.8847	2.0500	21500	0.8904
0.8904	2.0976	22000	0.8891
0.8822	2.1453	22500	0.8867
0.8848	2.1930	23000	0.8862
0.8825	2.2407	23500	0.8870
0.8817	2.2883	24000	0.8867
0.8755	2.3360	24500	0.8842
0.8770	2.3837	25000	0.8836
0.8798	2.4314	25500	0.8835
0.8801	2.4790	26000	0.8831
0.8837	2.5267	26500	0.8832
0.8797	2.5744	27000	0.8809
0.8750	2.6221	27500	0.8843
0.8744	2.6697	28000	0.8839
0.8776	2.7174	28500	0.8827
0.8800	2.7651	29000	0.8825
0.8749	2.8128	29500	0.8842
0.8789	2.8604	30000	0.8825
0.8699	2.9081	30500	0.8823
0.8785	2.9558	31000	0.8804
0.8844	3.0	31464	0.8815
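
A few quantities can be derived from this log. The sketch below parses a handful of rows copied from the table, estimates the number of optimizer steps per epoch, gives a rough size for the training set, and finds the logged checkpoint with the lowest validation loss. The size estimate assumes every optimizer step consumed a full batch of 64 examples, so it should be read as an approximate upper figure; the helper name `parse_rows` is hypothetical.

```python
# Rows copied from the results table:
# training loss, epoch, step, validation loss.
ROWS = """
1.0688 0.0477 500 1.0619
0.9336 0.9535 10000 0.9323
0.8867 1.9070 20000 0.8944
0.8797 2.5744 27000 0.8809
0.8785 2.9558 31000 0.8804
0.8844 3.0 31464 0.8815
"""

TOTAL_TRAIN_BATCH_SIZE = 64  # from the hyperparameter list


def parse_rows(text):
    """Turn whitespace-separated log rows into a list of dicts."""
    records = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        records.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return records


records = parse_rows(ROWS)
last = records[-1]

# Steps per epoch from the final row: 31464 steps over 3.0 epochs.
steps_per_epoch = round(last["step"] / last["epoch"])

# Rough training-set size, assuming full batches of 64 at every step.
approx_examples_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

# Logged checkpoint with the lowest validation loss among the parsed rows.
best = min(records, key=lambda r: r["val_loss"])

print(steps_per_epoch, approx_examples_per_epoch, best["step"], best["val_loss"])
```

The lowest validation loss in the full table, 0.8804 at step 31000, is slightly below the final value of 0.8815, so the last checkpoint is not the best one by this metric.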

Safetensors: model size 0.1B params, tensor type F32
