cpr-modernBERT-C

This model is a fine-tuned version of answerdotai/ModernBERT-base on an unspecified dataset. The final validation loss recorded during training is 0.8875 (epoch 3.0, step 31401).

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 64
optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 0.1
num_epochs: 3
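
The relationship between these values can be checked with a short sketch. It is not the training script. It assumes a single device (so that 16 x 4 gives the listed total of 64), it reads the warmup value 0.1 as a fraction of total steps rather than a step count, and it takes the total of 31401 steps from the results table. The helper names are hypothetical, and the cosine curve follows the common linear-warmup, half-cosine-decay shape.

```python
import math

# Values copied from the hyperparameter list above.
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 4
PEAK_LR = 5e-5

# Assumptions, not stated in the card:
NUM_DEVICES = 1          # single device, so 16 * 4 * 1 = 64
WARMUP_FRACTION = 0.1    # 0.1 read as a fraction of total steps
TOTAL_STEPS = 31401      # last step in the training results table


def effective_batch_size(per_device: int, accum: int, devices: int = 1) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_device * accum * devices


def cosine_lr(step: int,
              total_steps: int = TOTAL_STEPS,
              peak_lr: float = PEAK_LR,
              warmup_fraction: float = WARMUP_FRACTION) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for s in (0, 1570, 3140, 17270, 31401):
        print(s, cosine_lr(s))
```

Under these assumptions the learning rate peaks at 5e-05 around step 3140 and decays to zero by the final step.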

Training Loss	Epoch	Step	Validation Loss
1.0847	0.0478	500	1.0791
1.0631	0.0955	1000	1.0546
1.0360	0.1433	1500	1.0357
1.0323	0.1911	2000	1.0269
1.0190	0.2389	2500	1.0166
1.0142	0.2866	3000	1.0045
0.9938	0.3344	3500	0.9997
0.9956	0.3822	4000	0.9899
0.9850	0.4299	4500	0.9859
0.9697	0.4777	5000	0.9767
0.9751	0.5255	5500	0.9746
0.9626	0.5733	6000	0.9682
0.9609	0.6210	6500	0.9637
0.9569	0.6688	7000	0.9594
0.9582	0.7166	7500	0.9534
0.9545	0.7643	8000	0.9501
0.9457	0.8121	8500	0.9486
0.9437	0.8599	9000	0.9431
0.9444	0.9077	9500	0.9435
0.9429	0.9554	10000	0.9369
0.9386	1.0032	10500	0.9370
0.9348	1.0509	11000	0.9306
0.9282	1.0987	11500	0.9275
0.9263	1.1465	12000	0.9266
0.9235	1.1942	12500	0.9250
0.9192	1.2420	13000	0.9229
0.9208	1.2898	13500	0.9188
0.9186	1.3376	14000	0.9190
0.9195	1.3853	14500	0.9158
0.9095	1.4331	15000	0.9156
0.9135	1.4809	15500	0.9105
0.9095	1.5286	16000	0.9097
0.9045	1.5764	16500	0.9102
0.9130	1.6242	17000	0.9090
0.9057	1.6720	17500	0.9057
0.8996	1.7197	18000	0.9055
0.9005	1.7675	18500	0.9052
0.8959	1.8153	19000	0.9007
0.9017	1.8630	19500	0.8989
0.8990	1.9108	20000	0.9000
0.8935	1.9586	20500	0.8947
0.9007	2.0063	21000	0.8931
0.8921	2.0541	21500	0.8922
0.8845	2.1018	22000	0.8933
0.8859	2.1496	22500	0.8931
0.8802	2.1974	23000	0.8922
0.8847	2.2452	23500	0.8933
0.8841	2.2929	24000	0.8895
0.8844	2.3407	24500	0.8878
0.8920	2.3885	25000	0.8901
0.8806	2.4362	25500	0.8876
0.8761	2.4840	26000	0.8862
0.8860	2.5318	26500	0.8873
0.8819	2.5796	27000	0.8883
0.8732	2.6273	27500	0.8865
0.8787	2.6751	28000	0.8857
0.8831	2.7229	28500	0.8851
0.8773	2.7706	29000	0.8881
0.8761	2.8184	29500	0.8868
0.8747	2.8662	30000	0.8864
0.8809	2.9140	30500	0.8845
0.8857	2.9617	31000	0.8853
0.8795	3.0	31401	0.8875
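
The Epoch and Step columns above are related by simple arithmetic, sketched below. The figures 31401, 3, and 64 come from this card. The derived training-set size is only an estimate: it assumes every optimizer step consumed a full batch of 64 examples, which the card does not state. The names are hypothetical.

```python
# Values taken from the card.
FINAL_STEP = 31401       # last row of the results table
NUM_EPOCHS = 3
TOTAL_BATCH_SIZE = 64    # total_train_batch_size

# Optimizer steps per epoch implied by the last row.
STEPS_PER_EPOCH = FINAL_STEP // NUM_EPOCHS

# Rough upper estimate of the training-set size, assuming full batches.
APPROX_TRAIN_EXAMPLES = STEPS_PER_EPOCH * TOTAL_BATCH_SIZE


def epoch_at(step: int) -> float:
    """Fractional epoch for a given optimizer step, rounded as in the table."""
    return round(step / STEPS_PER_EPOCH, 4)


if __name__ == "__main__":
    print(STEPS_PER_EPOCH)
    print(APPROX_TRAIN_EXAMPLES)
    print(epoch_at(500), epoch_at(10500), epoch_at(31000))
```

The computed epoch fractions agree with the Epoch column (for example 0.0478 at step 500), which suggests roughly 10467 optimizer steps per epoch.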

Safetensors

Model size: 0.1B params
Tensor type: F32
