babylm-base5m-roberta

This model is a fine-tuned version of an unspecified base model on an unknown dataset. Its evaluation-set results are reported in the table of training results below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 190
training_steps: 19000
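
To make the schedule concrete, the sketch below computes the learning rate at a given step from the values listed above (peak 5e-05, 190 warmup steps, 19000 training steps). It assumes the usual meaning of a linear schedule with warmup: the rate rises linearly from 0 to the peak during warmup, then decays linearly to 0 at the final step. The function name `linear_schedule_lr` is hypothetical and not part of any library.

```python
def linear_schedule_lr(step, peak_lr=5e-05, warmup_steps=190, total_steps=19000):
    """Learning rate at `step` under linear warmup followed by linear decay.

    Assumption: warmup starts at 0 and decay ends at 0 on the final step.
    """
    if step < warmup_steps:
        # Warmup phase: scale the peak rate by the fraction of warmup completed.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale the peak rate by the fraction of post-warmup steps left.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 95, 190, 9595, 19000):
        print(step, linear_schedule_lr(step))
```

Under these assumptions the rate reaches its peak at step 190 and is back to half the peak at step 9595, the midpoint of the decay phase.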

Training Loss	Epoch	Step	Validation Loss	Accuracy
6.9727	0.1115	200	6.4997	0.1008
6.1573	0.2230	400	6.1957	0.1279
6.0466	0.3344	600	6.0624	0.1349
5.9445	0.4459	800	5.9957	0.1421
5.8652	0.5574	1000	5.9700	0.1420
5.8829	0.6689	1200	5.9288	0.1455
5.8271	0.7804	1400	5.8999	0.1463
5.7827	0.8919	1600	5.8777	0.1487
5.753	1.0033	1800	5.8681	0.1503
5.7464	1.1148	2000	5.8388	0.1526
5.6598	2.2297	4000	5.7516	0.1584
5.623	3.3445	6000	5.7116	0.1596
5.5264	4.4593	8000	5.6755	0.1623
5.5045	5.5741	10000	5.6489	0.1636
5.4957	6.6890	12000	5.6193	0.1669
5.4748	7.8038	14000	5.5932	0.1670
5.4256	8.9186	16000	5.5806	0.1674
5.3906	10.0334	18000	5.5696	0.1680
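
The table can be read more easily with two derived quantities, sketched below. The first converts a validation loss to a perplexity, assuming the loss is a mean cross-entropy in nats (for a masked language model this would be a perplexity over the masked tokens only). The second estimates the number of optimizer steps per epoch from the Step and Epoch columns. The helper names `perplexity` and `steps_per_epoch` are hypothetical, and the rows are copied from the table.

```python
import math

# (step, epoch, validation loss) rows copied from the table above.
rows = [
    (200, 0.1115, 6.4997),
    (2000, 1.1148, 5.8388),
    (18000, 10.0334, 5.5696),
]


def perplexity(loss):
    """Perplexity, assuming `loss` is a mean cross-entropy measured in nats."""
    return math.exp(loss)


def steps_per_epoch(step, epoch):
    """Optimizer steps in one epoch, inferred from a (step, epoch) pair."""
    return step / epoch


if __name__ == "__main__":
    for step, epoch, val_loss in rows:
        print(
            f"step {step}: perplexity {perplexity(val_loss):.1f}, "
            f"steps per epoch {steps_per_epoch(step, epoch):.0f}"
        )
```

If training used a single device with no gradient accumulation (an assumption, since the card does not say), multiplying the steps per epoch by the train_batch_size of 16 gives a rough count of training examples per epoch.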

Safetensors: model size 98.6M params, tensor type F32
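
As a rough guide to the storage footprint, the sketch below multiplies the reported parameter count by the width of the reported tensor type (F32, 4 bytes per value). It covers the weights only and ignores file metadata, activations, and optimizer state. The function name `weights_size_mb` is hypothetical.

```python
def weights_size_mb(num_params, bytes_per_param=4):
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


if __name__ == "__main__":
    # 98.6M parameters stored as 32-bit floats.
    print(f"{weights_size_mb(98.6e6):.1f} MB")
```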