Legal_GQA_BERT_augmented_17

This model was trained from scratch on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 3.7231

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 17
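With a linear scheduler and no warmup (the assumption here, since no warmup steps are listed), the learning rate decays from the peak of 2e-05 down to 0 over the 255 total optimizer steps shown in the results table. A minimal sketch of that decay:

```python
# Sketch of a linear learning-rate decay with zero warmup, matching the
# hyperparameters above: peak lr 2e-05, 255 total steps (15 steps/epoch
# over 17 epochs, per the results table). The zero-warmup assumption is
# ours; the card does not state a warmup value.
def linear_lr(step, peak_lr=2e-05, total_steps=255):
    """Learning rate after `step` optimizer updates."""
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / total_steps

print(linear_lr(0))    # peak rate at the first step
print(linear_lr(255))  # fully decayed by the end of epoch 17
```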

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 1.0   | 15   | 2.7689          |
| No log        | 2.0   | 30   | 2.8738          |
| No log        | 3.0   | 45   | 2.4922          |
| No log        | 4.0   | 60   | 2.6766          |
| No log        | 5.0   | 75   | 2.8174          |
| No log        | 6.0   | 90   | 2.9027          |
| No log        | 7.0   | 105  | 2.8046          |
| No log        | 8.0   | 120  | 3.1146          |
| No log        | 9.0   | 135  | 3.2279          |
| No log        | 10.0  | 150  | 3.3864          |
| No log        | 11.0  | 165  | 3.3745          |
| No log        | 12.0  | 180  | 3.7415          |
| No log        | 13.0  | 195  | 3.6057          |
| No log        | 14.0  | 210  | 3.6076          |
| No log        | 15.0  | 225  | 3.7815          |
| No log        | 16.0  | 240  | 3.6825          |
| No log        | 17.0  | 255  | 3.7231          |
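The "No log" entries in the training-loss column are expected: the Trainer's default logging interval (500 steps) exceeds the 255 total steps, so no training loss was ever recorded. The Step and Epoch columns also imply 15 optimizer steps per epoch; with train_batch_size 32 and assuming no gradient accumulation, a quick back-of-the-envelope check bounds the training-set size:

```python
# Infer training-set size bounds from the results table above.
# Assumes one optimizer step per batch (no gradient accumulation).
steps_per_epoch = 255 // 17   # 15, from the final Step / Epoch row
batch_size = 32               # train_batch_size from the hyperparameters

# 15 batches of size 32 (with a possibly partial last batch) means the
# dataset holds between 14*32 + 1 and 15*32 examples.
lo = (steps_per_epoch - 1) * batch_size + 1
hi = steps_per_epoch * batch_size
print(lo, hi)
```

This puts the training set at roughly 449 to 480 examples, consistent with the rapid overfitting visible in the table: validation loss bottoms out at 2.4922 after epoch 3 and climbs thereafter.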

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.7
  • Tokenizers 0.15.0
Model size

  • 0.1B params (F32, Safetensors)