Legal_GQA_BERT7

This model was trained from scratch; the training dataset is not specified in the card. It achieves the following results on the evaluation set:

  • Loss: 4.7785

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 16
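With a linear scheduler and no warmup steps listed, the learning rate decays from 2e-05 to 0 over the course of training. The results table below logs 5 optimizer steps per epoch for 16 epochs, i.e. 80 steps total (which, at batch size 32, implies a training set of roughly 160 examples). A minimal sketch of that schedule, assuming zero warmup:

```python
def linear_lr(step, total_steps=80, base_lr=2e-05):
    """Learning rate after `step` completed optimizer steps under a
    linear decay from base_lr to 0 with no warmup (assumed)."""
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / total_steps)

print(linear_lr(0))    # → 2e-05 (start of training)
print(linear_lr(40))   # → 1e-05 (halfway through)
print(linear_lr(80))   # → 0.0  (end of training)
```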

Training results

Training Loss  Epoch  Step  Validation Loss
No log         1.0    5     3.9923
No log         2.0    10    3.5716
No log         3.0    15    3.4393
No log         4.0    20    3.4841
No log         5.0    25    3.5053
No log         6.0    30    3.6398
No log         7.0    35    3.9208
No log         8.0    40    4.0914
No log         9.0    45    4.2757
No log         10.0   50    4.3265
No log         11.0   55    4.4341
No log         12.0   60    4.6095
No log         13.0   65    4.7127
No log         14.0   70    4.7583
No log         15.0   75    4.7729
No log         16.0   80    4.7785
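Note that the validation loss bottoms out at epoch 3 and climbs steadily afterward, a classic overfitting pattern: the reported final loss of 4.7785 is well above the best checkpoint's 3.4393. A minimal sketch of picking the best epoch from the logged values above:

```python
# Validation losses per epoch, copied from the results table above.
val_loss = {
    1: 3.9923, 2: 3.5716, 3: 3.4393, 4: 3.4841,
    5: 3.5053, 6: 3.6398, 7: 3.9208, 8: 4.0914,
    9: 4.2757, 10: 4.3265, 11: 4.4341, 12: 4.6095,
    13: 4.7127, 14: 4.7583, 15: 4.7729, 16: 4.7785,
}

# Epoch with the lowest validation loss.
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch, val_loss[best_epoch])  # → 3 3.4393
```

In a transformers Trainer run, setting `load_best_model_at_end=True` (with an `eval_loss` metric) or adding an `EarlyStoppingCallback` would keep the epoch-3 checkpoint instead of the final one.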

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.7
  • Tokenizers 0.15.0
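To reproduce the training environment, the versions above can be pinned directly (a sketch, assuming a fresh virtual environment; the `+cu121` PyTorch build comes from PyTorch's CUDA wheel index, while a plain `torch==2.1.2` installs the default build):

```shell
pip install "transformers==4.36.2" "torch==2.1.2" "datasets==2.14.7" "tokenizers==0.15.0"
```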