Legal_GQA_BERT_augmented_17

This model was trained from scratch on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 3.7231

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 17
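With a linear scheduler and no warmup (the assumption here, since no warmup steps are listed), the learning rate decays from the peak of 2e-05 down to 0 over the 255 total optimizer steps shown in the results table. A minimal sketch of that decay:

```python
# Sketch of a linear learning-rate decay with zero warmup, matching the
# hyperparameters above: peak lr 2e-05, 255 total steps (15 steps/epoch
# over 17 epochs, per the results table). The zero-warmup assumption is
# ours; the card does not state a warmup value.
def linear_lr(step, peak_lr=2e-05, total_steps=255):
    """Learning rate after `step` optimizer updates."""
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / total_steps

print(linear_lr(0))    # peak rate at the first step
print(linear_lr(255))  # fully decayed by the end of epoch 17
```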

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 1.0   | 15   | 2.7689          |
| No log        | 2.0   | 30   | 2.8738          |
| No log        | 3.0   | 45   | 2.4922          |
| No log        | 4.0   | 60   | 2.6766          |
| No log        | 5.0   | 75   | 2.8174          |
| No log        | 6.0   | 90   | 2.9027          |
| No log        | 7.0   | 105  | 2.8046          |
| No log        | 8.0   | 120  | 3.1146          |
| No log        | 9.0   | 135  | 3.2279          |
| No log        | 10.0  | 150  | 3.3864          |
| No log        | 11.0  | 165  | 3.3745          |
| No log        | 12.0  | 180  | 3.7415          |
| No log        | 13.0  | 195  | 3.6057          |
| No log        | 14.0  | 210  | 3.6076          |
| No log        | 15.0  | 225  | 3.7815          |
| No log        | 16.0  | 240  | 3.6825          |
| No log        | 17.0  | 255  | 3.7231          |
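The "No log" entries in the training-loss column are expected: the Trainer's default logging interval (500 steps) exceeds the 255 total steps, so no training loss was ever recorded. The Step and Epoch columns also imply 15 optimizer steps per epoch; with train_batch_size 32 and assuming no gradient accumulation, a quick back-of-the-envelope check bounds the training-set size:

```python
# Infer training-set size bounds from the results table above.
# Assumes one optimizer step per batch (no gradient accumulation).
steps_per_epoch = 255 // 17   # 15, from the final Step / Epoch row
batch_size = 32               # train_batch_size from the hyperparameters

# 15 batches of size 32 (with a possibly partial last batch) means the
# dataset holds between 14*32 + 1 and 15*32 examples.
lo = (steps_per_epoch - 1) * batch_size + 1
hi = steps_per_epoch * batch_size
print(lo, hi)
```

This puts the training set at roughly 449 to 480 examples, consistent with the rapid overfitting visible in the table: validation loss bottoms out at 2.4922 after epoch 3 and climbs thereafter.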

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.7
  • Tokenizers 0.15.0
Model size

  • 0.1B params (F32, Safetensors)