bert-trainer-8b

This model is a fine-tuned version of an unspecified base model (the base checkpoint is not recorded in this card), trained on the generator dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1639

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 32
  • mixed_precision_training: Native AMP
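The hyperparameters above can be expressed as a `transformers.TrainingArguments` configuration. This is a sketch, not the actual training script: the `output_dir` is a placeholder, and the mapping of "Native AMP" to `fp16=True` is an assumption based on standard Trainer usage.

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters as TrainingArguments.
# output_dir is a placeholder; fp16=True assumes Native AMP means fp16.
args = TrainingArguments(
    output_dir="bert-trainer-8b",      # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=32,
    fp16=True,                         # Native AMP mixed precision
)
```

Note that with a train batch size of 64 and roughly 500 optimizer steps per epoch (see the table below), the training split contains on the order of 32,000 examples per epoch.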

Training results

Training Loss  Epoch   Step   Validation Loss
6.5416          1.00    500   6.5207
6.3930          1.99   1000   6.3903
6.2817          2.99   1500   6.3033
6.2274          3.98   2000   6.2671
6.1790          4.98   2500   6.2431
6.1684          5.98   3000   6.2309
6.1244          6.97   3500   6.2114
6.0879          7.97   4000   6.1932
6.0643          8.96   4500   6.1791
6.0481          9.96   5000   6.1638
6.0231         10.96   5500   6.1581
5.9987         11.95   6000   6.1365
5.9989         12.95   6500   6.1194
5.9535         13.94   7000   6.1095
5.9139         14.94   7500   6.0890
5.8462         15.94   8000   6.0224
5.7689         16.93   8500   5.9266
5.6137         17.93   9000   5.7195
4.7163         18.92   9500   4.6131
4.0877         19.92  10000   4.0903
3.7832         20.92  10500   3.8340
3.6104         21.91  11000   3.6572
3.4615         22.91  11500   3.5278
3.3661         23.90  12000   3.4201
3.2710         24.90  12500   3.3333
3.2179         25.90  13000   3.2720
3.1759         26.89  13500   3.2317
3.1419         27.89  14000   3.2006
3.1041         28.88  14500   3.1806
3.0836         29.88  15000   3.1693
3.0998         30.88  15500   3.1679
3.0800         31.87  16000   3.1639
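Assuming the reported loss is the usual language-modeling cross-entropy in nats (the Transformers Trainer default), the final validation loss corresponds to a perplexity of exp(loss):

```python
import math

# Convert the final validation cross-entropy loss (in nats) to perplexity.
final_loss = 3.1639
perplexity = math.exp(final_loss)
print(f"perplexity = {perplexity:.1f}")  # roughly 23.7
```

For comparison, the pre-convergence plateau around loss 6.1 corresponds to a perplexity above 400, so the drop between epochs 18 and 19 marks where the model actually starts learning.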

Framework versions

  • Transformers 4.26.1
  • Pytorch 1.13.1
  • Datasets 2.9.0
  • Tokenizers 0.13.2
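To reproduce this environment, the listed versions can be pinned when installing. This is a sketch assuming the standard PyPI distributions (PyTorch is published as `torch`):

```shell
pip install transformers==4.26.1 torch==1.13.1 datasets==2.9.0 tokenizers==0.13.2
```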