flan-t5-base-gec

This model is a fine-tuned version of google/flan-t5-base; the fine-tuning dataset is not specified in this card. It achieves the following results on the evaluation set (a minimal usage sketch follows the list):

  • Loss: 0.2594
  • Sacrebleu: 83.0881
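
A minimal inference sketch follows, assuming the repo id lmaccarini/flan-t5-base-gec and a plain text-in/text-out interface; the card does not document a prompt template, so the example sentence and generation settings are illustrative only.

```python
# Minimal inference sketch. Assumptions: repo id as published, no special
# prompt prefix (the card does not document one).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "lmaccarini/flan-t5-base-gec"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "She go to school every days."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```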

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 3
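
As a reproducibility aid, the sketch below maps the hyperparameters above onto Seq2SeqTrainingArguments, assuming training used the Hugging Face Seq2SeqTrainer (the card does not state the training script); output_dir and predict_with_generate are assumptions, not values from the card.

```python
# Hedged reconstruction of the listed hyperparameters. Assumption: a
# Seq2SeqTrainer setup; argument names follow Transformers 4.56.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-base-gec",   # assumption, not stated in the card
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch_fused",       # AdamW (torch fused)
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    predict_with_generate=True,      # assumption: needed to compute Sacrebleu
)
```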

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Sacrebleu |
|:-------------:|:------:|:-----:|:---------------:|:---------:|
| 0.3545        | 0.2591 | 1000  | 0.2928          | 81.3373   |
| 0.3264        | 0.5181 | 2000  | 0.2814          | 81.8621   |
| 0.3080        | 0.7772 | 3000  | 0.2722          | 82.3528   |
| 0.2896        | 1.0363 | 4000  | 0.2677          | 82.6670   |
| 0.3001        | 1.2953 | 5000  | 0.2615          | 82.7959   |
| 0.2991        | 1.5544 | 6000  | 0.2612          | 82.9720   |
| 0.2937        | 1.8135 | 7000  | 0.2612          | 82.8736   |
| 0.2790        | 2.0725 | 8000  | 0.2595          | 83.0339   |
| 0.2702        | 2.3316 | 9000  | 0.2595          | 83.1004   |
| 0.2685        | 2.5907 | 10000 | 0.2606          | 83.0513   |
| 0.2706        | 2.8497 | 11000 | 0.2594          | 83.0881   |
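
The Sacrebleu scores above are on a 0-100 scale. Below is a minimal sketch of how such a score can be computed from decoded predictions, assuming the evaluate library's sacrebleu metric (an assumption; the card does not name the metric implementation):

```python
# Sacrebleu computation sketch. Assumption: the `evaluate` library's
# sacrebleu metric over decoded model outputs and reference corrections.
import evaluate

sacrebleu = evaluate.load("sacrebleu")
predictions = ["She goes to school every day."]    # decoded model outputs
references = [["She goes to school every day."]]   # one list of references per prediction
result = sacrebleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # 0-100 scale; the card reports 83.0881
```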

Framework versions

  • Transformers 4.56.1
  • PyTorch 2.8.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.0