t5_sliding_window

This model is a fine-tuned version of t5-base on the nl-quad dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0398
  • ROUGE-L: 41.87
  • BLEU: 18.36
  • METEOR: 41.06
  • BERT F1: 93.86
  • QSTS Mean: 38.69
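ROUGE-L above scores the longest common subsequence (LCS) shared by a generated sequence and its reference. As a rough illustration of what the metric measures, here is a minimal pure-Python sketch of token-level ROUGE-L F1 (whitespace tokenization, no stemming or bootstrapping; the score reported above was presumably produced by a full evaluation library, not this sketch):

```python
def lcs_length(a, b):
    # classic dynamic-programming longest common subsequence length
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def rouge_l_f1(candidate: str, reference: str) -> float:
    # ROUGE-L F1 over whitespace tokens (illustrative simplification)
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("the cat", "the cat sat")` gives precision 1.0 and recall 2/3, hence F1 = 0.8.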

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 48
  • eval_batch_size: 48
  • seed: 42
  • optimizer: adamw_torch_fused (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0.1
  • num_epochs: 10
  • label_smoothing_factor: 0.05
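The linear scheduler with a warmup value of 0.1 can be sketched as a step-to-learning-rate function. This is a minimal sketch assuming 0.1 is interpreted as a fraction of total training steps (2490 total here, from 249 steps per epoch over 10 epochs); the exact Transformers scheduler implementation may differ in rounding details:

```python
def linear_schedule_lr(step, total_steps, base_lr=1e-4, warmup_ratio=0.1):
    # linear warmup from 0 to base_lr over the first warmup_ratio of
    # training, then linear decay back to 0 (lr_scheduler_type=linear)
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

With total_steps=2490 the learning rate peaks at 1e-4 at step 249 and reaches 0 at step 2490.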

Training results

| Training Loss | Epoch | Step | Validation Loss | ROUGE-L | BLEU  | METEOR | BERT F1 | QSTS Mean |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-----:|:------:|:-------:|:---------:|
| 2.3550        | 1.0   | 249  | 2.1560          | 38.12   | 14.35 | 36.85  | 93.39   | 33.39     |
| 2.1237        | 2.0   | 498  | 2.0569          | 39.37   | 15.61 | 38.32  | 93.57   | 35.74     |
| 1.9450        | 3.0   | 747  | 2.0384          | 40.24   | 16.47 | 39.36  | 93.63   | 35.70     |
| 1.8462        | 4.0   | 996  | 2.0324          | 40.32   | 16.88 | 39.44  | 93.66   | 37.04     |
| 1.7684        | 5.0   | 1245 | 2.0337          | 40.95   | 17.87 | 40.13  | 93.75   | 38.23     |
| 1.6971        | 6.0   | 1494 | 2.0398          | 41.87   | 18.36 | 41.06  | 93.86   | 38.69     |
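The losses above were trained with label_smoothing_factor 0.05. As a rough sketch of what that factor does, here is label-smoothed cross-entropy for a single prediction, following the common formulation (a weighted mix of the target's negative log-likelihood and a uniform term over all classes — a simplification that ignores padding masking and batching):

```python
import math

def label_smoothed_nll(logprobs, target, eps=0.05):
    # logprobs: log-probabilities over the vocabulary for one position
    # loss = (1 - eps) * NLL(target) + eps * mean over classes of -logprob
    n = len(logprobs)
    smooth = -sum(logprobs) / n
    nll = -logprobs[target]
    return (1 - eps) * nll + eps * smooth
```

With eps=0 this reduces to ordinary cross-entropy; a small eps discourages the model from becoming overconfident in a single token.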

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2