# t5_sliding_window
This model is a fine-tuned version of t5-base on the nl-quad dataset. It achieves the following results on the evaluation set:

- Loss: 2.0398
- ROUGE-L: 41.87
- BLEU: 18.36
- METEOR: 41.06
- BERT F1: 93.86
- QSTS Mean: 38.69
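For reference, ROUGE-L scores a generated text by the longest common subsequence (LCS) of tokens it shares with the reference. A minimal pure-Python sketch of the F1 variant (hypothetical helper names; not the evaluation code used for this card):

```python
def lcs_len(a, b):
    # Dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate, reference):
    # Whitespace tokenization for illustration; real scorers normalize more carefully.
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

Reported scores here are scaled to 0-100, so 41.87 corresponds to an F1 of about 0.42.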
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 48
- eval_batch_size: 48
- seed: 42
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 0.1
- num_epochs: 10
- label_smoothing_factor: 0.05
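The linear scheduler ramps the learning rate from zero to its peak during warmup, then decays it linearly back to zero. Assuming the 0.1 value acts as a warmup ratio over the full run (249 steps per epoch × 10 epochs = 2490 steps, so 249 warmup steps), the schedule can be sketched as:

```python
def linear_warmup_lr(step, peak_lr=1e-4, warmup_steps=249, total_steps=2490):
    # Linear ramp from 0 to peak_lr during warmup, then linear decay to 0.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)
```

The step counts here are inferred from the results table and are illustrative, not taken from the training code.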
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rougel | Bleu | Meteor | Bert F1 | Qsts Mean |
|---|---|---|---|---|---|---|---|---|
| 2.3550 | 1.0 | 249 | 2.1560 | 38.12 | 14.35 | 36.85 | 93.39 | 33.39 |
| 2.1237 | 2.0 | 498 | 2.0569 | 39.37 | 15.61 | 38.32 | 93.57 | 35.74 |
| 1.9450 | 3.0 | 747 | 2.0384 | 40.24 | 16.47 | 39.36 | 93.63 | 35.70 |
| 1.8462 | 4.0 | 996 | 2.0324 | 40.32 | 16.88 | 39.44 | 93.66 | 37.04 |
| 1.7684 | 5.0 | 1245 | 2.0337 | 40.95 | 17.87 | 40.13 | 93.75 | 38.23 |
| 1.6971 | 6.0 | 1494 | 2.0398 | 41.87 | 18.36 | 41.06 | 93.86 | 38.69 |
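The label_smoothing_factor of 0.05 used in training spreads a small fraction of the target probability mass uniformly over the vocabulary, which discourages overconfident predictions. A per-token sketch of the usual formulation (an assumption; the trainer's exact implementation may differ in details such as padding handling):

```python
import math

def label_smoothed_loss(log_probs, target, eps=0.05):
    # log_probs: log-probabilities over the vocabulary at one output position.
    nll = -log_probs[target]                     # standard cross-entropy term
    uniform = -sum(log_probs) / len(log_probs)   # loss against a uniform target
    return (1 - eps) * nll + eps * uniform
```

With eps=0 this reduces to plain negative log-likelihood.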
### Framework versions

- Transformers 5.0.0
- PyTorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2
### Base model

- google-t5/t5-base