# a65d9c4adf27447664d82dde48cc24a9
This model is a fine-tuned version of studio-ousia/luke-large on the dim/tldr_news dataset. It achieves the following results on the evaluation set:
- Loss: 1.0307
- Data Size: 1.0
- Epoch Runtime: 41.2299
- Accuracy: 0.7777
- F1 Macro: 0.8141
- Rouge1: 0.7784
- Rouge2: 0.0
- Rougel: 0.7784
- Rougelsum: 0.7784
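The headline Accuracy (0.7777) and F1 Macro (0.8141) can differ because macro F1 averages the per-class F1 scores with equal weight, regardless of class frequency. As an illustration only (the label set and example predictions below are hypothetical, not from this model), a minimal pure-Python sketch of both metrics:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical 3-class example: macro F1 (0.8222) exceeds accuracy (0.8)
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(round(accuracy(y_true, y_pred), 4))  # 0.8
print(round(f1_macro(y_true, y_pred), 4))  # 0.8222
```

In practice these values are typically computed with a metrics library such as `evaluate` or `scikit-learn`; the functions above just make the definitions explicit.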
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: constant
- num_epochs: 50
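The total train batch size of 32 follows from the per-device batch size and the number of GPUs. A quick check of that arithmetic (gradient accumulation is not listed in the card, so it is assumed to be 1):

```python
# Reported per-device settings from the hyperparameter list above
train_batch_size = 8             # per device
num_devices = 4
gradient_accumulation_steps = 1  # assumption: not reported in the card

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 32
```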
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Accuracy | F1 Macro | Rouge1 | Rouge2 | Rougel | Rougelsum |
|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.5650 | 0 | 3.2521 | 0.2777 | 0.1115 | 0.2770 | 0.0 | 0.2770 | 0.2777 |
| No log | 1 | 178 | 1.5858 | 0.0078 | 4.1440 | 0.2862 | 0.1376 | 0.2869 | 0.0 | 0.2862 | 0.2862 |
| No log | 2 | 356 | 1.2355 | 0.0156 | 4.7329 | 0.5185 | 0.3133 | 0.5192 | 0.0 | 0.5192 | 0.5185 |
| No log | 3 | 534 | 0.8361 | 0.0312 | 5.9672 | 0.6612 | 0.5153 | 0.6612 | 0.0 | 0.6619 | 0.6605 |
| No log | 4 | 712 | 0.9015 | 0.0625 | 7.7237 | 0.7145 | 0.5549 | 0.7152 | 0.0 | 0.7152 | 0.7145 |
| No log | 5 | 890 | 0.8214 | 0.125 | 10.9602 | 0.7322 | 0.5678 | 0.7330 | 0.0 | 0.7330 | 0.7322 |
| 0.0544 | 6 | 1068 | 0.6937 | 0.25 | 15.6494 | 0.7415 | 0.7021 | 0.7422 | 0.0 | 0.7422 | 0.7415 |
| 0.5963 | 7 | 1246 | 0.7073 | 0.5 | 23.6016 | 0.7244 | 0.6791 | 0.7251 | 0.0 | 0.7251 | 0.7244 |
| 0.5197 | 8 | 1424 | 0.6662 | 1.0 | 43.3845 | 0.7635 | 0.7399 | 0.7642 | 0.0 | 0.7642 | 0.7635 |
| 0.4196 | 9 | 1602 | 0.6533 | 1.0 | 40.7272 | 0.7756 | 0.7870 | 0.7770 | 0.0 | 0.7763 | 0.7756 |
| 0.3280 | 10 | 1780 | 0.7369 | 1.0 | 41.2215 | 0.7607 | 0.7993 | 0.7614 | 0.0 | 0.7607 | 0.7607 |
| 0.2367 | 11 | 1958 | 1.0076 | 1.0 | 41.9694 | 0.7031 | 0.7347 | 0.7031 | 0.0 | 0.7031 | 0.7038 |
| 0.1884 | 12 | 2136 | 1.0821 | 1.0 | 40.9372 | 0.7678 | 0.8077 | 0.7678 | 0.0 | 0.7685 | 0.7685 |
| 0.2027 | 13 | 2314 | 1.0307 | 1.0 | 41.2299 | 0.7777 | 0.8141 | 0.7784 | 0.0 | 0.7784 | 0.7784 |
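Note that the headline metrics above come from the final epoch (13), while the lowest validation loss in the table occurs earlier, at epoch 9 (0.6533), with validation loss rising afterwards. If one were selecting a checkpoint by validation loss, a minimal sketch over the table data would pick that earlier epoch:

```python
# (epoch, validation_loss) pairs copied from the training-results table
history = [
    (0, 1.5650), (1, 1.5858), (2, 1.2355), (3, 0.8361), (4, 0.9015),
    (5, 0.8214), (6, 0.6937), (7, 0.7073), (8, 0.6662), (9, 0.6533),
    (10, 0.7369), (11, 1.0076), (12, 1.0821), (13, 1.0307),
]

# Checkpoint with the lowest validation loss
best_epoch, best_loss = min(history, key=lambda row: row[1])
print(best_epoch, best_loss)  # 9 0.6533
```

With Transformers this selection is usually automated via `load_best_model_at_end=True` in `TrainingArguments`; whether that was used here is not stated in the card.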
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.1