Model save

README.md CHANGED

@@ -6,6 +6,7 @@ tags:
 - generated_from_trainer
 metrics:
 - rouge
+- bleu
 model-index:
 - name: flan-context
   results: []
@@ -18,17 +19,13 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss:
-- Rouge: {'rouge1': 0, 'rouge2': 0, 'rougeL': 0, 'rougeLsum': 0}
--
--
--
--
-- Meteor: 0
-- Bertscore Precision: 0
-- Bertscore Recall: 0
-- Bertscore F1: 0
-- Gen Len: 0
+- Loss: 2.3842
+- Rouge: {'rouge1': 0.23405466769402272, 'rouge2': 0.07972272627141436, 'rougeL': 0.19916876363600983, 'rougeLsum': 0.19933848643732807}
+- Bleu: {'bleu': 0.02684649083752067, 'precisions': [0.36613819922225543, 0.11508951406649616, 0.051836594576038446, 0.02742772424017791], 'brevity_penalty': 0.3051481683964344, 'length_ratio': 0.4572561893037888, 'translation_length': 3343, 'reference_length': 7311}
+- Bertscore Precision: 0.8858
+- Bertscore Recall: 0.8612
+- Bertscore F1: 0.8732
+- Meteor: 0.1631
 
 ## Model description
 
@@ -51,16 +48,21 @@ The following hyperparameters were used during training:
 - train_batch_size: 4
 - eval_batch_size: 4
 - seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 8
 - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
-- num_epochs:
+- num_epochs: 5
 
 ### Training results
 
-| Training Loss | Epoch
-
-
-
+| Training Loss | Epoch  | Step | Validation Loss | Rouge | Bleu | Bertscore Precision | Bertscore Recall | Bertscore F1 | Meteor |
+|:-------------:|:------:|:----:|:---------------:|:-------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-------------------:|:----------------:|:------------:|:------:|
+| 2.9048        | 0.9973 | 188  | 2.4790          | {'rouge1': 0.20968553648391508, 'rouge2': 0.07443896925573865, 'rougeL': 0.1805792844016589, 'rougeLsum': 0.18088539253560337}    | {'bleu': 0.020489842222462733, 'precisions': [0.38257575757575757, 0.11974711788769059, 0.05699272433306386, 0.029216467463479414], 'brevity_penalty': 0.21924576067972068, 'length_ratio': 0.3972096840377513, 'translation_length': 2904, 'reference_length': 7311}   | 0.8919              | 0.8585           | 0.8747       | 0.1489 |
+| 2.5945        | 2.0    | 377  | 2.4208          | {'rouge1': 0.2297957739195002, 'rouge2': 0.07836393682592077, 'rougeL': 0.19574285931586738, 'rougeLsum': 0.19611644340511103}    | {'bleu': 0.0253671186540934, 'precisions': [0.381875, 0.11792294807370184, 0.054873646209386284, 0.02857142857142857], 'brevity_penalty': 0.2767370504490949, 'length_ratio': 0.4376966215292026, 'translation_length': 3200, 'reference_length': 7311}                 | 0.8902              | 0.8617           | 0.8756       | 0.1597 |
+| 2.4452        | 2.9973 | 565  | 2.3912          | {'rouge1': 0.22841476315147372, 'rouge2': 0.07651572809303378, 'rougeL': 0.19569000397215158, 'rougeLsum': 0.195641894488224}     | {'bleu': 0.026971694358366844, 'precisions': [0.36859553948161544, 0.11666129552046407, 0.05332409972299169, 0.02843247287691732], 'brevity_penalty': 0.30016114315032577, 'length_ratio': 0.45383668444809194, 'translation_length': 3318, 'reference_length': 7311}   | 0.8862              | 0.8603           | 0.8729       | 0.1620 |
+| 2.3204        | 4.0    | 754  | 2.3840          | {'rouge1': 0.2324717619677979, 'rouge2': 0.07856325718853738, 'rougeL': 0.19781179401099636, 'rougeLsum': 0.1979068896295354}     | {'bleu': 0.025569078512862678, 'precisions': [0.3670581150255947, 0.11332904056664521, 0.050155655482531994, 0.025037369207772796], 'brevity_penalty': 0.3007591959750055, 'length_ratio': 0.4542470250307755, 'translation_length': 3321, 'reference_length': 7311}    | 0.8857              | 0.8609           | 0.8730       | 0.1624 |
+| 2.2574        | 4.9867 | 940  | 2.3842          | {'rouge1': 0.23405466769402272, 'rouge2': 0.07972272627141436, 'rougeL': 0.19916876363600983, 'rougeLsum': 0.19933848643732807}   | {'bleu': 0.02684649083752067, 'precisions': [0.36613819922225543, 0.11508951406649616, 0.051836594576038446, 0.02742772424017791], 'brevity_penalty': 0.3051481683964344, 'length_ratio': 0.4572561893037888, 'translation_length': 3343, 'reference_length': 7311}     | 0.8858              | 0.8612           | 0.8732       | 0.1631 |
 
 
 ### Framework versions
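The Bleu dicts in this diff follow the convention used by sacrebleu-style scorers (and the `evaluate` wrapper), so their fields can be cross-checked against one another: `length_ratio = translation_length / reference_length`, `brevity_penalty = exp(1 - reference_length / translation_length)` when the candidate is shorter than the reference, and the BLEU score is the brevity penalty times the geometric mean of the four n-gram precisions. A standalone sanity check of the final-evaluation numbers, assuming that convention:

```python
import math

# BLEU fields from the final evaluation row above
precisions = [0.36613819922225543, 0.11508951406649616,
              0.051836594576038446, 0.02742772424017791]
translation_length = 3343   # total candidate tokens
reference_length = 7311     # total reference tokens

# Length ratio; the penalty applies because the candidates are shorter
length_ratio = translation_length / reference_length
brevity_penalty = math.exp(1 - reference_length / translation_length)

# BLEU = brevity penalty * geometric mean of the 1..4-gram precisions
geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
bleu = brevity_penalty * geo_mean

print(f"{length_ratio:.6f} {brevity_penalty:.6f} {bleu:.6f}")
```

All three recomputed values agree with the reported `length_ratio` (0.4573), `brevity_penalty` (0.3051), and `bleu` (0.0268), which also shows why the headline BLEU is so low: the generations average less than half the reference length, so the brevity penalty dominates.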
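The added hyperparameters and the step/epoch columns of the results table are mutually consistent, and together they imply a training-set size that the card never states. A sketch of the arithmetic (the ~1508-example figure is an inference, not a documented fact):

```python
# Effective batch size, as listed in the hyperparameters
train_batch_size = 4
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 8

# The table logs step 377 at epoch 2.0 and step 754 at epoch 4.0,
# so one epoch corresponds to 188.5 optimizer steps.
steps_per_epoch = 754 / 4.0

# The final logged step, 940, then lands at epoch ~4.9867,
# matching the last row of the table.
epoch_at_final_step = 940 / steps_per_epoch

# Each optimizer step consumes total_train_batch_size examples, so the
# training split holds roughly 1508 examples (an inference from the card).
approx_train_examples = steps_per_epoch * total_train_batch_size

print(total_train_batch_size, steps_per_epoch, round(epoch_at_final_step, 4))
```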
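The three Bertscore columns can be loosely cross-checked as well. BERTScore computes F1 as the harmonic mean of precision and recall per example; the harmonic mean of the reported corpus-level averages is therefore only an approximation of the reported average F1 (the averaging order differs), but it should land within rounding distance:

```python
# Corpus-level BERTScore averages from the final evaluation above
precision = 0.8858
recall = 0.8612

# Harmonic mean of the averages; the card's 0.8732 is the mean of
# per-example F1 scores, so this only approximately reproduces it.
f1_approx = 2 * precision * recall / (precision + recall)
print(round(f1_approx, 4))  # ~0.8733, vs. 0.8732 reported
```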