---
license: afl-3.0
---
# Generating Declarative Statements from QA Pairs

There are already some rule-based models that can accomplish this task, but I haven't seen any transformer-based models that do so. Therefore, I trained this model on top of `BART-base` to transform QA pairs into declarative statements.
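
If you want to try the model, here is a minimal `transformers` sketch. The model ID and the question/answer input format are hypothetical placeholders, since this README doesn't pin them down; adjust both to match the published checkpoint.

```python
# Minimal usage sketch, assuming a Hub-hosted checkpoint.
# MODEL_ID and the "question [SEP] answer" input format are assumptions:
# replace them with this repo's actual model ID and training format.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "<this-repo-id>"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

question = "Where was the Super Bowl held?"
answer = "Levi's Stadium"
inputs = tokenizer(f"{question} [SEP] {answer}", return_tensors="pt")

# Beam search tends to give cleaner single-sentence outputs.
outputs = model.generate(**inputs, num_beams=4, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# e.g. "The Super Bowl was held at Levi's Stadium."
```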

I compared my model with other rule-based models, including

> [paper1](https://aclanthology.org/D19-5401.pdf) (2019), which proposes the **2 Encoder Pointer-Gen model**

and

> [paper2](https://arxiv.org/pdf/2112.03849.pdf) (2021), which proposes the **RBV2 model**

**Here are the results compared to the 2 Encoder Pointer-Gen model (on the test sets released by paper1):**

Test on the original test set

| Model   | 2 Encoder Pointer-Gen (2019) | BART-base  |
| ------- | ---------------------------- | ---------- |
| BLEU    | 74.05                        | **78.878** |
| ROUGE-1 | 91.24                        | **91.937** |
| ROUGE-2 | 81.91                        | **82.177** |
| ROUGE-L | 86.25                        | **87.172** |

Test on the NewsQA test set

| Model   | 2 Encoder Pointer-Gen | BART       |
| ------- | --------------------- | ---------- |
| BLEU    | 73.29                 | **74.966** |
| ROUGE-1 | **95.38**             | 89.328     |
| ROUGE-2 | **87.18**             | 78.538     |
| ROUGE-L | **93.65**             | 87.583     |

Test on the free_base test set

| Model   | 2 Encoder Pointer-Gen | BART       |
| ------- | --------------------- | ---------- |
| BLEU    | 75.41                 | **76.082** |
| ROUGE-1 | **93.46**             | 92.693     |
| ROUGE-2 | **82.29**             | 81.216     |
| ROUGE-L | **87.5**              | 86.834     |

**As paper2 doesn't release its dataset, it's hard to make a fair comparison. However, according to the results reported in paper2, the BLEU and ROUGE scores of their model are lower than those of MPG, which is exactly the 2 Encoder Pointer-Gen model:**

| Model        | BLEU | ROUGE-1 | ROUGE-2 | ROUGE-L |
| ------------ | ---- | ------- | ------- | ------- |
| RBV2         | 74.8 | 95.3    | 83.1    | 90.3    |
| RBV2+BERT    | 71.5 | 93.9    | 82.4    | 89.5    |
| RBV2+RoBERTa | 72.1 | 94      | 83.1    | 89.8    |
| RBV2+XLNET   | 71.2 | 93.6    | 82.3    | 89.4    |
| MPG          | 75.8 | 94.4    | 87.4    | 91.6    |

Since RBV2 scores below MPG in paper2's own evaluation, and my model is competitive with MPG on the test sets above, there is reason to believe that my model also performs better than RBV2.

To sum up, my model performs nearly as well as the SOTA rule-based models when evaluated with BLEU and ROUGE scores. However, the sentence patterns it produces lack diversity.

(It's worth mentioning that even though I tried my best to conduct objective tests, the test sets I could find were more or less different from those introduced in the papers.)
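
For context on the metrics, here is a minimal sketch of how BLEU and ROUGE scores like those above can be computed with the Hugging Face `evaluate` library. This is an assumption about tooling for illustration, not the exact script behind the tables.

```python
# Minimal metric sketch using the Hugging Face `evaluate` library.
# Illustrative only: not the exact evaluation script used for the tables above.
import evaluate

predictions = ["the super bowl was held at levi's stadium."]
references = ["the super bowl was held at levi's stadium."]

bleu = evaluate.load("bleu")    # corpus-level BLEU
rouge = evaluate.load("rouge")  # ROUGE-1/2/L, averaged over examples

print(bleu.compute(predictions=predictions, references=references)["bleu"])

scores = rouge.compute(predictions=predictions, references=references)
print(scores["rouge1"], scores["rouge2"], scores["rougeL"])
```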