Commit 46b176e (parent: c5be21b): Update README.md
tags:
  - text-simplification
  - meaning
  - assess
---
# Here is MeaningBERT

MeaningBERT is an automatic and trainable metric for assessing meaning preservation between sentences. MeaningBERT was proposed in our article [MeaningBERT: assessing meaning preservation between sentences](https://www.frontiersin.org/articles/10.3389/frai.2023.1223924/full). Its goal is to assess meaning preservation between two sentences while correlating highly with human judgments and passing sanity checks. For more details, refer to our publicly available article.
## Sanity Check

Correlation to human judgment is one way to evaluate the quality of a meaning preservation metric. However, it is inherently subjective, since it uses human judgment as a gold standard, and expensive, since it requires a large dataset annotated by several humans. As an alternative, we designed two automated tests: evaluating meaning preservation between identical sentences (which should be 100% preserving) and between unrelated sentences (which should be 0% preserving).

In these tests, the meaning preservation target value is not subjective and does not require human annotation to measure. They represent a trivial and minimal threshold a good automatic meaning preservation metric should be able to achieve. Namely, a metric should minimally be able to return a perfect score (i.e., 100%) when two identical sentences are compared and a null score (i.e., 0%) when two sentences are completely unrelated.
### Identical sentences

The first test evaluates meaning preservation between identical sentences. To analyze a metric's ability to pass this test, we count the number of times its rating is greater than or equal to a threshold value X ∈ [95, 99] and divide that count by the number of sentences, yielding the ratio of cases in which the metric gives the expected rating. To account for computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use a threshold value of 100%.
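The ratio described above can be sketched in plain Python. The ratings below are hypothetical placeholders for illustration, not actual MeaningBERT outputs:

```python
# Sanity-check ratio for identical sentence pairs: the share of ratings
# that, once rounded to the nearest integer, meet or exceed the threshold.

def passing_ratio(ratings, threshold):
    """Share of ratings whose rounded value is >= `threshold`."""
    passed = sum(1 for r in ratings if round(r) >= threshold)
    return passed / len(ratings)

# Hypothetical ratings for pairs of identical sentences (expected: ~100).
ratings = [99.6, 100.0, 97.2, 94.4, 99.9]
print(passing_ratio(ratings, threshold=95))  # 4 of 5 ratings round to >= 95
```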
### Unrelated sentences

Our second test evaluates meaning preservation between a source sentence and an unrelated sentence generated by a large language model. The idea is to verify that the metric returns a meaning preservation rating of 0 when given a completely irrelevant sentence mainly composed of irrelevant words (also known as word soup). Since this test's expected rating is 0, we check that the metric rating is less than or equal to a threshold value X ∈ [1, 5]. Again, to account for computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use a threshold value of 0%.
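The mirrored check for unrelated pairs can be sketched the same way; again, the ratings are hypothetical placeholders, not actual MeaningBERT outputs:

```python
# Sanity-check ratio for unrelated ("word soup") sentence pairs: the share
# of ratings that, once rounded to the nearest integer, fall at or below
# the threshold.

def word_soup_ratio(ratings, threshold):
    """Share of ratings whose rounded value is <= `threshold`."""
    passed = sum(1 for r in ratings if round(r) <= threshold)
    return passed / len(ratings)

# Hypothetical ratings for unrelated sentence pairs (expected: ~0).
ratings = [0.4, 1.2, 0.0, 7.8]
print(word_soup_ratio(ratings, threshold=5))  # 3 of 4 ratings round to <= 5
```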
## Cite

Use the following citation to cite MeaningBERT:

```
@ARTICLE{10.3389/frai.2023.1223924,
  AUTHOR={Beauchemin, David and Saggion, Horacio and Khoury, Richard},
  TITLE={MeaningBERT: assessing meaning preservation between sentences},
  JOURNAL={Frontiers in Artificial Intelligence},
  VOLUME={6},
  YEAR={2023},
  URL={https://www.frontiersin.org/articles/10.3389/frai.2023.1223924},
  DOI={10.3389/frai.2023.1223924},
  ISSN={2624-8212},
}
```
## License

MeaningBERT is MIT licensed, as found in the [LICENSE file](https://github.com/GRAAL-Research/risc/blob/main/LICENSE).