---
language:
- en
tags:
- pytorch
- BERT
- token-classification
- nominalizations
datasets:
- kleinay/qanom
---

# Nominalization Detector

This model identifies "predicative nominalizations", that is, nominalizations that carry an eventive (or "verbal") meaning in context. It is the pretrained `bert-base-cased` model, fine-tuned for token classification on the "nominalization detection" task as defined and annotated by the QANom project [(Klein et al., COLING 2020)](https://www.aclweb.org/anthology/2020.coling-main.274/).

## Task Description

The model is trained as a binary classifier over candidate nominalizations.
Candidates are extracted in two steps: a POS tagger selects common nouns, and lexical resources (e.g., WordNet and CatVar) keep only those nouns that have at least one derivationally related verb. In the QANom annotation project, these candidates were shown to annotators, who decided whether each one carries a "verbal" meaning in the context of its sentence. This model reproduces that binary classification.
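
As a rough illustration of the two filtering steps, here is a minimal sketch that assumes hand-assigned Penn Treebank tags in place of a POS tagger and a tiny hypothetical noun-to-verb lexicon (`TOY_NOUN_TO_VERB`) in place of WordNet and CatVar. The function `extract_candidates` is likewise a hypothetical name; this is not the `qanom` implementation, only a sketch of the filtering logic under those assumptions.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL toy lexicon standing in for WordNet/CatVar: maps a noun to one
# derivationally related verb. Entries are illustrative only.
TOY_NOUN_TO_VERB: Dict[str, str] = {
    "construction": "construct",
    "building": "build",
    "beginning": "begin",
    "destruction": "destruct",
    "officer": "officer",
}

# Penn Treebank tags for common nouns (singular and plural).
COMMON_NOUN_TAGS = {"NN", "NNS"}


def extract_candidates(tagged_tokens: List[Tuple[str, str]]) -> List[Dict]:
    """Return candidate nominalizations from a tokenized, POS-tagged sentence."""
    candidates = []
    for idx, (token, tag) in enumerate(tagged_tokens):
        # Step 1: keep only common nouns.
        if tag not in COMMON_NOUN_TAGS:
            continue
        # Step 2: keep only nouns that have a derivationally related verb.
        verb = TOY_NOUN_TO_VERB.get(token.lower())
        if verb is not None:
            candidates.append({"predicate_idx": idx, "predicate": token, "verb_form": verb})
    return candidates


sentence = ("The construction of the officer 's building finished right after "
            "the beginning of the destruction of the previous construction .")
# Hand-assigned tags; a real pipeline would run a POS tagger here.
tags = "DT NN IN DT NN POS NN VBD RB IN DT NN IN DT NN IN DT JJ NN .".split()
candidates = extract_candidates(list(zip(sentence.split(), tags)))
print([(c["predicate_idx"], c["verb_form"]) for c in candidates])
# [(1, 'construct'), (4, 'officer'), (6, 'build'), (11, 'begin'), (14, 'destruct'), (18, 'construct')]
```

Because the toy lexicon happens to cover every common noun in this sentence, all six nouns survive both steps; deciding which of them are actually verbal in context is the job of the classifier.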

## Usage

The candidate extraction algorithm is implemented in the `qanom` package; see the README in the [QANom GitHub repo](https://github.com/kleinay/QANom) for full documentation. The `qanom` package is also available via `pip install qanom`.

For ease of use, we encapsulated the full nominalization detection pipeline (i.e., candidate extraction + predicate classification) in the `qanom.nominalization_detector.NominalizationDetector` class, which internally uses this `nominalization-candidate-classifier`:

```python
from qanom.nominalization_detector import NominalizationDetector
detector = NominalizationDetector()

# The example sentence is whitespace-tokenized; `predicate_idx` in the output indexes these tokens
raw_sentences = ["The construction of the officer 's building finished right after the beginning of the destruction of the previous construction ."]

# Return every candidate, with its classifier prediction and probability
print(detector(raw_sentences, return_all_candidates=True))
# Return only candidates that pass a 0.75 threshold, without the classifier fields
print(detector(raw_sentences, threshold=0.75, return_probability=False))
```

Outputs of the two calls, in order:

```python
[[{'predicate_idx': 1,
   'predicate': 'construction',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.7626778483390808,
   'verb_form': 'construct'},
  {'predicate_idx': 4,
   'predicate': 'officer',
   'predicate_detector_prediction': False,
   'predicate_detector_probability': 0.19832570850849152,
   'verb_form': 'officer'},
  {'predicate_idx': 6,
   'predicate': 'building',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.5794129371643066,
   'verb_form': 'build'},
  {'predicate_idx': 11,
   'predicate': 'beginning',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.8937646150588989,
   'verb_form': 'begin'},
  {'predicate_idx': 14,
   'predicate': 'destruction',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.8501205444335938,
   'verb_form': 'destruct'},
  {'predicate_idx': 18,
   'predicate': 'construction',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.7022264003753662,
   'verb_form': 'construct'}]]
```
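
For the second call, only the candidates whose probability in the first output exceeds 0.75 are returned, and the prediction and probability fields are omitted:

```python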
[[{'predicate_idx': 1, 'predicate': 'construction', 'verb_form': 'construct'},
  {'predicate_idx': 11, 'predicate': 'beginning', 'verb_form': 'begin'},
  {'predicate_idx': 14, 'predicate': 'destruction', 'verb_form': 'destruct'}]]
```
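
The relationship between the two outputs can be illustrated with a simple filter. This is a hypothetical sketch of the observed behavior, not the `qanom` implementation: `filter_by_threshold` and `CLASSIFIER_FIELDS` are invented names, and the strict `>` comparison at the threshold boundary is an assumption. The sketch also checks that `predicate_idx` indexes the whitespace-separated tokens of the input sentence.

```python
from typing import Dict, List

# First output above (all candidates for the single example sentence), copied verbatim.
all_candidates: List[List[Dict]] = [[
    {"predicate_idx": 1, "predicate": "construction", "predicate_detector_prediction": True,
     "predicate_detector_probability": 0.7626778483390808, "verb_form": "construct"},
    {"predicate_idx": 4, "predicate": "officer", "predicate_detector_prediction": False,
     "predicate_detector_probability": 0.19832570850849152, "verb_form": "officer"},
    {"predicate_idx": 6, "predicate": "building", "predicate_detector_prediction": True,
     "predicate_detector_probability": 0.5794129371643066, "verb_form": "build"},
    {"predicate_idx": 11, "predicate": "beginning", "predicate_detector_prediction": True,
     "predicate_detector_probability": 0.8937646150588989, "verb_form": "begin"},
    {"predicate_idx": 14, "predicate": "destruction", "predicate_detector_prediction": True,
     "predicate_detector_probability": 0.8501205444335938, "verb_form": "destruct"},
    {"predicate_idx": 18, "predicate": "construction", "predicate_detector_prediction": True,
     "predicate_detector_probability": 0.7022264003753662, "verb_form": "construct"},
]]

CLASSIFIER_FIELDS = ("predicate_detector_prediction", "predicate_detector_probability")


def filter_by_threshold(batch: List[List[Dict]], threshold: float) -> List[List[Dict]]:
    """HYPOTHETICAL helper: keep candidates whose probability exceeds `threshold`
    and drop the classifier fields (mirroring return_probability=False)."""
    return [
        [
            {k: v for k, v in cand.items() if k not in CLASSIFIER_FIELDS}
            for cand in sentence
            if cand["predicate_detector_probability"] > threshold
        ]
        for sentence in batch
    ]


# predicate_idx indexes the whitespace-separated tokens of the input sentence.
tokens = ("The construction of the officer 's building finished right after "
          "the beginning of the destruction of the previous construction .").split()
assert all(tokens[c["predicate_idx"]] == c["predicate"] for c in all_candidates[0])

print(filter_by_threshold(all_candidates, threshold=0.75))
# Same three candidates (indices 1, 11, 14) as the second output above
```

With `threshold=0.5`, the same filter would also keep `building` (0.58) and the second `construction` (0.70).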

## Cite

```bibtex
@inproceedings{klein2020qanom,
  title={QANom: Question-Answer driven SRL for Nominalizations},
  author={Klein, Ayal and Mamou, Jonathan and Pyatkin, Valentina and Stepanov, Daniela and He, Hangfeng and Roth, Dan and Zettlemoyer, Luke and Dagan, Ido},
  booktitle={Proceedings of the 28th International Conference on Computational Linguistics},
  pages={3069--3083},
  year={2020}
}
```