---
language:
- de
tags:
- medical
- ggponc
widget:
- text: "Vitamin C, E und A"
  example_title: "Forward Ellipsis"
- text: "Chemo- und Strahlentherapie"
  example_title: "Backward Ellipsis"
- text: "HPV-16- und/oder -18-Positivität"
  example_title: "Complex Ellipsis"
---

## Model

Fine-tuned [mt5-base](https://huggingface.co/google/mt5-base) model for resolving elliptical coordinated compound noun phrases (ECCNPs) in German text.
ECCNPs are a special type of coordination ellipsis in which part of a compound noun is omitted due to coordination (e.g., with "and", "or", or "/").

For instance, *Chemo- und Strahlentherapie* (chemo- and radiotherapy) is the elliptical form of *Chemotherapie und Strahlentherapie* (chemotherapy and radiotherapy).

## Dataset

The model was fine-tuned on a subset of sentences from [GGPONC 2.0](https://huggingface.co/datasets/bigbio/ggponc2) that contain manually annotated ECCNPs and their resolutions.
The annotated dataset is available on Zenodo: https://zenodo.org/records/12529883

## Usage

The model can be loaded as a `Text2TextGenerationPipeline`:

```python
from transformers import pipeline
pipe = pipeline(model="phlobo/german-ellipses-resolver-mt5-base")
```

```python
pipe("Chemo- und Strahlentherapie")
# [{'generated_text': 'Chemotherapie und Strahlentherapie'}]
```

```python
pipe("Vitamin C, E und A")
# [{'generated_text': 'Vitamin C, Vitamin E und Vitamin A'}]
```

We recommend setting `max_length` to cap the length of the generated output. For most German sentences, a value of `256` should be enough:

```python
pipe = pipeline(model="phlobo/german-ellipses-resolver-mt5-base", max_length=256)
```

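When several phrases need resolving, a small wrapper can hide the pipeline's output format. The following is a minimal sketch and not part of the model's published code: `resolve_ellipses` is a hypothetical helper, and it assumes that `pipe` returns a list of the form `[{'generated_text': ...}]` for a single input string, as in the examples above.

```python
from typing import Callable, Iterable, List


def resolve_ellipses(texts: Iterable[str], pipe: Callable) -> List[str]:
    """Return the resolved form of each input text.

    Assumption: `pipe` behaves like the pipeline shown in this section,
    i.e. pipe("some text") -> [{'generated_text': '...'}].
    """
    resolved = []
    for text in texts:
        # Call the pipeline once per text and keep only the generated string.
        output = pipe(text)
        resolved.append(output[0]["generated_text"])
    return resolved
```

With a pipeline created as shown above, `resolve_ellipses(["Chemo- und Strahlentherapie", "Vitamin C, E und A"], pipe)` would return one resolved string per input, in the same order.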
## Paper

Our approach and its evaluation have been published at the ACL BioNLP'23 workshop.

Please cite the following paper if you find our model useful:

```bibtex
@inproceedings{kammer-etal-2023-resolving,
    title = "Resolving Elliptical Compounds in {G}erman Medical Text",
    author = "Kammer, Niklas and
      Borchert, Florian and
      Winkler, Silvia and
      de Melo, Gerard and
      Schapranow, Matthieu-P.",
    editor = "Demner-fushman, Dina and
      Ananiadou, Sophia and
      Cohen, Kevin",
    booktitle = "The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.bionlp-1.26",
    doi = "10.18653/v1/2023.bionlp-1.26",
    pages = "292--305"
}
```