---
language:
- de
tags:
- medical
- ggponc
widget:
- text: "Vitamin C, E und A"
  example_title: "Forward Ellipsis"
- text: "Chemo- und Strahlentherapie"
  example_title: "Backward Ellipsis"
- text: "HPV-16- und/oder -18-Positivität"
  example_title: "Complex Ellipsis"
---
## Model
Fine-tuned [mt5-base](https://huggingface.co/google/mt5-base) model for resolving elliptical coordinated compound noun phrases (ECCNPs) in German text.
ECCNPs are a special type of coordination ellipsis, in which part of a compound noun is omitted due to coordination (e.g., "and", "or", "/").
For instance, *Chemo- und Strahlentherapie* (chemo- and radiotherapy) is the elliptical form of *Chemotherapie und Strahlentherapie* (chemotherapy and radiotherapy).
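To make the definition more concrete, here is a minimal sketch of a surface heuristic that flags hyphen-marked ECCNP candidates, for example to pre-select sentences before running the model. It is not the method used by the model or the paper. The function name `has_hyphen_ellipsis` and the list of coordinators are assumptions made for illustration, and ellipses without a hyphen (such as *Vitamin C, E und A*) are deliberately not covered.

```python
import re

# Assumed (non-exhaustive) list of German coordinators, for illustration only.
COORD = r"(?:und/oder|und|oder|bzw\.|sowie|/|,)"

# A word ending in a hyphen followed by a coordinator, e.g. "Chemo- und".
TRAILING = re.compile(rf"\w+-\s*{COORD}(?!\w)")
# A coordinator followed by a word starting with a hyphen, e.g. "und -durchführung".
LEADING = re.compile(rf"(?<!\w){COORD}\s*-\w+")


def has_hyphen_ellipsis(text: str) -> bool:
    """Return True if the text contains a hyphen-marked ECCNP candidate."""
    return bool(TRAILING.search(text) or LEADING.search(text))


if __name__ == "__main__":
    for phrase in ["Chemo- und Strahlentherapie", "Vitamin C, E und A"]:
        print(phrase, has_hyphen_ellipsis(phrase))
```

Because the check is purely pattern-based, it can only serve as a cheap filter. Resolving the ellipsis itself is left to the model.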
## Dataset
The model has been fine-tuned on a subset of sentences from [GGPONC 2.0](https://huggingface.co/datasets/bigbio/ggponc2) that contain manually annotated ECCNPs and their resolutions.
The annotated dataset is available on Zenodo: https://zenodo.org/records/12529883
## Usage
The model can be loaded as a `Text2TextGenerationPipeline`:
```python
from transformers import pipeline

pipe = pipeline(model="phlobo/german-ellipses-resolver-mt5-base")
```
```python
pipe("Chemo- und Strahlentherapie")
# [{'generated_text': 'Chemotherapie und Strahlentherapie'}]
```
```python
pipe("Vitamin C, E und A")
# [{'generated_text': 'Vitamin C, Vitamin E und Vitamin A'}]
```
We recommend setting `max_length` to cap the output length, which is measured in tokens. For most German sentences, a value of `256` should be enough:
```python
from transformers import pipeline

pipe = pipeline(model="phlobo/german-ellipses-resolver-mt5-base", max_length=256)
```
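Pipelines also accept a list of strings, which is convenient when several phrases or sentences need to be resolved at once. The following is a minimal sketch under stated assumptions: `load_resolver` and `resolve_all` are hypothetical helper names, not part of the model or of `transformers`. The helper accepts either a dict or a one-element list of dicts per input, since the exact return shape may vary.

```python
from typing import Callable, List


def load_resolver(max_length: int = 256):
    """Load the pipeline lazily, so transformers is only imported when needed."""
    from transformers import pipeline

    return pipeline(
        model="phlobo/german-ellipses-resolver-mt5-base",
        max_length=max_length,
    )


def resolve_all(pipe: Callable, texts: List[str]) -> List[str]:
    """Hypothetical helper: run `pipe` on a list of texts, return plain strings."""
    resolved = []
    for out in pipe(texts):
        # Assumption: each item is a dict or a one-element list of dicts.
        if isinstance(out, list):
            out = out[0]
        resolved.append(out["generated_text"])
    return resolved
```

A call could then look like `resolve_all(load_resolver(), ["Chemo- und Strahlentherapie", "Vitamin C, E und A"])`, which returns one resolved string per input.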
## Paper
Our approach and its evaluation have been published at the ACL BioNLP'23 workshop.
Please cite the following paper if you find our model useful:
```bibtex
@inproceedings{kammer-etal-2023-resolving,
    title = "Resolving Elliptical Compounds in {G}erman Medical Text",
    author = "Kammer, Niklas and
      Borchert, Florian and
      Winkler, Silvia and
      de Melo, Gerard and
      Schapranow, Matthieu-P.",
    editor = "Demner-fushman, Dina and
      Ananiadou, Sophia and
      Cohen, Kevin",
    booktitle = "The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.bionlp-1.26",
    doi = "10.18653/v1/2023.bionlp-1.26",
    pages = "292--305"
}
```