---
tags:
- spacy
- token-classification
language:
- de
model-index:
- name: de_STTS2_folk_normal_orth
  results:
  - task:
      name: TAG
      type: token-classification
    metrics:
    - name: TAG (XPOS) Accuracy
      type: accuracy
      value: 0.9379513783
---

## de_STTS2_folk_normal_orth tagger

This is a spaCy language model trained to assign part-of-speech tags from the Stuttgart-Tübingen Tagset version 2.0 (STTS 2.0), which was designed for tagging transcripts of conversational German speech. The model may be useful for tagging automatic speech recognition (ASR) transcripts such as those collected in the [CoGS](https://cc.oulu.fi/~scoats/CoGS.html) corpus.

The model was trained on the gold-standard tag annotations of the FOLK corpus (FOLK-Gold, https://agd.ids-mannheim.de/folk-gold.shtml), using an 80/20 training/test split. This version of the tagger was trained on data that follows standard German orthography with regard to capitalization, i.e. the use of upper- and lowercase letters.

Usage example:

```python
# Install the model package first, e.g. from a shell
# (in a Jupyter or Colab notebook, prefix the command with "!"):
# pip install https://huggingface.co/stcoats/de_STTS2_folk_normal_orth/resolve/main/de_STTS2_folk_normal_orth-any-py3-none-any.whl

import spacy  # the package requires spaCy >=3.5.1,<3.6.0
import de_STTS2_folk_normal_orth

# Load the pipeline (tok2vec + tagger) from the installed package
nlp = de_STTS2_folk_normal_orth.load()

doc = nlp("ach so meinst du wir sollen es jetzt tun")
for token in doc:
    # token.tag_ holds the fine-grained STTS 2.0 tag
    print(token.text, token.tag_)
```

### References

Coats, Steven. (2023). A new corpus of geolocated ASR transcripts from Germany. Language Resources and Evaluation. https://doi.org/10.1007/s10579-023-09686-9

Westpfahl, Swantje, and Thomas Schmidt. (2016). [FOLK-Gold – A GOLD standard for Part-of-Speech-Tagging of Spoken German](https://aclanthology.org/L16-1237).
In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Portorož, Slovenia.

---

| Feature | Description |
| --- | --- |
| **Name** | `de_STTS2_folk_normal_orth` |
| **Version** | `0.0.1` |
| **spaCy** | `>=3.5.1,<3.6.0` |
| **Default Pipeline** | `tok2vec`, `tagger` |
| **Components** | `tok2vec`, `tagger` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | n/a |
| **License** | n/a |
| **Author** | n/a |

### Label Scheme

The label scheme contains 62 labels for 1 component (`tagger`):

| Component | Labels |
| --- | --- |
| **`tagger`** | `$.`, `AB`, `ADJA`, `ADJD`, `ADV`, `APPO`, `APPR`, `APPRART`, `APZR`, `ART`, `CARD`, `FM`, `KOKOM`, `KON`, `KOUI`, `KOUS`, `NE`, `NGAKW`, `NGHES`, `NGIRR`, `NGONO`, `NN`, `ORD`, `PDAT`, `PDS`, `PIAT`, `PIDAT`, `PIDS`, `PIS`, `PPER`, `PPOSAT`, `PPOSS`, `PRELAT`, `PRELS`, `PRF`, `PTKA`, `PTKIFG`, `PTKMA`, `PTKMWL`, `PTKNEG`, `PTKVZ`, `PTKZU`, `PWAT`, `PWAV`, `PWS`, `SEDM`, `SEQU`, `SPELL`, `TRUNC`, `UI`, `VAFIN`, `VAIMP`, `VAINF`, `VAPP`, `VMFIN`, `VMINF`, `VVFIN`, `VVIMP`, `VVINF`, `VVIZU`, `VVPP`, `XY` |
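STTS tag names are built compositionally, so tags of one word class share a prefix: for example, `VV`, `VA`, and `VM` begin the full-verb, auxiliary, and modal-verb tags, and `PTK` begins the particle tags. The sketch below hard-codes the 62 labels listed above and uses a hypothetical helper, `labels_with_prefix`, to select such a family of tags. It is an illustration only and does not load the model.

```python
from typing import List, Sequence

# The 62 tagger labels, copied from the label-scheme table of this model card.
TAGGER_LABELS = (
    "$.", "AB", "ADJA", "ADJD", "ADV", "APPO", "APPR", "APPRART", "APZR", "ART",
    "CARD", "FM", "KOKOM", "KON", "KOUI", "KOUS", "NE", "NGAKW", "NGHES", "NGIRR",
    "NGONO", "NN", "ORD", "PDAT", "PDS", "PIAT", "PIDAT", "PIDS", "PIS", "PPER",
    "PPOSAT", "PPOSS", "PRELAT", "PRELS", "PRF", "PTKA", "PTKIFG", "PTKMA", "PTKMWL", "PTKNEG",
    "PTKVZ", "PTKZU", "PWAT", "PWAV", "PWS", "SEDM", "SEQU", "SPELL", "TRUNC", "UI",
    "VAFIN", "VAIMP", "VAINF", "VAPP", "VMFIN", "VMINF", "VVFIN", "VVIMP", "VVINF", "VVIZU",
    "VVPP", "XY",
)


def labels_with_prefix(prefix: str, labels: Sequence[str] = TAGGER_LABELS) -> List[str]:
    """Hypothetical helper: return the labels starting with `prefix`, in their original order."""
    return [label for label in labels if label.startswith(prefix)]


print(len(TAGGER_LABELS))         # 62
print(labels_with_prefix("VM"))   # ['VMFIN', 'VMINF']
print(labels_with_prefix("PTK"))  # ['PTKA', 'PTKIFG', 'PTKMA', 'PTKMWL', 'PTKNEG', 'PTKVZ', 'PTKZU']
```

With the model installed, `set(nlp.get_pipe("tagger").labels)` should contain the same labels, and the same prefix test can be applied to `token.tag_` values, for instance to keep only tokens tagged as modal verbs (`VM`).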

### Accuracy

| Type | Score |
| --- | --- |
| `TAG_ACC` | 93.80 |
| `TOK2VEC_LOSS` | 204127.79 |
| `TAGGER_LOSS` | 119369.65 |
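`TAG_ACC` is token-level tagging accuracy, i.e. the share of evaluated tokens whose predicted tag equals the gold tag, reported as a percentage; the model-index metadata gives the same value as a fraction (0.9379513783, which rounds to 93.80%). The two loss values are training losses rather than evaluation scores. The following is a minimal sketch of the accuracy computation, assuming gold and predicted tags come from the same tokenization; the function `tag_accuracy` and the example tags are hypothetical and are not output of this model. spaCy's own scorer additionally aligns differing tokenizations and skips tokens without a gold tag.

```python
from typing import Sequence


def tag_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Hypothetical helper: fraction of tokens whose predicted tag equals the gold tag.

    Assumes both sequences were produced from the same tokenization.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("no tokens to score")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical tags for "wir sollen es tun" (illustrative, not model output):
gold = ["PPER", "VMFIN", "PPER", "VVINF"]
pred = ["PPER", "VMFIN", "PPER", "VVFIN"]  # one error on the last token

print(tag_accuracy(gold, pred))                  # 0.75
print(round(100 * tag_accuracy(gold, pred), 2))  # 75.0, the percentage form used in the table
```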