---
tags:
- spacy
- token-classification
language:
- de
model-index:
- name: de_STTS2_folk_normal_orth
  results:
  - task:
      name: TAG
      type: token-classification
    metrics:
    - name: TAG (XPOS) Accuracy
      type: accuracy
      value: 0.9379513783
---
## de_STTS2_folk_normal_orth tagger
This is a spaCy pipeline for part-of-speech tagging with the Stuttgart-Tübingen Tagset (STTS) version 2.0, which was designed for tagging transcripts of conversational German speech.
The model may be useful for tagging ASR transcripts such as those collected in the [CoGS](https://cc.oulu.fi/~scoats/CoGS.html) corpus.
The model was trained on the tag annotations of the FOLK corpus (https://agd.ids-mannheim.de/folk-gold.shtml) with an 80/20 training/test split. This version of the tagger was trained on data that follows standard German orthography with respect to capitalization.
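The card does not say how the 80/20 split was drawn. As a minimal sketch, assuming a random split at the utterance level with a fixed seed (the function name `split_80_20` and the toy data are hypothetical and not taken from the FOLK tooling), such a partition could look like this:

```python
import random

def split_80_20(utterances, seed=0):
    """Shuffle a copy of the utterances and return (train, test) at an 80/20 ratio."""
    shuffled = list(utterances)
    # A seeded generator keeps the split reproducible across runs
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

# Toy data (assumption): each utterance is a list of (token, tag) pairs
utterances = [[(f"tok{i}", "NN")] for i in range(10)]
train, test = split_80_20(utterances)
print(len(train), len(test))
```

Splitting by whole utterances rather than by individual tokens keeps each utterance's context intact in one partition.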
Usage example:
```python
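# Notebook syntax (Jupyter/Colab); in a terminal, run the same command without the leading "!"
!pip install https://huggingface.co/stcoats/de_STTS2_folk_normal_orth/resolve/main/de_STTS2_folk_normal_orth-any-py3-none-any.whl
import spacy
import de_STTS2_folk_normal_orth

# Load the installed pipeline package
nlp = de_STTS2_folk_normal_orth.load()
doc = nlp("ach so meinst du wir sollen es jetzt tun")
# Print each token together with its STTS 2.0 tag
for token in doc:
    print(token.text, token.tag_)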
```
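For more than a handful of transcripts, spaCy's `nlp.pipe` processes texts as a stream. The following is a sketch only: the helper name `tag_frequencies` is hypothetical, and it expects an already loaded pipeline to be passed in as `nlp`.

```python
from collections import Counter

def tag_frequencies(nlp, texts):
    """Count how often each fine-grained tag occurs across the given texts."""
    counts = Counter()
    # nlp.pipe yields one Doc per input text, in order
    for doc in nlp.pipe(texts):
        counts.update(token.tag_ for token in doc)
    return counts
```

With `nlp` loaded as in the usage example, `tag_frequencies(nlp, transcripts).most_common(5)` would list the five most frequent tags in a list of transcript strings.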
### References
Coats, Steven. (2023). A new corpus of geolocated ASR transcripts from Germany. Language Resources and Evaluation. https://doi.org/10.1007/s10579-023-09686-9
Westpfahl, Swantje and Thomas Schmidt. (2016). [FOLK-Gold – A GOLD standard for Part-of-Speech-Tagging of Spoken German](https://aclanthology.org/L16-1237). In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Portorož, Slovenia.
---
| Feature | Description |
| --- | --- |
| **Name** | `de_STTS2_folk_normal_orth` |
| **Version** | `0.0.1` |
| **spaCy** | `>=3.5.1,<3.6.0` |
| **Default Pipeline** | `tok2vec`, `tagger` |
| **Components** | `tok2vec`, `tagger` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | n/a |
| **License** | n/a |
| **Author** | n/a |
### Label Scheme
The label scheme comprises 62 labels for 1 component:
| Component | Labels |
| --- | --- |
| **`tagger`** | `$.`, `AB`, `ADJA`, `ADJD`, `ADV`, `APPO`, `APPR`, `APPRART`, `APZR`, `ART`, `CARD`, `FM`, `KOKOM`, `KON`, `KOUI`, `KOUS`, `NE`, `NGAKW`, `NGHES`, `NGIRR`, `NGONO`, `NN`, `ORD`, `PDAT`, `PDS`, `PIAT`, `PIDAT`, `PIDS`, `PIS`, `PPER`, `PPOSAT`, `PPOSS`, `PRELAT`, `PRELS`, `PRF`, `PTKA`, `PTKIFG`, `PTKMA`, `PTKMWL`, `PTKNEG`, `PTKVZ`, `PTKZU`, `PWAT`, `PWAV`, `PWS`, `SEDM`, `SEQU`, `SPELL`, `TRUNC`, `UI`, `VAFIN`, `VAIMP`, `VAINF`, `VAPP`, `VMFIN`, `VMINF`, `VVFIN`, `VVIMP`, `VVINF`, `VVIZU`, `VVPP`, `XY` |
### Accuracy
| Type | Score |
| --- | --- |
| `TAG_ACC` | 93.80 |
| `TOK2VEC_LOSS` | 204127.79 |
| `TAGGER_LOSS` | 119369.65 |
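
`TAG_ACC` is the percentage of tokens whose predicted tag matches the gold tag. As an illustration only (the helper `tag_accuracy` and the toy tag sequences are hypothetical, not the evaluation code used for this model), the computation can be sketched as:

```python
def tag_accuracy(gold_tags, predicted_tags):
    """Return the percentage of positions where the predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold_tags:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return 100.0 * correct / len(gold_tags)

# Toy example (assumption): three of four tags agree
gold = ["PTKIFG", "ADV", "VVFIN", "PPER"]
pred = ["PTKIFG", "ADV", "VVFIN", "PDS"]
print(tag_accuracy(gold, pred))  # 75.0
```

The value 0.9379513783 in the model metadata is the same score as the 93.80 in the table, expressed as a fraction and before rounding to two decimals.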