---
tags:
- spacy
- token-classification
language:
- de
model-index:
- name: de_STTS2_folk_normal_orth
  results:
  - task:
      name: TAG
      type: token-classification
    metrics:
    - name: TAG (XPOS) Accuracy
      type: accuracy
      value: 0.9379513783
---
## de_STTS2_folk_normal_orth tagger

This is a spaCy model trained to assign part-of-speech tags from the Stuttgart-Tübingen Tagset (STTS) version 2.0, which was designed for tagging transcripts of conversational German speech.
The model may be useful for tagging ASR transcripts such as those collected in the [CoGS](https://cc.oulu.fi/~scoats/CoGS.html) corpus.

The model was trained on the tag annotations of the FOLK corpus (https://agd.ids-mannheim.de/folk-gold.shtml), using an 80/20 training/test split. This version of the tagger was trained on data in standard German orthography with regard to capitalization (upper and lower case).
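The card does not say how the 80/20 split was drawn (for example, by utterance or by transcript, shuffled or in corpus order). As a minimal sketch, assuming a seeded random shuffle over a list of annotated items, an 80/20 split could look like this; `split_train_test` and its parameters are hypothetical and not part of this model or of spaCy:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    items: Sequence[T], train_frac: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` and cut it into training and test portions.

    With the default train_frac=0.8 this yields an 80/20 split; the seed makes
    the shuffle reproducible. (Hypothetical helper, not part of spaCy.)
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    shuffled = list(items)  # copy so the caller's sequence is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


# Placeholder items standing in for annotated utterances.
train, test = split_train_test(list(range(10)))
print(len(train), len(test))  # 8 2
```

Splitting whole transcripts rather than individual utterances would keep speakers and conversations from appearing on both sides; whether the original training did so is not stated.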

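Usage example:
```python
# Install the packaged pipeline first. The leading "!" runs the command in a
# Jupyter/Colab notebook; in a terminal, run it without the "!":
# !pip install https://huggingface.co/stcoats/de_STTS2_folk_normal_orth/resolve/main/de_STTS2_folk_normal_orth-any-py3-none-any.whl
import spacy
import de_STTS2_folk_normal_orth

nlp = de_STTS2_folk_normal_orth.load()  # equivalently: spacy.load("de_STTS2_folk_normal_orth")
doc = nlp("ach so meinst du wir sollen es jetzt tun")
for token in doc:
    print(token.text, token.tag_)  # token text and its fine-grained STTS 2.0 tag
```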
### References

Coats, Steven. (2023). A new corpus of geolocated ASR transcripts from Germany. <i>Language Resources and Evaluation</i>. https://doi.org/10.1007/s10579-023-09686-9

Westpfahl, Swantje and Thomas Schmidt. (2016). [FOLK-Gold – A GOLD standard for Part-of-Speech-Tagging of Spoken German](https://aclanthology.org/L16-1237). In: <i>Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Portorož, Slovenia</i>.

---

| Feature | Description |
| --- | --- |
| Name | `de_STTS2_folk_normal_orth` |
| Version | `0.0.1` |
| spaCy | `>=3.5.1,<3.6.0` |
| Default Pipeline | `tok2vec`, `tagger` |
| Components | `tok2vec`, `tagger` |
| Vectors | 0 keys, 0 unique vectors (0 dimensions) |
| Sources | n/a |
| License | n/a |
| Author | n/a |
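Because the package pins spaCy to `>=3.5.1,<3.6.0`, it can help to check the installed version before loading the model. The following is a rough, standard-library-only sketch; `parse_version` and `is_compatible` are hypothetical helpers that ignore pre-release and dev suffixes such as `rc1`, so for exact PEP 440 semantics a tool such as `packaging.specifiers.SpecifierSet` is the better choice:

```python
import re
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(version: str) -> Version:
    """Parse the leading 'X.Y' or 'X.Y.Z' part of a version string.

    Anything after the numeric part (e.g. 'rc1', '.dev0') is ignored,
    so this is only a rough check.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def is_compatible(
    version: str, lower: Version = (3, 5, 1), upper: Version = (3, 6, 0)
) -> bool:
    """True if lower <= version < upper, mirroring the pin '>=3.5.1,<3.6.0'."""
    return lower <= parse_version(version) < upper


for v in ("3.5.0", "3.5.1", "3.5.4", "3.6.0", "3.7.2"):
    print(v, is_compatible(v))
```

With spaCy installed, `is_compatible(spacy.__version__)` gives a quick yes/no before calling `load()`.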

### Label Scheme

<details>

<summary>View label scheme (62 labels for 1 component)</summary>

| Component | Labels |
| --- | --- |
| `tagger` | `$.`, `AB`, `ADJA`, `ADJD`, `ADV`, `APPO`, `APPR`, `APPRART`, `APZR`, `ART`, `CARD`, `FM`, `KOKOM`, `KON`, `KOUI`, `KOUS`, `NE`, `NGAKW`, `NGHES`, `NGIRR`, `NGONO`, `NN`, `ORD`, `PDAT`, `PDS`, `PIAT`, `PIDAT`, `PIDS`, `PIS`, `PPER`, `PPOSAT`, `PPOSS`, `PRELAT`, `PRELS`, `PRF`, `PTKA`, `PTKIFG`, `PTKMA`, `PTKMWL`, `PTKNEG`, `PTKVZ`, `PTKZU`, `PWAT`, `PWAV`, `PWS`, `SEDM`, `SEQU`, `SPELL`, `TRUNC`, `UI`, `VAFIN`, `VAIMP`, `VAINF`, `VAPP`, `VMFIN`, `VMINF`, `VVFIN`, `VVIMP`, `VVINF`, `VVIZU`, `VVPP`, `XY` |

</details>
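The tags share prefixes that make coarse grouping straightforward: in this label set every tag beginning with `V` is a verb tag (`VAFIN`, `VMFIN`, `VVINF`, etc.), and every tag beginning with `PTK` is a particle tag. The sketch below counts such groups over `(text, tag)` pairs like those printed in the usage example. `PREFIX_GROUPS`, `coarse_group`, and `count_groups` are hypothetical names, the grouping is an illustrative choice you would adapt, and the example pairs are hand-written rather than model output:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical coarse groups keyed by tag prefix; the first matching prefix wins.
PREFIX_GROUPS = (
    ("PTK", "particle"),
    ("V", "verb"),
    ("NG", "NG*"),
)


def coarse_group(tag: str) -> str:
    """Map a fine-grained STTS 2.0 tag to a coarse group, or 'other'."""
    for prefix, group in PREFIX_GROUPS:
        if tag.startswith(prefix):
            return group
    return "other"


def count_groups(tagged: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count coarse groups over (text, tag) pairs."""
    return dict(Counter(coarse_group(tag) for _text, tag in tagged))


# Hand-written pairs for illustration, not actual model output.
pairs = [("wir", "PPER"), ("sollen", "VMFIN"), ("es", "PPER"), ("nicht", "PTKNEG"), ("tun", "VVINF")]
print(count_groups(pairs))  # {'other': 2, 'verb': 2, 'particle': 1}
```

With the model loaded as in the usage example, `count_groups((t.text, t.tag_) for t in doc)` applies the same count to real tagger output.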

### Accuracy

| Type | Score |
| --- | --- |
| `TAG_ACC` | 93.80 |
| `TOK2VEC_LOSS` | 204127.79 |
| `TAGGER_LOSS` | 119369.65 |
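`TAG_ACC` is token-level tagging accuracy: the share of tokens whose predicted tag matches the gold tag, given here as a percentage (0.9379513783 in the metadata header, 93.80 in the table). A minimal sketch, assuming gold and predicted tags are already aligned one-to-one (spaCy's own scorer additionally aligns tokens when tokenizations differ, which this omits); `tag_accuracy` is a hypothetical helper and the tag sequences are hand-written:

```python
from typing import Sequence


def tag_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("need at least one token")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hand-written tag sequences: 4 of 5 tokens match.
gold = ["PPER", "VMFIN", "PPER", "ADV", "VVINF"]
pred = ["PPER", "VMFIN", "PPER", "ADV", "VVFIN"]
print(f"{tag_accuracy(gold, pred) * 100:.2f}")  # 80.00

# The metadata value, formatted the same way, gives the table entry.
print(f"{0.9379513783 * 100:.2f}")  # 93.80
```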