Update README.md
Extend model card.
README.md
CHANGED
@@ -7,11 +7,31 @@ language: de
 datasets:
 - conll2003
 - germeval_14
-- europeananewspapers2016
 license: apache-2.0
 ---
+# About `sbb_ner`
+
 This is a BERT model for named entity recognition in historical German.
+It can predict the classes `PER`, `LOC` and `ORG`.
+
+The model is based on the 🤗 [Transformers](https://github.com/huggingface/transformers)
+`BERT Base multi-lingual cased` model.
+We applied unsupervised pre-training on 2,333,647 pages of
+unlabeled historical German text from the Berlin State Library
+digital collections, and supervised pre-training on two datasets
+with contemporary German text, [conll2003](https://huggingface.co/models?dataset=dataset:conll2003)
+and [germeval_14](https://huggingface.co/models?dataset=dataset:germeval_14).
+
+# Results
+
+In a 5-fold cross validation with different historical German NER corpora,
+the model obtained an F1-Score of **84.3**±1.1%.
 
-
+For details, see our [paper](https://corpora.linguistik.uni-erlangen.de/data/konvens/proceedings/papers/KONVENS2019_paper_4.pdf)
+or have a look at [sbb_ner](https://github.com/qurator-spk/sbb_ner) on GitHub.
 
-
+# Weights
+We provide model weights for PyTorch.
+| Model                   | Downloads
+| ------------------------| ------------------------
+| `bert-sbb-de-finetuned` | [`config.json`](https://huggingface.co/SBB/sbb_ner/blob/main/config.json) • [`pytorch_model_ep7.bin`](https://huggingface.co/SBB/sbb_ner/blob/main/pytorch_model_ep7.bin) • [`vocab.txt`](https://huggingface.co/SBB/sbb_ner/blob/main/vocab.txt)
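
The download links in the Weights table use the Hub's `/blob/` path, which opens the file viewer page rather than serving the raw file. Raw downloads on the Hub go through `/resolve/` instead. The following is a minimal sketch of that rewrite for the three files listed above; the helper name `blob_to_resolve` and the constants are hypothetical and not part of the model card or of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_resolve(url: str) -> str:
    """Turn a Hub file-viewer URL (/blob/) into a raw-download URL (/resolve/).

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Expected path shape: /<owner>/<repo>/blob/<revision>/<file path...>
    if len(segments) < 6 or segments[3] != "blob":
        raise ValueError(f"not a Hub blob URL: {url}")
    segments[3] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))


# File names exactly as listed in the Weights table.
FILES = ["config.json", "pytorch_model_ep7.bin", "vocab.txt"]
BLOB_URLS = [f"https://huggingface.co/SBB/sbb_ner/blob/main/{name}" for name in FILES]
RESOLVE_URLS = [blob_to_resolve(u) for u in BLOB_URLS]

if __name__ == "__main__":
    for u in RESOLVE_URLS:
        print(u)
```

The weights file is named `pytorch_model_ep7.bin` rather than the default `pytorch_model.bin`, so tooling that expects the default name may need the file renamed or passed explicitly after download.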