How to use hmteams/teams-base-historic-multilingual-generator with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="hmteams/teams-base-historic-multilingual-generator")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("hmteams/teams-base-historic-multilingual-generator")
model = AutoModelForMaskedLM.from_pretrained("hmteams/teams-base-historic-multilingual-generator")
```
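Once a fill-mask pipeline is loaded, it can be queried roughly as in the sketch below. The helper names `load_fill_mask_pipeline` and `top_predictions`, the `{mask}` placeholder convention, and the example sentence are illustrative assumptions and not part of the model card; only the model ID and the `fill-mask` task come from the instructions above.

```python
from typing import Any, Callable, Dict, List, Tuple

MODEL_ID = "hmteams/teams-base-historic-multilingual-generator"


def load_fill_mask_pipeline():
    """Build the fill-mask pipeline (downloads the model on first use)."""
    # Imported lazily so the helper below can be used without transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=MODEL_ID)


def top_predictions(
    pipe: Callable[..., List[Dict[str, Any]]],
    template: str,
    mask_token: str,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Fill the single '{mask}' slot in `template` and return (token, score) pairs.

    '{mask}' is a hypothetical placeholder used only in this sketch; it is
    swapped for the tokenizer's real mask token before calling the pipeline.
    """
    text = template.replace("{mask}", mask_token)
    results = pipe(text, top_k=top_k)
    return [(r["token_str"], r["score"]) for r in results]


# Example call (requires transformers and network access):
#   pipe = load_fill_mask_pipeline()
#   top_predictions(pipe, "The {mask} was printed in London.",
#                   pipe.tokenizer.mask_token)
```

Reading the mask token from `pipe.tokenizer.mask_token` avoids hard-coding a token string that may differ between tokenizers.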
hmTEAMS
Historic Multilingual and Monolingual TEAMS Models. The following languages are covered:
- English (British Library Corpus - Books)
- German (Europeana Newspaper)
- French (Europeana Newspaper)
- Finnish (Europeana Newspaper, Digilib)
- Swedish (Europeana Newspaper, Digilib)
- Dutch (Delpher Corpus)
- Norwegian (NCC Corpus)
Architecture
We pretrain a "Training ELECTRA Augmented with Multi-word Selection" (TEAMS) model.
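As a rough illustration of the two discriminator objectives the TEAMS name refers to (replaced token detection, inherited from ELECTRA, and multi-word selection, where the original token is picked from a candidate set), the toy sketch below builds the training targets by hand. The token lists, candidate sets, and function names are hypothetical, and this is not the pretraining code used for hmTEAMS.

```python
from typing import Dict, List


def replaced_token_labels(original: List[str], corrupted: List[str]) -> List[int]:
    """Replaced token detection targets: 1 if the token was replaced, else 0.

    A position where the generator happens to sample the original token
    counts as "original" (label 0), as in ELECTRA.
    """
    if len(original) != len(corrupted):
        raise ValueError("sequences must have the same length")
    return [int(o != c) for o, c in zip(original, corrupted)]


def multi_word_selection_targets(
    original: List[str], candidates: Dict[int, List[str]]
) -> Dict[int, int]:
    """Multi-word selection targets: for each masked position, the index of
    the original token inside that position's candidate set."""
    return {pos: cands.index(original[pos]) for pos, cands in candidates.items()}


# Toy example (assumed data): positions 2 and 6 were masked.
original = ["the", "old", "gazette", "was", "printed", "in", "london"]
# Hypothetical generator output: position 2 replaced, position 6 unchanged.
corrupted = ["the", "old", "paper", "was", "printed", "in", "london"]
# Hypothetical candidate sets; each contains the original token.
candidates = {
    2: ["paper", "gazette", "letter"],
    6: ["london", "paris", "york"],
}

rtd_targets = replaced_token_labels(original, corrupted)
mws_targets = multi_word_selection_targets(original, candidates)
print(rtd_targets)  # [0, 0, 1, 0, 0, 0, 0]
print(mws_targets)  # {2: 1, 6: 0}
```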
Results
We run experiments on several historic NER datasets, such as HIPE-2022 and ICDAR Europeana. All details, including hyperparameters, can be found here.
Small Benchmark
We test our pretrained language models on various datasets from HIPE-2020, HIPE-2022 and Europeana. The following table gives an overview of the datasets used.
| Language | Dataset | Additional Dataset |
|---|---|---|
| English | AjMC | - |
| German | AjMC | - |
| French | AjMC | ICDAR-Europeana |
| Finnish | NewsEye | - |
| Swedish | NewsEye | - |
| Dutch | ICDAR-Europeana | - |
Results
| Model | English AjMC | German AjMC | French AjMC | Finnish NewsEye | Swedish NewsEye | Dutch ICDAR | French ICDAR | Avg. |
|---|---|---|---|---|---|---|---|---|
| hmBERT (32k) Schweter et al. | 85.36 ± 0.94 | 89.08 ± 0.09 | 85.10 ± 0.60 | 77.28 ± 0.37 | 82.85 ± 0.83 | 82.11 ± 0.61 | 77.21 ± 0.16 | 82.71 |
| hmTEAMS (Ours) | 86.41 ± 0.36 | 88.64 ± 0.42 | 85.41 ± 0.67 | 79.27 ± 1.88 | 82.78 ± 0.60 | 88.21 ± 0.39 | 78.03 ± 0.39 | 84.11 |
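The Avg. column can be reproduced from the seven per-dataset scores, assuming it is the unweighted mean rounded to two decimals (an assumption that matches the reported values). The sketch below recomputes it from the mean scores in the table and ignores the standard deviations.

```python
from statistics import mean

# Per-dataset mean scores copied from the table above, in column order:
# English AjMC, German AjMC, French AjMC, Finnish NewsEye,
# Swedish NewsEye, Dutch ICDAR, French ICDAR.
scores = {
    "hmBERT (32k)": [85.36, 89.08, 85.10, 77.28, 82.85, 82.11, 77.21],
    "hmTEAMS": [86.41, 88.64, 85.41, 79.27, 82.78, 88.21, 78.03],
}

# Unweighted mean over the seven datasets, rounded to two decimals.
averages = {model: round(mean(vals), 2) for model, vals in scores.items()}

for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
# hmBERT (32k): 82.71
# hmTEAMS: 84.11
```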
Release
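Our pretrained hmTEAMS model can be obtained from the Hugging Face Model Hub. The generator described on this page is published as hmteams/teams-base-historic-multilingual-generator.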
Acknowledgements
We thank Luisa März, Katharina Schmid and Erion Çano for their fruitful discussions about Historic Language Models.
Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️
