Middle High German Contextual String Embeddings (Backward)
This model provides contextual string embeddings for Middle High German (MHG), trained on literary and historical texts as part of the digital humanities research surrounding medieval Germanic languages.
It is trained specifically on Middle High German corpora and is intended as a general-purpose embedding for downstream NLP tasks involving medieval German. Backward models are designed to be used in combination with their forward counterparts.
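To illustrate why the two directions complement each other, here is a minimal conceptual sketch in plain Python. It does not use Flair and is not the library's implementation; the helper name `visible_contexts` is hypothetical. It assumes the setup described in Akbik et al. (2018), where the forward language model has read the text up to the end of a token and the backward language model has read the text in reverse, from the end of the sentence back to the start of the token.

```python
def visible_contexts(text: str, token: str) -> tuple[str, str]:
    """Return the character context each direction has read for `token`.

    Hypothetical helper for illustration only; it looks at the first
    occurrence of `token` in `text`.
    """
    start = text.index(token)
    end = start + len(token)
    # Forward model: everything from the sentence start through the token.
    forward_context = text[:end]
    # Backward model: everything from the sentence end back to the token
    # start, read right to left.
    backward_context = text[start:][::-1]
    return forward_context, backward_context


forward_ctx, backward_ctx = visible_contexts("der geschrift vil gelesen", "vil")
print(forward_ctx)   # der geschrift vil
print(backward_ctx)  # neseleg liv
```

The backward model alone never sees the words to the left of a token, which is why it is normally paired with the forward model.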
Data provenance and acknowledgment
The model was trained on open data prepared by the Mittelhochdeutsche Begriffsdatenbank (MHDBDB), Universität Salzburg (coordination: Katharina Zeppezauer-Wachauer; since 1992). URL: http://www.mhdbdb.plus.ac.at/. DOI: 10.60646/MHDBDB.
Model Description
- Architecture: Character-level LSTM (Flair Language Model)
- Direction: Backward
- Data Source: TEI-encoded texts from the Middle High German Conceptual Database (MHDBDB).
- Training Epochs: 6 (training was stopped early because improvements had stagnated)
- Final Perplexity: 7.03
- Final Validation Loss: 1.9503
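The two final figures above appear to be related: the perplexity matches the exponential of the validation loss. The short check below assumes the loss is a mean per-character cross-entropy measured in nats; that is an assumption, not something the card states. The function name `perplexity_from_loss` is a hypothetical helper.

```python
import math


def perplexity_from_loss(loss_nats: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss_nats)


final_validation_loss = 1.9503  # value reported in the list above
print(round(perplexity_from_loss(final_validation_loss), 2))  # 7.03
```

Under that assumption, a perplexity of about 7 means the model is, on average, roughly as uncertain as if it were choosing uniformly among seven characters at each step.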
Usage
To use this model in Flair, install the library (pip install flair) and load the model directly from the Hub. For the best performance in sequence labeling (NER, POS), it is recommended to stack this backward model with the corresponding forward model.
from flair.embeddings import FlairEmbeddings, StackedEmbeddings
# Load the backward model
backward_embeddings = FlairEmbeddings('mschonhardt/mhg-mhdbdb-backward')
# Load the forward model for a bidirectional setup
forward_embeddings = FlairEmbeddings('mschonhardt/mhg-mhdbdb-forward')
# Stack them for best performance
stacked_embeddings = StackedEmbeddings([forward_embeddings, backward_embeddings])
# Example usage
from flair.data import Sentence
sentence = Sentence("von abegescheidenheit ich hân der geschrift vil gelesen")
stacked_embeddings.embed(sentence)
Citation
If you use this model, please cite the original research paper as well as the model source.
@software{schonhardt_michael_2026_mhg_flair_back,
author = "Schonhardt, Michael",
title = "Middle High German Contextual String Embeddings (Backward): Trained on the MHDBDB TEI-Texte Corpus",
year = 2026,
publisher = "Zenodo",
doi = "10.5281/zenodo.18659437",
url = "https://doi.org/10.5281/zenodo.18659437"
}
@inproceedings{akbik-etal-2018-contextual,
title = "Contextual String Embeddings for Sequence Labeling",
author = "Akbik, Alan and
Blythe, Duncan and
Vollgraf, Roland",
booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
year = "2018",
url = "https://aclanthology.org/C18-1139/",
pages = "1638--1649"
}