Middle High German Contextual String Embeddings (Forward)

This model provides contextual string embeddings for Middle High German (MHG), trained on literary and historical texts as part of digital humanities research on medieval Germanic languages.

It is optimized for Middle High German corpora and serves as a robust general-purpose embedding for downstream NLP tasks involving medieval German.

Data provenance and acknowledgment

The model was trained on open data prepared by the Mittelhochdeutsche Begriffsdatenbank (MHDBDB), Universität Salzburg. Coordination: Katharina Zeppezauer-Wachauer. Since 1992. URL: http://www.mhdbdb.plus.ac.at/. DOI: 10.60646/MHDBDB.

Model Description

  • Architecture: Character-level LSTM (Flair Language Model)
  • Direction: Forward
  • Data Source: TEI-encoded texts from the Middle High German Conceptual Database (MHDBDB).
  • Training Epochs: 30
  • Final Perplexity: 7.34
  • Final Validation Loss: 1.9934
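The two final metrics above are consistent with each other: for a language model trained with cross-entropy loss, perplexity is simply the exponential of the average loss. This can be checked directly:

```python
import math

# Perplexity of a language model is exp(average cross-entropy loss).
validation_loss = 1.9934  # final validation loss reported above
perplexity = math.exp(validation_loss)

print(round(perplexity, 2))  # 7.34, matching the reported final perplexity
```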

Usage

To use this model in Flair, install the library (pip install flair) and load the model directly from the Hub. For best results on downstream tasks such as NER or part-of-speech tagging, combine this forward model with the corresponding backward model (mschonhardt/mdh-mhdbdb-backward).

from flair.embeddings import FlairEmbeddings, StackedEmbeddings

# Load the forward model
forward_embeddings = FlairEmbeddings('mschonhardt/mdh-mhdbdb-forward')

# Load the backward model
backward_embeddings = FlairEmbeddings('mschonhardt/mdh-mhdbdb-backward')

# Stack them for best performance
stacked_embeddings = StackedEmbeddings([forward_embeddings, backward_embeddings])

# Example usage: embed a sentence with the stacked embeddings
from flair.data import Sentence
sentence = Sentence("von abegescheidenheit ich hân der geschrift vil gelesen")
stacked_embeddings.embed(sentence)

# Each token now carries a contextual embedding vector
for token in sentence:
    print(token.text, token.embedding.shape)

Citation

If you use this model, please cite the original research paper as well as the model source.

@software{schonhardt_michael_2026_mhg_flair,
  author    = {Schonhardt, Michael},
  title     = {Middle High German Contextual String Embeddings (Forward): Trained on the MHDBDB TEI-Texte Corpus},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18657493},
  url       = {https://doi.org/10.5281/zenodo.18657493}
}
@inproceedings{akbik-etal-2018-contextual,
  title     = {Contextual String Embeddings for Sequence Labeling},
  author    = {Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
  year      = {2018},
  url       = {https://aclanthology.org/C18-1139/},
  pages     = {1638--1649}
}