Latin Contextual POS Tagger (Flair)

This model is a Part-of-Speech (POS) tagger for Latin, specifically optimized for medieval and early modern legal texts. It uses a Bi-LSTM-CRF architecture based on domain-specific contextual string embeddings.

The model was developed as part of the projects "Embedding the Past" (LOEWE-Exploration, TU Darmstadt) and "Burchards Dekret Digital" (Langzeitvorhaben, Akademie der Wissenschaften und der Literatur | Mainz).

Technical Details

  • Architecture: Bi-LSTM + CRF Sequence Tagger.
  • Hidden Size: 1024 (2 layers).
  • Base Embeddings: Stacked Latin Legal Forward and Backward contextual string embeddings.
  • Data Source: Corpus of ~1.59M training sentences from medieval texts.
  • Accuracy: 95.88% (identical to the micro F1-score, since every token receives exactly one tag).
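
The CRF on top of the Bi-LSTM scores whole tag sequences rather than tokens in isolation: each token contributes an emission score (from the Bi-LSTM) and each adjacent tag pair a transition score, and Viterbi decoding picks the best-scoring sequence. A minimal pure-Python sketch of that decoding step, using toy scores (not the model's actual parameters):

```python
def viterbi(emissions, transitions, tags):
    """Best tag sequence under a linear-chain CRF (emission + transition scores).

    emissions:   list of {tag: score} dicts, one per token (from the Bi-LSTM)
    transitions: {(prev_tag, tag): score} learned transition scores
    """
    # steps[i][t] = (score of best path ending in tag t at token i, backpointer)
    steps = [{t: (emissions[0][t], None) for t in tags}]
    for emit in emissions[1:]:
        prev = steps[-1]
        cur = {}
        for t in tags:
            p = max(tags, key=lambda p: prev[p][0] + transitions[(p, t)])
            cur[t] = (prev[p][0] + transitions[(p, t)] + emit[t], p)
        steps.append(cur)
    # Backtrack from the best final tag
    tag = max(tags, key=lambda t: steps[-1][t][0])
    score = steps[-1][tag][0]
    path = [tag]
    for step in reversed(steps[1:]):
        tag = step[tag][1]
        path.append(tag)
    return list(reversed(path)), score


tags = ["NOUN", "ADJ", "VERB"]
emissions = [
    {"NOUN": 2.0, "ADJ": 1.5, "VERB": 0.1},  # an ambiguous NOUN/ADJ form
    {"NOUN": 0.3, "ADJ": 0.2, "VERB": 2.5},
]
transitions = {(a, b): 0.0 for a in tags for b in tags}
transitions[("ADJ", "VERB")] = 1.0           # toy preference: ADJ before VERB
path, score = viterbi(emissions, transitions, tags)
print(path, score)  # ['ADJ', 'VERB'] 5.0
```

In the toy example the transition bonus for ADJ→VERB flips the first token from the locally best NOUN to ADJ, which is exactly the kind of contextual disambiguation the CRF layer contributes.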

Data Source and Acknowledgements

We gratefully acknowledge that the training data originates from the Latin Text Archive (LTA) (Prof. Dr. Bernhard Jussen, Dr. Tim Geelhaar), including data from the Monumenta Germaniae Historica, the Corpus Corporum, and the IRHT.

Performance Metrics

Results:

  • F-score (micro) 0.9588
  • F-score (macro) 0.9397
  • Accuracy 0.9588

By class:

              precision    recall  f1-score   support

    NOUN     0.9444    0.9480    0.9462   1036164
   PUNCT     0.9999    1.0000    1.0000    831460
    VERB     0.9657    0.9465    0.9560    810899
   CCONJ     0.9833    0.9920    0.9877    463354
    PRON     0.9657    0.9631    0.9644    405738
     ADP     0.9786    0.9886    0.9835    296947
     ADV     0.9300    0.9264    0.9282    285781
     ADJ     0.8347    0.8443    0.8395    273219
   PROPN     0.9428    0.9623    0.9525    128068
     NUM     0.9771    0.9913    0.9842     58389
     ORD     0.8362    0.9223    0.8771      8534
     ITJ     0.9088    0.8821    0.8953      4554
    PART     0.9509    0.9307    0.9407      3202
      FM     0.9226    0.8804    0.9010      2491

    accuracy                        0.9588   4608800
   macro avg    0.9386    0.9413    0.9397   4608800
weighted avg    0.9589    0.9588    0.9588   4608800
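
The macro and weighted averages follow directly from the per-class figures: the macro F1 averages all 14 classes uniformly, while the weighted F1 (which, for single-label tagging, coincides with accuracy and micro F1) is dominated by high-support classes such as NOUN and PUNCT. A quick check in plain Python, with the values copied from the per-class table:

```python
# (tag, f1, support) copied from the per-class table above
per_class = [
    ("NOUN", 0.9462, 1036164), ("PUNCT", 1.0000, 831460),
    ("VERB", 0.9560, 810899), ("CCONJ", 0.9877, 463354),
    ("PRON", 0.9644, 405738), ("ADP", 0.9835, 296947),
    ("ADV", 0.9282, 285781), ("ADJ", 0.8395, 273219),
    ("PROPN", 0.9525, 128068), ("NUM", 0.9842, 58389),
    ("ORD", 0.8771, 8534), ("ITJ", 0.8953, 4554),
    ("PART", 0.9407, 3202), ("FM", 0.9010, 2491),
]

total = sum(n for _, _, n in per_class)  # 4,608,800 tokens
macro_f1 = sum(f1 for _, f1, _ in per_class) / len(per_class)
weighted_f1 = sum(f1 * n for _, f1, n in per_class) / total

print(f"macro F1:    {macro_f1:.4f}")     # 0.9397, as reported
print(f"weighted F1: {weighted_f1:.4f}")  # 0.9588, matches accuracy/micro F1
```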

Confusion Matrix


Model Limitations

While the model achieves a high micro-F1 of 95.88%, users should be aware of the following:

  • Adjective/Noun Distinction: Most misclassifications occur between ADJ and NOUN due to the morphological overlap common in Latin.
  • Ordinal Numbers: The ORD tag (87.71% F1) is occasionally confused with standard adjectives.
  • Domain Specificity: The model is trained on legal and diplomatic corpora; performance may vary slightly on classical poetry or highly informal neo-Latin.

Usage

You can use this model directly with the Flair library.

from flair.models import SequenceTagger
from flair.data import Sentence

# Load the tagger from the Hugging Face Hub
tagger = SequenceTagger.load("mschonhardt/latin-pos-tagger")

# Annotate a sentence
sentence = Sentence("In nomine sanctae et individuae trinitatis .")
tagger.predict(sentence)

# Print each token with its predicted tag and confidence
for token in sentence:
    label = token.get_label("upos")
    print(f"{token.text}\t{label.value}\t{label.score:.4f}")

(On Flair versions older than 0.12, use token.get_tag("upos") instead of token.get_label.)

Training Parameters

  • Learning Rate: 0.1
  • Mini Batch Size: 512
  • Max Epochs: 15
  • Optimizer: AnnealOnPlateau
  • Hardware: trained on a single NVIDIA RTX PRO 6000 Blackwell GPU.
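
The AnnealOnPlateau scheduler reduces the learning rate whenever the monitored dev score stops improving for a number of epochs. A minimal sketch of the idea in plain Python (the anneal_factor=0.5 and patience=3 values are illustrative defaults, not confirmed settings for this model; Flair's actual scheduler also supports cooldown, a minimum learning rate, and loss-based monitoring):

```python
class AnnealOnPlateauSketch:
    """Halve the learning rate when the dev score plateaus.

    Minimal sketch of the scheduling idea only, not Flair's implementation.
    """

    def __init__(self, lr=0.1, anneal_factor=0.5, patience=3):
        self.lr = lr
        self.anneal_factor = anneal_factor
        self.patience = patience
        self.best_score = float("-inf")
        self.bad_epochs = 0

    def step(self, dev_score):
        if dev_score > self.best_score:
            # Improvement: remember it and reset the patience counter
            self.best_score = dev_score
            self.bad_epochs = 0
        else:
            # No improvement: anneal once patience is exhausted
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.anneal_factor
                self.bad_epochs = 0
        return self.lr


scheduler = AnnealOnPlateauSketch(lr=0.1)
# Dev score improves twice, then plateaus for four epochs: lr is halved once
for dev_score in [0.90, 0.93, 0.93, 0.93, 0.93, 0.93]:
    lr = scheduler.step(dev_score)
print(lr)  # 0.05
```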

Citation

If you use this model, please cite the specific model DOI and the Flair framework:

@software{schonhardt_michael_2026_latin_pos,
  author = "Schonhardt, Michael",
  title = "Latin POS Tagger (Flair)",
  year = "2026",
  publisher = "Zenodo",
  doi = "10.5281/zenodo.18631267",
  url = "https://huggingface.co/mschonhardt/latin-pos-tagger"
}
@inproceedings{akbik-etal-2018-contextual,
    title = "Contextual String Embeddings for Sequence Labeling",
    author = "Akbik, Alan and Blythe, Duncan and Vollgraf, Roland",
    booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
    year = "2018",
    pages = "1638--1649",
    publisher = "Association for Computational Linguistics"
}