---
language: la
library_name: flair
license: cc-by-sa-4.0
tags:
- flair
- token-classification
- sequence-tagger
- latin
- medieval-latin
- legal-history
- pos-tagging
widget:
- text: "In nomine sanctae et individuae trinitatis ."
---

# Latin Contextual POS Tagger (Flair)

This model is a Part-of-Speech (POS) tagger for Latin, specifically optimized for medieval and early modern legal texts. It uses a Bi-LSTM-CRF architecture based on domain-specific contextual string embeddings.

The model was developed as part of the projects **"Embedding the Past"** (LOEWE-Exploration, TU Darmstadt) and **"Burchards Dekret Digital"** (Langzeitvorhaben, Akademie der Wissenschaften und der Literatur | Mainz).

## Technical Details

- **Architecture:** Bi-LSTM + CRF Sequence Tagger.
- **Hidden Size:** 1024 (2 layers).
- **Base Embeddings:** Stacked [Latin Legal Forward](https://huggingface.co/mschonhardt/latin-legal-forward) and [Backward](https://huggingface.co/mschonhardt/latin-legal-backward) contextual string embeddings.
- **Data Source:** Corpus of ~1.59M training sentences from medieval texts. 
- **Accuracy:** 95.88% token-level accuracy. This equals micro F1 because every token receives exactly one tag.

## Data Source and Acknowledgements
We gratefully acknowledge that the training data originates from the **[Latin Text Archive (LTA)](http://lta.bbaw.de)** (**Prof. Dr. Bernhard Jussen**, **Dr. Tim Geelhaar**), which includes data from the Monumenta Germaniae Historica, the Corpus Corporum, and the IRHT.


## Performance Metrics

Results:
- F-score (micro): 0.9588
- F-score (macro): 0.9397
- Accuracy: 0.9588

By class:

| Class | Precision | Recall | F1 | Support |
|-------|----------:|-------:|---:|--------:|
| NOUN | 0.9444 | 0.9480 | 0.9462 | 1036164 |
| PUNCT | 0.9999 | 1.0000 | 1.0000 | 831460 |
| VERB | 0.9657 | 0.9465 | 0.9560 | 810899 |
| CCONJ | 0.9833 | 0.9920 | 0.9877 | 463354 |
| PRON | 0.9657 | 0.9631 | 0.9644 | 405738 |
| ADP | 0.9786 | 0.9886 | 0.9835 | 296947 |
| ADV | 0.9300 | 0.9264 | 0.9282 | 285781 |
| ADJ | 0.8347 | 0.8443 | 0.8395 | 273219 |
| PROPN | 0.9428 | 0.9623 | 0.9525 | 128068 |
| NUM | 0.9771 | 0.9913 | 0.9842 | 58389 |
| ORD | 0.8362 | 0.9223 | 0.8771 | 8534 |
| ITJ | 0.9088 | 0.8821 | 0.8953 | 4554 |
| PART | 0.9509 | 0.9307 | 0.9407 | 3202 |
| FM | 0.9226 | 0.8804 | 0.9010 | 2491 |
| **Accuracy** | | | 0.9588 | 4608800 |
| **Macro avg** | 0.9386 | 0.9413 | 0.9397 | 4608800 |
| **Weighted avg** | 0.9589 | 0.9588 | 0.9588 | 4608800 |
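The summary rows can be recomputed from the per-class report above. The macro average is the unweighted mean of the 14 class F1 scores. The weighted average weights each class by its support. Accuracy is the support-weighted mean of per-class recall, which is the number of correctly tagged tokens divided by all tokens. For single-label token tagging, this value is also the micro F1. The following is a minimal plain-Python sketch. It uses the rounded four-decimal values from the table, so the last digit may differ slightly from exact figures.

```python
# Per-class (precision, recall, f1, support), copied from the classification report above.
REPORT = {
    "NOUN": (0.9444, 0.9480, 0.9462, 1036164),
    "PUNCT": (0.9999, 1.0000, 1.0000, 831460),
    "VERB": (0.9657, 0.9465, 0.9560, 810899),
    "CCONJ": (0.9833, 0.9920, 0.9877, 463354),
    "PRON": (0.9657, 0.9631, 0.9644, 405738),
    "ADP": (0.9786, 0.9886, 0.9835, 296947),
    "ADV": (0.9300, 0.9264, 0.9282, 285781),
    "ADJ": (0.8347, 0.8443, 0.8395, 273219),
    "PROPN": (0.9428, 0.9623, 0.9525, 128068),
    "NUM": (0.9771, 0.9913, 0.9842, 58389),
    "ORD": (0.8362, 0.9223, 0.8771, 8534),
    "ITJ": (0.9088, 0.8821, 0.8953, 4554),
    "PART": (0.9509, 0.9307, 0.9407, 3202),
    "FM": (0.9226, 0.8804, 0.9010, 2491),
}


def summarize(report):
    """Recompute macro F1, weighted F1 and accuracy from a per-class report."""
    total = sum(support for _, _, _, support in report.values())
    # Macro average: every class counts equally, regardless of its size.
    macro_f1 = sum(f1 for _, _, f1, _ in report.values()) / len(report)
    # Weighted average: each class contributes in proportion to its support.
    weighted_f1 = sum(f1 * support for _, _, f1, support in report.values()) / total
    # recall * support = correctly tagged tokens of that class; summed over all
    # classes and divided by the token count, this is accuracy (= micro F1 here).
    accuracy = sum(recall * support for _, recall, _, support in report.values()) / total
    return macro_f1, weighted_f1, accuracy, total


macro_f1, weighted_f1, accuracy, total = summarize(REPORT)
print(f"macro F1 {macro_f1:.4f}  weighted F1 {weighted_f1:.4f}  "
      f"accuracy {accuracy:.4f}  tokens {total}")
```

With the table values, this reproduces the reported summary: macro F1 0.9397, weighted F1 0.9588, and accuracy 0.9588 over 4,608,800 tokens. Because `ADJ` and `ORD` are much weaker than the high-frequency classes, the macro F1 falls below the weighted F1.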

### Confusion Matrix
![Confusion Matrix](confusion_matrix.png)

### Model Limitations

While the model achieves a high micro-F1 of 95.88%, users should be aware of the following:

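* **Adjective/Noun Distinction:** Most misclassifications occur between `ADJ` and `NOUN`. This reflects the morphological overlap between the two classes in Latin: adjectives share noun declension endings and are frequently used substantively. `ADJ` has the lowest F1 of any class (83.95%).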
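* **Ordinal Numbers:** The `ORD` tag (87.71% F1) is occasionally confused with regular adjectives (`ADJ`). Its precision (83.62%) is well below its recall (92.23%), so most `ORD` errors are tokens wrongly tagged as `ORD`, not ordinals that were missed.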
* **Domain Specificity:** The model is trained on legal and diplomatic corpora; performance may vary slightly on classical poetry or highly informal neo-Latin.
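Precision, recall, and support are enough to estimate two kinds of error for a class: tokens the tagger missed (false negatives) and tokens it wrongly assigned to the class (false positives). The following is a rough plain-Python sketch. The counts are approximations derived from the rounded metrics in the table, not figures read off the confusion matrix.

```python
def error_counts(precision, recall, support):
    """Estimate (true positives, false positives, false negatives) for one class.

    Approximate only: the inputs are metrics rounded to four decimals.
    """
    tp = recall * support       # gold tokens of the class that were tagged correctly
    predicted = tp / precision  # all tokens the tagger assigned to the class
    return round(tp), round(predicted - tp), round(support - tp)


# Values for ADJ and ORD taken from the per-class performance table.
for tag, p, r, n in [("ADJ", 0.8347, 0.8443, 273219), ("ORD", 0.8362, 0.9223, 8534)]:
    tp, fp, fn = error_counts(p, r, n)
    print(f"{tag}: ~{tp} correct, ~{fp} false positives, ~{fn} false negatives")
```

For `ORD`, this gives roughly 1,540 false positives against roughly 660 false negatives, which is consistent with its precision being lower than its recall. The confusion matrix shows which classes those errors actually involve.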

## Usage

You can use this model directly with the [Flair](https://github.com/flairNLP/flair) library.

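```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Download the tagger from the Hugging Face Hub (cached after the first call)
tagger = SequenceTagger.load("mschonhardt/latin-pos-tagger")

# Build a Flair sentence (Flair tokenizes the string by default) and tag it
sentence = Sentence("In nomine sanctae et individuae trinitatis .")
tagger.predict(sentence)

# Each token now carries one label of the tagger's own label type
for token in sentence:
    label = token.get_label(tagger.label_type)
    print(f"{token.text}\t{label.value}\t{label.score:.4f}")
```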

## Training Parameters
* Learning Rate: 0.1
* Mini Batch Size: 512
* Max Epochs: 15
* Learning-Rate Scheduler: AnnealOnPlateau
* Hardware: a single NVIDIA RTX PRO 6000 Blackwell GPU
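AnnealOnPlateau is a learning-rate scheduler: it lowers the learning rate when the development score stops improving. The following is a simplified, framework-free sketch of that idea, not Flair's implementation. The anneal factor (0.5), patience (3), and minimum learning rate (1e-4) are assumed values, because this card does not state which ones were used. The sketch also assumes that a higher dev score is better.

```python
def anneal_on_plateau(dev_scores, initial_lr=0.1, factor=0.5, patience=3, min_lr=1e-4):
    """Return the learning rate used in each epoch, given per-epoch dev scores.

    Simplified sketch: factor, patience and min_lr are assumed, not taken
    from this model's training run. Higher dev scores are treated as better.
    """
    lr = initial_lr
    best = float("-inf")
    bad_epochs = 0
    history = []
    for score in dev_scores:
        history.append(lr)  # learning rate used during this epoch
        if score > best:
            best = score
            bad_epochs = 0
        else:
            bad_epochs += 1
            # Anneal once the score has failed to improve for more than `patience` epochs
            if bad_epochs > patience:
                lr *= factor
                bad_epochs = 0
        if lr < min_lr:
            break  # training would stop once the learning rate is too small
    return history


# Hypothetical dev scores: the score plateaus at 0.93, so the rate is halved after epoch 7.
print(anneal_on_plateau([0.90, 0.92, 0.93, 0.93, 0.93, 0.93, 0.93, 0.94]))
```

With the starting learning rate of 0.1 listed above, the rate stays at 0.1 while the dev score improves and drops to 0.05 after four epochs without improvement.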

## Citation

If you use this model, please cite the specific model DOI and the Flair framework:

```bibtex
@software{schonhardt_michael_2026_latin_pos,
  author = "Schonhardt, Michael",
  title = "Latin POS Tagger (Flair)",
  year = "2026",
  publisher = "Zenodo",
  doi = "10.5281/zenodo.18631267",
  url = "https://huggingface.co/mschonhardt/latin-pos-tagger"
}
```

```bibtex
@inproceedings{akbik-etal-2018-contextual,
    title = "Contextual String Embeddings for Sequence Labeling",
    author = "Akbik, Alan and Blythe, Duncan and Vollgraf, Roland",
    booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
    year = "2018",
    pages = "1638--1649",
    publisher = "Association for Computational Linguistics"
}
```