LatinCy Stanza (la_stanza_latincy)

A Stanza (Stanford NLP) model suite for Latin trained on harmonized Universal Dependencies treebanks from LatinCy. Provides tokenization, POS tagging, morphological features, lemmatization, dependency parsing, and named entity recognition.

Highlights

  • Full NLP pipeline -- tokenizer, POS/morph tagger, lemmatizer, dependency parser, NER
  • 6 UD treebanks + LASLA: POS/morph/lemma trained on ~2.87M tokens (UD+LASLA combined)
  • Custom character language models trained on 1.6 GB of curated Latin text (13.7M sentences)
  • Custom word vectors (CBOW-300, trained on curated Latin corpus)
  • NER with 3 entity types: PERSON, LOC, NORP

Quick Start

import stanza
from huggingface_hub import snapshot_download

# Download models (one time)
model_dir = snapshot_download("latincy/la_stanza_latincy")

# Load pipeline
nlp = stanza.Pipeline("la", dir=model_dir, download_method=None)

# Annotate
doc = nlp("Gallia est omnis divisa in partes tres.")
for sent in doc.sentences:
    for word in sent.words:
        print(f"{word.text:12s} {word.upos:6s} {word.lemma:12s} {word.deprel}")

Output:

Gallia       PROPN  Gallia       nsubj
est          AUX    sum          cop
omnis        DET    omnis        det
divisa       ADJ    divido       root
in           ADP    in           case
partes       NOUN   pars         obl
tres         NUM    tres         nummod
.            PUNCT  .            punct

NER

nlp = stanza.Pipeline("la", dir=model_dir, download_method=None,
                       processors="tokenize,ner")
doc = nlp("Caesar in Galliam cum legionibus contendit.")
for ent in doc.ents:
    print(f"{ent.text:20s} {ent.type}")

Loading from a Local Directory

If you already have the models locally (e.g., after cloning the Hugging Face repo):

nlp = stanza.Pipeline("la", dir="/path/to/la_stanza_latincy",
                       download_method=None)

Model Description

Property     Value
Author       Patrick J. Burns / LatinCy
Model type   Stanza neural pipeline (BiLSTM-CRF, biaffine parser)
Language     Latin
License      MIT
Total size   ~1.1 GB (8 model files)
Framework    Stanza (Stanford NLP)

Pipeline Components

Component      Model file                   Size     Architecture
Tokenizer      tokenize/latincy.pt          11 MB    BiLSTM segmenter
POS/Morph      pos/latincy.pt               151 MB   BiLSTM tagger with CharLM + pretrained vectors
Lemmatizer     lemma/latincy.pt             46 MB    Seq2seq with edit classifier
Dep. Parser    depparse/latincy.pt          170 MB   Deep biaffine attention parser
NER            ner/latincy.pt               151 MB   BiLSTM-CRF with CharLM + pretrained vectors
CharLM (fwd)   forward_charlm/latincy.pt    197 MB   Character-level LSTM language model
CharLM (bwd)   backward_charlm/latincy.pt   197 MB   Character-level LSTM language model
Pretrain       pretrain/latincy.pt          174 MB   Word2Vec CBOW-300 embeddings
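The ~1.1 GB total quoted above is simply the sum of the eight component files; a quick sanity check (sizes taken from the table):

```python
# File sizes in MB, copied from the component table above.
sizes_mb = {
    "tokenize": 11, "pos": 151, "lemma": 46, "depparse": 170,
    "ner": 151, "forward_charlm": 197, "backward_charlm": 197,
    "pretrain": 174,
}

total = sum(sizes_mb.values())
print(f"{total} MB ≈ {total / 1024:.1f} GB")
```

The two CharLMs and the pretrained embeddings account for over half of the footprint, which is why loading components selectively (see Limitations) saves the most space when those processors are not needed.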

Training Data

POS, Morphology, Lemmatization (UD + LASLA)

Trained on harmonized data from 6 Universal Dependencies Latin treebanks combined with the LASLA corpus (~1.84M tokens of classical Latin with POS, morphological features, and lemmas).

Treebank   Full name                     Domain
ITTB       Index Thomisticus Treebank    Scholastic Latin (Thomas Aquinas)
LLCT       Late Latin Charter Treebank   Medieval legal charters
PROIEL     PROIEL Treebank               Vulgate Bible, historical texts
Perseus    Perseus Latin Treebank        Classical Latin (Caesar, Cicero, etc.)
UDante     UDante Treebank               Dante Alighieri (De vulgari eloquentia, etc.)
CIRCSE     CIRCSE Latin Treebank         LASLA-derived classical texts
LASLA      LASLA corpus                  Classical Latin (morphology only, no deps)

Combined: ~2.87M tokens for POS/morph/lemma; ~1.03M tokens (UD only) for tokenizer and dependency parsing.
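The UD treebanks above distribute their annotations in the 10-column CoNLL-U format. As a rough sketch (field names follow the CoNLL-U specification; the sample line is illustrative, not taken from a treebank), one token line can be unpacked like this:

```python
# Minimal sketch of parsing one token line of the 10-column CoNLL-U
# format used by the UD treebanks.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

def parse_token_line(line):
    """Split a tab-separated CoNLL-U token line into a field dict."""
    values = line.rstrip("\n").split("\t")
    assert len(values) == 10, "CoNLL-U token lines have exactly 10 columns"
    return dict(zip(FIELDS, values))

sample = "1\tGallia\tGallia\tPROPN\t_\tCase=Nom|Gender=Fem|Number=Sing\t4\tnsubj\t_\t_"
token = parse_token_line(sample)
print(token["form"], token["upos"], token["lemma"], token["deprel"])
# Gallia PROPN Gallia nsubj
```

LASLA rows carry only the lemma, POS, and feats columns with meaningful values, which is why that corpus contributes to tagging and lemmatization but not to parsing.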

NER

Trained on LatinCy NER annotations from 4 sources: 13,493 train / 3,195 dev sentences. Entity types: PERSON (79%), LOC (14%), NORP (7%).

Character Language Models

Trained on 1.6 GB of curated Latin text (13.7M sentences from 9 sources) for 15 epochs. Forward and backward CharLMs provide contextualized character-level features to the POS tagger, lemmatizer, parser, and NER.

Training Procedure

Tokenizer: BiLSTM segmenter trained on UD-only data.

POS/Morph tagger: BiLSTM with CharLM features and pretrained word vectors, trained on UD+LASLA combined data.

Lemmatizer: Seq2seq model with edit classifier, CharLM features, trained on UD+LASLA combined data.

Dependency parser: Deep biaffine attention parser with CharLM features and pretrained word vectors, trained on UD-only data.

NER tagger: BiLSTM-CRF with CharLM features and pretrained word vectors, 8,500 training steps with early stopping.
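The early-stopping criterion mentioned for the NER tagger can be sketched as follows. This is a generic dev-score patience loop, not Stanza's actual training code, and the patience value is illustrative:

```python
# Sketch of dev-score early stopping: keep the best checkpoint and
# halt once `patience` consecutive evaluations fail to improve on it.
def train_with_early_stopping(dev_scores, patience=3):
    """Return (step, score) of the best dev evaluation."""
    best_step, best_score, bad_evals = 0, float("-inf"), 0
    for step, score in enumerate(dev_scores):
        if score > best_score:
            best_step, best_score, bad_evals = step, score, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                break  # dev score has plateaued; stop training
    return best_step, best_score

# Dev F1 peaks at step 3 and then plateaus, so training halts early.
print(train_with_early_stopping([80.1, 85.4, 88.0, 90.2, 89.9, 90.0, 89.5]))
# (3, 90.2)
```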

Evaluation Results

Overall Scores

Component    Metric        Score   Split
Tokenizer    Token F1      98.24   dev
Tokenizer    Sentence F1   86.59   dev
POS          UPOS          97.39   test
POS          UFeats        92.20   test
Lemma        Accuracy      97.79   test
Dep. Parse   UAS           86.73   test
Dep. Parse   LAS           83.23   test
Dep. Parse   CLAS          79.45   test
Dep. Parse   MLAS          77.17   test
Dep. Parse   BLEX          79.45   test
NER          Entity F1     90.22   dev
NER          PERSON F1     93.01   dev
NER          LOC F1        80.88   dev
NER          NORP F1       78.44   dev
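The entity F1 reported above is standard exact-match span F1: a predicted entity counts as correct only if both its span and its type match a gold entity. A minimal sketch of the metric (the sample spans are illustrative):

```python
def entity_f1(gold, pred):
    """Exact-match entity F1 over (start, end, type) spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)          # spans correct in both extent and type
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = [(0, 1, "PERSON"), (2, 3, "LOC"), (5, 6, "NORP")]
pred = [(0, 1, "PERSON"), (2, 3, "NORP")]  # second span has the wrong type
print(round(entity_f1(gold, pred), 2))
# 0.4
```

Note that a correct span with the wrong label scores as both a false positive and a false negative, which is why per-type scores (PERSON vs. LOC vs. NORP) can diverge sharply from the overall F1.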

Cross-Framework Comparison (LatinCy v3.8)

All models trained on the same harmonized treebank data. Scores on held-out test sets unless noted. NER scores are on dev (no test set exists).

Metric   LatinCy Stanza 0.1   LatinCy Flair 0.1   LatinCy UDPipe 0.1   LatinCy spaCy lg 3.8.0
UPOS     97.39                97.11               93.28                97.26
UFeats   92.20                --                  82.48                92.58
Lemma    97.79                96.52               93.05                94.87
UAS      86.73                --                  76.11                84.03
LAS      83.23                --                  71.29                78.89
NER F1   90.22                90.48               --                   82.26

Stanza leads on UPOS, lemmatization, and dependency parsing; spaCy leads on morphological features (UFeats); Flair is competitive on POS and lemmas and leads on NER. UDPipe trails on accuracy but offers single-file portability, usable from R, Python, the command line, and other platforms.

vs. Stanford's Official Latin Package (stanfordnlp/stanza-la)

Stanford distributes separate per-treebank models (ITTB, LLCT, Perseus, PROIEL, UDante) without character language models (nocharlm variants) and without NER. LatinCy Stanza trains a single unified model across all treebanks plus LASLA, with custom forward/backward CharLMs and pretrained word vectors. A direct benchmark comparison is planned for a future release.

Limitations

  • No test split for NER: NER scores are on the dev set; no held-out test evaluation is available.
  • Tokenizer scores on dev: No separate test evaluation was run for the tokenizer.
  • LASLA data is morphology-only: Dependency parsing trained on UD data only (~1.03M tokens), not the full 2.87M token corpus.
  • No transformer features: This is a Phase 1 model using BiLSTM + CharLM. Phase 2 will integrate a transformer model.
  • Large total size: The full model suite is ~1.1 GB due to 8 separate model files (including 2 CharLMs at 197 MB each). Individual components can be loaded selectively.

Future Development

The following Stanza processors are not yet implemented for Latin in this release but will be considered for future development:

  • Constituency parsing (phrase structure)
  • Coreference resolution
  • Sentiment analysis
  • Multi-word token (MWT) expansion

We also plan to train the next version of LatinCy Stanza with a transformer model, for improved accuracy on morphological features and dependency parsing.

References

  • Qi, P., Zhang, Y., Zhang, Y., Bolton, J., and Manning, C. D. 2020. "Stanza: A Python Natural Language Processing Toolkit for Many Human Languages." In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. https://nlp.stanford.edu/pubs/qi2020stanza.pdf.

Citation

@misc{burns2026latincystanza,
  author = {Burns, Patrick J.},
  title = {{LatinCy Stanza (la\_stanza\_latincy)}},
  year = {2026},
  url = {https://huggingface.co/latincy/la_stanza_latincy},
}

Acknowledgments

This work was supported in part through the NYU IT High Performance Computing resources, services, and staff expertise.
