# Opla - Greek POS Tagger and Dependency Parser
GPU-optimized Greek POS tagger and dependency parser. 215x faster than gr-nlp-toolkit on real-world Greek text, with identical POS output and near-identical dependency parses.
Supports Modern Greek (el), Ancient Greek (grc), and Medieval Greek (med).
Source code: github.com/ciscoriordan/opla
## Weights
| File | Language | Size | Description |
|---|---|---|---|
| `weights/grc/opla_grc.pt` | Ancient Greek | 632 MB | PyTorch checkpoint (joint POS+DP on Ancient-Greek-BERT) |
| `weights/grc/onnx/opla_joint.onnx` | Ancient Greek | 535 MB | ONNX model for CPU deployment (with `.data` and `meta.json`) |
| `weights/med/opla_med.pt` | Medieval Greek | ~632 MB | PyTorch checkpoint (joint POS+DP on Ancient-Greek-BERT) |
Modern Greek weights are loaded directly from AUEB-NLP/gr-nlp-toolkit.
## Ancient Greek accuracy
Dev set accuracy on combined Perseus + PROIEL + Gorman treebanks (1.1M tokens):
| Metric | Accuracy |
|---|---|
| UPOS | 96.8% |
| DEPREL | 91.8% |
Training data:
- UD_Ancient_Greek-Perseus (203K tokens)
- UD_Ancient_Greek-PROIEL (214K tokens)
- Gorman Greek Dependency Trees (692K tokens)
- DiGreC (103K tokens, fine-tuning)
## Usage
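A minimal usage sketch. The `Opla()` entry point is referenced in the ONNX inference section below, but the `lang` argument and the token attributes shown here are illustrative assumptions, not a documented API.

```python
# Hypothetical usage sketch -- Opla() appears elsewhere in this README,
# but the "lang" argument and token attributes are assumptions.
from opla import Opla

nlp = Opla(lang="grc")                            # "el", "grc", or "med"
doc = nlp("μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος")    # tag and parse a sentence
for tok in doc:
    print(tok.text, tok.upos, tok.head, tok.deprel)
```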
## Architecture

The `grc` and `med` models use a single Ancient-Greek-BERT backbone with jointly trained POS and DP heads, so each batch requires only one BERT forward pass. The `el` model uses dual GreekBERT backbones (inherited from gr-nlp-toolkit).
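The joint-head design above can be sketched as follows: one shared encoder output feeds both a POS classifier and dependency-parsing heads. All dimensions, head designs, and names here are illustrative assumptions, not Opla's actual implementation.

```python
# Illustrative sketch of joint POS + DP heads over one shared encoder
# output. Dimensions and head designs are assumptions, not Opla's code.
import torch
import torch.nn as nn

class JointHeads(nn.Module):
    def __init__(self, hidden=768, n_upos=17, n_deprel=40):
        super().__init__()
        self.pos_head = nn.Linear(hidden, n_upos)     # per-token UPOS logits
        self.arc_proj = nn.Linear(hidden, hidden)     # simplified bilinear arc scorer
        self.rel_head = nn.Linear(hidden, n_deprel)   # per-token DEPREL logits

    def forward(self, enc):
        # enc: (batch, seq, hidden) -- output of a single BERT forward pass,
        # shared by all three heads.
        pos_logits = self.pos_head(enc)                        # (batch, seq, n_upos)
        arc_scores = enc @ self.arc_proj(enc).transpose(1, 2)  # (batch, seq, seq)
        rel_logits = self.rel_head(enc)                        # (batch, seq, n_deprel)
        return pos_logits, arc_scores, rel_logits

enc = torch.randn(2, 5, 768)            # stand-in for shared BERT output
pos, arc, rel = JointHeads()(enc)
print(pos.shape, arc.shape, rel.shape)
```

Because the heads share one encoder, the per-batch cost is dominated by a single BERT pass rather than one pass per task.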
## ONNX inference

The ONNX model provides CPU-only deployment without requiring PyTorch. Install `onnxruntime` and pass `checkpoint="onnx"` to `Opla()`.
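A hedged sketch of the deployment path just described: `checkpoint="onnx"` comes from the text above, while the rest of the call is an assumed API shape.

```python
# Hypothetical ONNX deployment sketch -- checkpoint="onnx" is stated in
# this README; the surrounding API shape is an assumption.
from opla import Opla

nlp = Opla(lang="grc", checkpoint="onnx")   # CPU inference via onnxruntime
doc = nlp("ἐν ἀρχῇ ἦν ὁ λόγος")
```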
## Citation
## License
MIT