# DBBErt: A BERT-based Language Model for Byzantine Greek
## Model Description
DBBErt is a transformer-based language model fine-tuned for Byzantine Greek, trained on data from the Database of Byzantine Book Epigrams (DBBE).
It supports tasks such as:
- Part-of-speech tagging
- Morphological analysis
- Lemmatization
The model is designed to process both Greek texts from critical editions and unedited medieval Greek texts, which are characterised by:
- Non-standard orthography
- Dialectal and diachronic variation
- Manuscript-based transcription conventions
## How to Use

Example with 🤗 Transformers:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the DBBErt tokenizer and token-classification head
tokenizer = AutoTokenizer.from_pretrained("coswaele/DBBErt")
model = AutoModelForTokenClassification.from_pretrained("coswaele/DBBErt")

text = "ἐν τοῖς βιβλίοις"  # "in the books"
tokens = tokenizer(text, return_tensors="pt")
outputs = model(**tokens)
```
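The `outputs` object holds per-token logits over the label set; converting them to readable tags means taking the argmax per token and mapping the indices through the model's label vocabulary. A minimal sketch of that step, using a dummy logits tensor and a hypothetical `id2label` mapping in place of the real ones (after loading the checkpoint, the actual mapping is available as `model.config.id2label`):

```python
import torch

# Hypothetical label set for illustration only; the real mapping comes
# from model.config.id2label after loading the checkpoint.
id2label = {0: "NOUN", 1: "VERB", 2: "ADP", 3: "DET"}

# Dummy logits shaped (batch=1, seq_len=3, num_labels=4), standing in
# for outputs.logits as returned by the model.
logits = torch.tensor([[[0.1, 0.2, 2.5, 0.0],
                        [0.0, 3.1, 0.2, 0.1],
                        [2.2, 0.1, 0.0, 0.3]]])

pred_ids = logits.argmax(dim=-1)  # (1, 3) tensor of label indices
labels = [id2label[int(i)] for i in pred_ids[0]]
print(labels)  # ['ADP', 'VERB', 'NOUN']
```

Note that the tokenizer may split a single Greek word into several subword pieces, so in practice predictions are usually aligned back to words (e.g. by keeping the label of each word's first subtoken).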
## Citation
If you use DBBErt in your research, please cite:
```bibtex
@article{swaelens_lre,
  author  = {Swaelens, Colin and De Vos, Ilse and Lefever, Els},
  title   = {Linguistic annotation of Byzantine book epigrams},
  journal = {Language Resources and Evaluation},
  year    = {2025},
  volume  = {59},
  number  = {1},
  pages   = {109--134},
  doi     = {10.1007/s10579-023-09703-x},
  url     = {https://doi.org/10.1007/s10579-023-09703-x}
}
```