# DBBErt: A BERT-based Language Model for Byzantine Greek
## Model Description
DBBErt is a transformer-based language model fine-tuned for Byzantine Greek, trained on data from the Database of Byzantine Book Epigrams (DBBE).
It supports tasks such as:
- Part-of-speech tagging
- Morphological analysis
- Lemmatization
The model is designed to process both Greek texts from critical editions and unedited medieval Greek texts, which are characterised by:
- Non-standard orthography
- Dialectal and diachronic variation
- Manuscript-based transcription conventions
## How to Use

Example with 🤗 Transformers:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the DBBErt tokenizer and token-classification head
tokenizer = AutoTokenizer.from_pretrained("coswaele/DBBErt")
model = AutoModelForTokenClassification.from_pretrained("coswaele/DBBErt")

text = "ἐν τοῖς βιβλίοις"  # "in the books"
tokens = tokenizer(text, return_tensors="pt")
outputs = model(**tokens)
```
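The `outputs` object holds per-token logits over the label set; converting them to readable tags means taking the argmax per token and mapping the indices through the model's label vocabulary. A minimal sketch of that step, using a dummy logits tensor and a hypothetical `id2label` mapping in place of the real ones (after loading the checkpoint, the actual mapping is available as `model.config.id2label`):

```python
import torch

# Hypothetical label set for illustration only; the real mapping comes
# from model.config.id2label after loading the checkpoint.
id2label = {0: "NOUN", 1: "VERB", 2: "ADP", 3: "DET"}

# Dummy logits shaped (batch=1, seq_len=3, num_labels=4), standing in
# for outputs.logits as returned by the model.
logits = torch.tensor([[[0.1, 0.2, 2.5, 0.0],
                        [0.0, 3.1, 0.2, 0.1],
                        [2.2, 0.1, 0.0, 0.3]]])

pred_ids = logits.argmax(dim=-1)  # (1, 3) tensor of label indices
labels = [id2label[int(i)] for i in pred_ids[0]]
print(labels)  # ['ADP', 'VERB', 'NOUN']
```

Note that the tokenizer may split a single Greek word into several subword pieces, so in practice predictions are usually aligned back to words (e.g. by keeping the label of each word's first subtoken).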
## Citation
If you use DBBErt in your research, please cite:
```bibtex
@article{swaelens_lre,
  author  = {Swaelens, Colin and De Vos, Ilse and Lefever, Els},
  title   = {Linguistic annotation of Byzantine book epigrams},
  journal = {Language Resources and Evaluation},
  year    = {2025},
  volume  = {59},
  number  = {1},
  pages   = {109--134},
  doi     = {10.1007/s10579-023-09703-x},
  url     = {https://doi.org/10.1007/s10579-023-09703-x}
}
```