DBBErt: A BERT-based Language Model for Byzantine Greek

📖 Model Description

DBBErt is a transformer-based language model fine-tuned for Byzantine Greek, trained on data from the Database of Byzantine Book Epigrams (DBBE).
It supports tasks such as the following (a minimal tagging sketch follows this list):

  • Part-of-speech tagging
  • Morphological analysis
  • Lemmatization
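
As a quick way to try these tasks, DBBErt can also be loaded through the 🤗 Transformers token-classification pipeline. The sketch below is a minimal example, not the authors' reference setup; which tags it prints (part-of-speech, morphological, or other labels) depends entirely on the label mapping stored with the checkpoint on the Hub.

from transformers import pipeline

# Load DBBErt as a token-classification pipeline; the label set comes from the
# checkpoint's own configuration on the Hub.
tagger = pipeline("token-classification", model="coswaele/DBBErt")

for prediction in tagger("ἐν τοῖς βιβλίοις"):
    print(prediction["word"], prediction["entity"], round(prediction["score"], 3))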

The model is designed to process both Greek texts from critical editions and unedited medieval Greek texts, which are characterised by:

  • Non-standard orthography
  • Dialectal and diachronic variation
  • Manuscript-based transcription conventions

πŸ› οΈ How to Use

Example with 🤗 Transformers:

from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the token-classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("coswaele/DBBErt")
model = AutoModelForTokenClassification.from_pretrained("coswaele/DBBErt")

# Tokenize a short Greek phrase and run it through the model
text = "ἐν τοῖς βιβλίοις"
tokens = tokenizer(text, return_tensors="pt")
outputs = model(**tokens)
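
The forward pass above returns raw logits. A minimal sketch of mapping each sub-token to its highest-scoring label, assuming the checkpoint ships an id2label mapping in its configuration:

# Pick the highest-scoring class per sub-token and look up its label name
predicted_ids = outputs.logits.argmax(dim=-1)[0]
labels = [model.config.id2label[i.item()] for i in predicted_ids]

for token, label in zip(tokenizer.convert_ids_to_tokens(tokens["input_ids"][0]), labels):
    print(token, label)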

📜 Citation

If you use DBBErt in your research, please cite:

@article{swaelens_lre,
  author    = {Swaelens, Colin and De Vos, Ilse and Lefever, Els},
  title     = {Linguistic annotation of Byzantine book epigrams},
  journal   = {Language Resources and Evaluation},
  year      = {2025},
  volume    = {59},
  number    = {1},
  pages     = {109--134},
  doi       = {10.1007/s10579-023-09703-x},
  url       = {https://doi.org/10.1007/s10579-023-09703-x}
}