Ericu950/SyllaMoBert-grc-macronizer-v1

Token Classification

Ancient Greek (to 1453)


Ericu950 committed on Apr 30, 2025

Commit 2549e4e · verified · 1 Parent(s): 2831234

Update README.md

Files changed (1)

README.md +11 -3


@@ -7,9 +7,16 @@ language:
 # SyllaMoBert-grc-macronizer-v1
-**SyllaMoBert-grc-macronizer-v1** is a token classification model that performs macronization of Ancient Greek. It determines the syllabic length (long or short) of *dichrona*—open syllables that can be either long or short depending on context.
-This model builds upon [Albin Thörn Cleland’s rule-based macronizer](https://github.com/Urdatorn/macronize-tlg) and enhances it with machine learning. It uses a [ModernBERT](https://huggingface.co/docs/transformers/model_doc/modern_bert) architecture, fine-tuned on syllabified Ancient Greek texts using the base model [`Ericu950/SyllaMoBert-grc-v1`](https://huggingface.co/Ericu950/SyllaMoBert-grc-v1).
 ---
@@ -19,9 +26,10 @@ First, install the syllabification utility:
 ```bash
 pip install syllagreek_utils==0.1.0
 Then run the following code:
 import torch
 from transformers import PreTrainedTokenizerFast, ModernBertForTokenClassification
 from syllagreek_utils import preprocess_greek_line, syllabify_joined

 # SyllaMoBert-grc-macronizer-v1
+**SyllaMoBert-grc-macronizer-v1** is a token classification model for the macronization of Ancient Greek. It predicts the syllabic quantity (long or short) of dichrona, which are open syllables whose length depends on morphological or phonological context.
+The model was evaluated using an 80/10/10 train/dev/test split and achieved the following accuracy:
+- 97.9% on open syllables with short dichrona
+- 99.0% on open syllables with long dichrona
+- 99.8% on the (trivially predictable) class of heavy syllables
+This makes SyllaMoBert-grc-macronizer-v1 a useful tool for tasks involving prosody and metrical analysis.
+The model was trained on data generated by [Albin Thörn Cleland’s rule-based macronizer](https://github.com/Urdatorn/macronize-tlg). It is a fine-tuned version of the base model [`Ericu950/SyllaMoBert-grc-v1`](https://huggingface.co/Ericu950/SyllaMoBert-grc-v1), a [ModernBERT](https://huggingface.co/docs/transformers/model_doc/modern_bert) model trained from scratch on syllabified Ancient Greek texts.
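
Reviewer note: the README reports one accuracy figure per syllable class but does not say how the figures were computed. As a minimal sketch, assuming each figure is the share of gold syllables in that class that received the correct label, the computation could look like this. The label names `short`, `long`, and `heavy` and the toy data are hypothetical and are not taken from the model's configuration.

```python
from collections import defaultdict


def per_class_accuracy(gold, pred):
    """Return, for each gold label, the fraction of its items predicted correctly."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p in zip(gold, pred):
        total[g] += 1
        if g == p:
            correct[g] += 1
    return {label: correct[label] / total[label] for label in total}


# Hypothetical labels and toy data, for illustration only.
gold = ["short", "short", "long", "heavy", "long", "short"]
pred = ["short", "long", "long", "heavy", "long", "short"]
scores = per_class_accuracy(gold, pred)
for label, acc in sorted(scores.items()):
    print(f"{label}: {acc:.1%}")
```

Reporting the classes separately, as the README does, keeps the trivially predictable heavy syllables from inflating a single overall score.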
 ---
 ```bash
 pip install syllagreek_utils==0.1.0
+```
 Then run the following code:
+```python
 import torch
 from transformers import PreTrainedTokenizerFast, ModernBertForTokenClassification
 from syllagreek_utils import preprocess_greek_line, syllabify_joined