Ericu950 committed
Commit 2549e4e · verified · Parent(s): 2831234

Update README.md

Files changed (1): README.md (+11 -3)

README.md CHANGED
@@ -7,9 +7,16 @@ language:
 
 # SyllaMoBert-grc-macronizer-v1
 
-**SyllaMoBert-grc-macronizer-v1** is a token classification model that performs macronization of Ancient Greek. It determines the syllabic length (long or short) of *dichrona*—open syllables that can be either long or short depending on context.
+**SyllaMoBert-grc-macronizer-v1** is a token classification model designed for the macronization of Ancient Greek. It predicts the syllabic quantity, long or short, of dichrona, which are open syllables whose length depends on morphological or phonological context.
 
-This model builds upon [Albin Thörn Cleland’s rule-based macronizer](https://github.com/Urdatorn/macronize-tlg) and enhances it with machine learning. It uses a [ModernBERT](https://huggingface.co/docs/transformers/model_doc/modern_bert) architecture, fine-tuned on syllabified Ancient Greek texts using the base model [`Ericu950/SyllaMoBert-grc-v1`](https://huggingface.co/Ericu950/SyllaMoBert-grc-v1).
+The model was evaluated on an 80/10/10 train/dev/test split and achieved the following accuracy:
+- 97.9% on open syllables with short dichrona
+- 99.0% on open syllables with long dichrona
+- 99.8% on the (trivially predictable) class of heavy syllables
+
+This makes SyllaMoBert-grc-macronizer-v1 a useful tool for tasks involving prosody and metrical analysis.
+
+This model is trained on data generated by [Albin Thörn Cleland’s rule-based macronizer](https://github.com/Urdatorn/macronize-tlg). It is a fine-tuned version of [`Ericu950/SyllaMoBert-grc-v1`](https://huggingface.co/Ericu950/SyllaMoBert-grc-v1), a [ModernBERT](https://huggingface.co/docs/transformers/model_doc/modern_bert) model trained from scratch on syllabified Ancient Greek texts.
 
 ---
 
@@ -19,9 +26,10 @@ First, install the syllabification utility:
 
 ```bash
 pip install syllagreek_utils==0.1.0
+```
 
 Then run the following code:
-
+```python
 import torch
 from transformers import PreTrainedTokenizerFast, ModernBertForTokenClassification
 from syllagreek_utils import preprocess_greek_line, syllabify_joined
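
The usage snippet in the diff cuts off at the imports. For orientation, here is a minimal, self-contained sketch of the decoding step that would follow: for a token classification model like this one, per-syllable predictions reduce to an argmax over the classification scores. The label names and id-to-label mapping below are hypothetical stand-ins (the real mapping ships in `model.config.id2label`), and plain Python lists stand in for the model's logit tensor.

```python
# Hypothetical label map for illustration only: the real mapping lives in
# model.config.id2label and may use different names and ids.
ID2LABEL = {0: "short", 1: "long", 2: "heavy"}

def decode_quantities(logits):
    """Map per-syllable classification scores to quantity labels.

    logits: a list of rows, one per syllable, each row holding one score
    per label (in practice, model logits converted with .tolist()).
    """
    return [ID2LABEL[max(range(len(row)), key=row.__getitem__)] for row in logits]

# Dummy scores for three syllables; real values would come from the model.
dummy = [
    [2.0, 0.1, -1.0],   # argmax 0 -> "short"
    [0.0, 3.0, 0.5],    # argmax 1 -> "long"
    [-0.5, 0.2, 4.0],   # argmax 2 -> "heavy"
]
print(decode_quantities(dummy))  # -> ['short', 'long', 'heavy']
```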