Update README.md
README.md
CHANGED
@@ -7,9 +7,16 @@ language:
# SyllaMoBert-grc-macronizer-v1
-
**
-
---
@@ -19,9 +26,10 @@ First, install the syllabification utility:
```bash
pip install syllagreek_utils==0.1.0
Then run the following code:
-
import torch
from transformers import PreTrainedTokenizerFast, ModernBertForTokenClassification
from syllagreek_utils import preprocess_greek_line, syllabify_joined
# SyllaMoBert-grc-macronizer-v1
+
**SyllaMoBERT-grc-macronizer-v1** is a token classification model designed for the macronization of Ancient Greek. It predicts the syllabic quantity—long or short—of dichrona, which are open syllables whose length depends on morphological or phonological context.
+
The model was evaluated using an 80/10/10 train/dev/test split and achieved the following accuracy:
+
• 97.9% on open syllables with short dichrona
+
• 99.0% on open syllables with long dichrona
+
• 99.8% on the (trivially predictable) class of heavy syllables
+
+
This makes SyllaMoBert-grc-macronizer-v1 a useful tool for tasks involving prosody and metrical analysis.
+
+
This model is trained on data generated by [Albin Thörn Cleland’s rule-based macronizer](https://github.com/Urdatorn/macronize-tlg). It is fine-tuned from the base model [`Ericu950/SyllaMoBert-grc-v1`](https://huggingface.co/Ericu950/SyllaMoBert-grc-v1), a [ModernBERT](https://huggingface.co/docs/transformers/model_doc/modern_bert) model trained from scratch on syllabified Ancient Greek texts.
---
```bash
pip install syllagreek_utils==0.1.0
+
```
Then run the following code:
+
```
import torch
from transformers import PreTrainedTokenizerFast, ModernBertForTokenClassification
from syllagreek_utils import preprocess_greek_line, syllabify_joined
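
The diff ends at the imports, so the rest of the usage code is not visible here. Separately, the added README text reports accuracy per syllable class on an 80/10/10 train/dev/test split. The sketch below shows one way such a split and per-class figures could be computed. It is not the author's evaluation script: the helper names `split_80_10_10` and `per_class_accuracy` and the label strings `"short"`, `"long"` and `"heavy"` are assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_80_10_10(items, seed=0):
    """Shuffle items and split them into train/dev/test portions (80/10/10).

    Hypothetical helper; integer arithmetic avoids float rounding surprises.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10
    n_dev = n // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # whatever remains, roughly 10%
    return train, dev, test


def per_class_accuracy(gold, predicted):
    """Return, for each gold label, the fraction of its items predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p in zip(gold, predicted):
        total[g] += 1
        if g == p:
            correct[g] += 1
    return {label: correct[label] / total[label] for label in total}


if __name__ == "__main__":
    # Toy labels only; these are not the model's real predictions.
    gold = ["short", "short", "long", "heavy"]
    pred = ["short", "long", "long", "heavy"]
    print(per_class_accuracy(gold, pred))
```

Reporting accuracy per gold class, as the README does, keeps the large and easy class of heavy syllables from hiding errors on the harder dichrona classes.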