---
license: mit
language:
- el
pipeline_tag: fill-mask
---
|
|
# Logion base model

A BERT-based model pretrained on the largest corpus of premodern Greek to date. It was introduced in this [paper](https://aclanthology.org/2023.alp-1.20/).

The model uses a WordPiece tokenizer (vocabulary size 50,000) trained on a corpus of over 70 million words (over 95 million tokens) of premodern Greek. The model is uncased and ignores accents/diacritics.
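Because the model is uncased and accent-free, input text should be normalized the same way. A minimal sketch of that normalization in plain Python, using standard Unicode decomposition (this mirrors what BERT-style tokenizers do with `do_lower_case` and `strip_accents` enabled):

```python
import unicodedata

def normalize(text: str) -> str:
    """Lowercase and strip combining diacritics from Greek text."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(normalize("Μῆνιν ἄειδε"))  # -> μηνιν αειδε
```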
|
|
|
|
|
## How to use

Requirements:

```shell
pip install transformers
```
|
|
|
|
|
Load the model and tokenizer directly from the Hugging Face Model Hub:

```python
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("princeton-logion/logion-bert-base")
model = BertForMaskedLM.from_pretrained("princeton-logion/logion-bert-base")
```
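Once loaded, the model can be queried through the `fill-mask` pipeline. A minimal sketch (the masked Iliad line below is only an illustration; any premodern Greek text containing the mask token works, normalized to lowercase without diacritics):

```python
from transformers import BertTokenizer, BertForMaskedLM, pipeline

tokenizer = BertTokenizer.from_pretrained("princeton-logion/logion-bert-base")
model = BertForMaskedLM.from_pretrained("princeton-logion/logion-bert-base")

# Build a fill-mask pipeline from the loaded model and tokenizer.
fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Mask one word of a normalized (lowercase, accent-free) Greek sentence.
text = f"μηνιν αειδε θεα πηληιαδεω αχιληος ουλομενην η μυρι αχαιοις {tokenizer.mask_token} εθηκε"

# Print the top 5 candidate tokens for the masked position with their scores.
for pred in fill(text, top_k=5):
    print(f"{pred['token_str']}\t{pred['score']:.3f}")
```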
|
|
|
|
|
|
|
|
## Cite

If you use this model in your research, please cite the paper:

```bibtex
@inproceedings{cowen-breen-etal-2023-logion,
    title = "Logion: Machine-Learning Based Detection and Correction of Textual Errors in {G}reek Philology",
    author = "Cowen-Breen, Charlie  and
      Brooks, Creston  and
      Graziosi, Barbara  and
      Haubold, Johannes",
    booktitle = "Proceedings of the Ancient Language Processing Workshop",
    year = "2023",
    url = "https://aclanthology.org/2023.alp-1.20",
    pages = "170--178",
}
```