jbmurel committed
Commit 242a70d · verified · 1 Parent(s): bd6cce4

Update ReadME

Files changed (1)
  1. README.md +15 -15
README.md CHANGED
@@ -1,12 +1,9 @@
  # Logion: Machine Learning for Greek Philology
 
- The most advanced Ancient Greek BERT model trained to date! Read the paper on [arxiv](https://arxiv.org/abs/2305.01099) by Charlie Cowen-Breen, Creston Brooks, Johannes Haubold, and Barbara Graziosi.
-
- We train a WordPiece tokenizer (with a vocab size of 50,000) on a corpus of over 70 million words of premodern Greek. Using this tokenizer and the same corpus, we train a BERT model.
-
- Further information on this project and code for error detection can be found on [GitHub](https://github.com/charliecb/Logion).
-
- We're adding more models trained with cleaner data and different tokenizations - keep an eye out!
 
  ## How to use
 
@@ -21,8 +18,8 @@ Load the model and tokenizer directly from the HuggingFace Model Hub:
 
  ```python
  from transformers import BertTokenizer, BertForMaskedLM
- tokenizer = BertTokenizer.from_pretrained("cabrooks/LOGION-50k_wordpiece")
- model = BertForMaskedLM.from_pretrained("cabrooks/LOGION-50k_wordpiece")
  ```
 
@@ -31,12 +28,15 @@ model = BertForMaskedLM.from_pretrained("cabrooks/LOGION-50k_wordpiece")
  If you use this model in your research, please cite the paper:
 
  ```
- @misc{logion-base,
-   title={Logion: Machine Learning for Greek Philology},
-   author={Cowen-Breen, C. and Brooks, C. and Haubold, J. and Graziosi, B.},
-   year={2023},
-   eprint={2305.01099},
-   archivePrefix={arXiv},
-   primaryClass={cs.CL}
  }
  ```
 
  # Logion: Machine Learning for Greek Philology
 
+ A BERT model trained on the largest set of Ancient Greek texts to date.
+ Read the ALP paper [here](https://aclanthology.org/2023.alp-1.20/).
 
+ Trained using a WordPiece tokenizer (vocab size of 50,000) on a corpus of 70+ million words of pre-modern Greek.
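A WordPiece tokenizer segments each word greedily into the longest matching vocabulary pieces, marking non-initial pieces with `##`. A minimal pure-Python sketch of that matching step (the toy vocabulary and function name below are our illustration, not the trained 50,000-entry tokenizer):

```python
def wordpiece_tokenize(word, vocab):
    # Greedy longest-match-first segmentation, as in WordPiece:
    # repeatedly take the longest vocabulary entry that prefixes the
    # remainder of the word; non-initial pieces carry a "##" prefix.
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]  # no vocabulary piece matched
        start = end
    return pieces

# Toy vocabulary; the real tokenizer learns 50,000 entries from the corpus.
vocab = {"λογ", "##ος", "##ο", "##ς"}
print(wordpiece_tokenize("λογος", vocab))  # → ['λογ', '##ος']
```

Longest-match-first is why a larger vocabulary tends to produce fewer, longer pieces per word.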
 
 
 
 
  ## How to use
 
  ```python
  from transformers import BertTokenizer, BertForMaskedLM
+ tokenizer = BertTokenizer.from_pretrained("princeton-logion/LOGION-50k_wordpiece")
+ model = BertForMaskedLM.from_pretrained("princeton-logion/LOGION-50k_wordpiece")
  ```
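Once loaded, the model fills a `[MASK]` token by scoring every vocabulary item at the masked position; prediction is just a softmax plus a top-k ranking over those scores. A minimal sketch of that decoding step (the helper, the example sentence, and the variable names are our illustration, not part of the repository; the commented lines download the model weights when run):

```python
import math

def top_k_probs(logits, k):
    # Softmax over raw scores, then return the k highest-probability
    # (index, probability) pairs -- the ranking applied to the model's
    # output vector at the [MASK] position.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    ranked = sorted(range(len(exps)), key=lambda i: exps[i], reverse=True)
    return [(i, exps[i] / total) for i in ranked[:k]]

# Applying it to the model above (uncomment to download the weights):
# import torch
# from transformers import BertTokenizer, BertForMaskedLM
# tokenizer = BertTokenizer.from_pretrained("princeton-logion/LOGION-50k_wordpiece")
# model = BertForMaskedLM.from_pretrained("princeton-logion/LOGION-50k_wordpiece")
# inputs = tokenizer("μῆνιν ἄειδε θεὰ [MASK] Ἀχιλῆος", return_tensors="pt")
# mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
# with torch.no_grad():
#     logits = model(**inputs).logits[0, mask_pos].tolist()
# for token_id, p in top_k_probs(logits, 5):
#     print(tokenizer.convert_ids_to_tokens(token_id), round(p, 3))
```

The same ranking over masked positions is what underlies the error-detection use case the paper describes.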
 
  If you use this model in your research, please cite the paper:
 
  ```
+ @inproceedings{cowen-breen-etal-2023-logion,
+     title = "Logion: Machine-Learning Based Detection and Correction of Textual Errors in {G}reek Philology",
+     author = "Cowen-Breen, Charlie and
+       Brooks, Creston and
+       Graziosi, Barbara and
+       Haubold, Johannes",
+     booktitle = "Proceedings of the Ancient Language Processing Workshop",
+     year = "2023",
+     url = "https://aclanthology.org/2023.alp-1.20",
+     pages = "170--178",
  }
  ```