jbmurel committed
Commit 242a70d · verified · 1 Parent(s): bd6cce4

Update ReadME

Files changed (1)
  1. README.md +15 -15
README.md CHANGED
@@ -1,12 +1,9 @@
  # Logion: Machine Learning for Greek Philology
 
- The most advanced Ancient Greek BERT model trained to date! Read the paper on [arxiv](https://arxiv.org/abs/2305.01099) by Charlie Cowen-Breen, Creston Brooks, Johannes Haubold, and Barbara Graziosi.
-
- We train a WordPiece tokenizer (with a vocab size of 50,000) on a corpus of over 70 million words of premodern Greek. Using this tokenizer and the same corpus, we train a BERT model.
-
- Further information on this project and code for error detection can be found on [GitHub](https://github.com/charliecb/Logion).
-
- We're adding more models trained with cleaner data and different tokenizations - keep an eye out!
 
  ## How to use
 
@@ -21,8 +18,8 @@ Load the model and tokenizer directly from the HuggingFace Model Hub:
 
  ```python
  from transformers import BertTokenizer, BertForMaskedLM
- tokenizer = BertTokenizer.from_pretrained("cabrooks/LOGION-50k_wordpiece")
- model = BertForMaskedLM.from_pretrained("cabrooks/LOGION-50k_wordpiece")
  ```
 
@@ -31,12 +28,15 @@ model = BertForMaskedLM.from_pretrained("cabrooks/LOGION-50k_wordpiece")
  If you use this model in your research, please cite the paper:
 
  ```
- @misc{logion-base,
-   title={Logion: Machine Learning for Greek Philology},
-   author={Cowen-Breen, C. and Brooks, C. and Haubold, J. and Graziosi, B.},
-   year={2023},
-   eprint={2305.01099},
-   archivePrefix={arXiv},
-   primaryClass={cs.CL}
  }
  ```
 
  # Logion: Machine Learning for Greek Philology
 
+ A BERT model trained on the largest set of Ancient Greek texts to date.
+ Read the ALP paper [here](https://aclanthology.org/2023.alp-1.20/).
 
+ Trained using a WordPiece tokenizer (vocab size of 50,000) on a corpus of 70+ million words of pre-modern Greek.
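A WordPiece tokenizer segments each word greedily into the longest matching vocabulary pieces, marking non-initial pieces with `##`. A minimal pure-Python sketch of that matching step (the toy vocabulary and function name below are our illustration, not the trained 50,000-entry tokenizer):

```python
def wordpiece_tokenize(word, vocab):
    # Greedy longest-match-first segmentation, as in WordPiece:
    # repeatedly take the longest vocabulary entry that prefixes the
    # remainder of the word; non-initial pieces carry a "##" prefix.
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]  # no vocabulary piece matched
        start = end
    return pieces

# Toy vocabulary; the real tokenizer learns 50,000 entries from the corpus.
vocab = {"λογ", "##ος", "##ο", "##ς"}
print(wordpiece_tokenize("λογος", vocab))  # → ['λογ', '##ος']
```

Longest-match-first is why a larger vocabulary tends to produce fewer, longer pieces per word.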
 
 
 
 
  ## How to use
 
  ```python
  from transformers import BertTokenizer, BertForMaskedLM
+ tokenizer = BertTokenizer.from_pretrained("princeton-logion/LOGION-50k_wordpiece")
+ model = BertForMaskedLM.from_pretrained("princeton-logion/LOGION-50k_wordpiece")
  ```
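Once loaded, the model fills a `[MASK]` token by scoring every vocabulary item at the masked position; prediction is just a softmax plus a top-k ranking over those scores. A minimal sketch of that decoding step (the helper, the example sentence, and the variable names are our illustration, not part of the repository; the commented lines download the model weights when run):

```python
import math

def top_k_probs(logits, k):
    # Softmax over raw scores, then return the k highest-probability
    # (index, probability) pairs -- the ranking applied to the model's
    # output vector at the [MASK] position.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    ranked = sorted(range(len(exps)), key=lambda i: exps[i], reverse=True)
    return [(i, exps[i] / total) for i in ranked[:k]]

# Applying it to the model above (uncomment to download the weights):
# import torch
# from transformers import BertTokenizer, BertForMaskedLM
# tokenizer = BertTokenizer.from_pretrained("princeton-logion/LOGION-50k_wordpiece")
# model = BertForMaskedLM.from_pretrained("princeton-logion/LOGION-50k_wordpiece")
# inputs = tokenizer("μῆνιν ἄειδε θεὰ [MASK] Ἀχιλῆος", return_tensors="pt")
# mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
# with torch.no_grad():
#     logits = model(**inputs).logits[0, mask_pos].tolist()
# for token_id, p in top_k_probs(logits, 5):
#     print(tokenizer.convert_ids_to_tokens(token_id), round(p, 3))
```

The same ranking over masked positions is what underlies the error-detection use case the paper describes.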
 
  If you use this model in your research, please cite the paper:
 
  ```
+ @inproceedings{cowen-breen-etal-2023-logion,
+     title = "Logion: Machine-Learning Based Detection and Correction of Textual Errors in {G}reek Philology",
+     author = "Cowen-Breen, Charlie and
+       Brooks, Creston and
+       Graziosi, Barbara and
+       Haubold, Johannes",
+     booktitle = "Proceedings of the Ancient Language Processing Workshop",
+     year = "2023",
+     url = "https://aclanthology.org/2023.alp-1.20",
+     pages = "170--178",
  }
  ```