Update README.md
README.md
CHANGED
@@ -1,3 +1,37 @@
 ---
 license: apache-2.0
+language:
+- he
+library_name: transformers
+tags:
+- bert
 ---
+
+> Update 2023-5-23: This model is `BEREL` version 1.0. We are now happy to provide a much improved `BEREL_2.0`.
+
+
+# Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language
+
+When using BEREL, please reference:
+
+
+Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Eli Handel, Moshe Koppel, "Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language", Aug 2022 [arXiv:2208.01875]
+
+
+
+1. Usage:
+
+```python
+from transformers import AutoTokenizer, BertForMaskedLM
+
+tokenizer = AutoTokenizer.from_pretrained('dicta-il/BEREL')
+model = BertForMaskedLM.from_pretrained('dicta-il/BEREL')
+```
+
+> NOTE: This code will **not** work correctly, and will give poor results, if you use `BertTokenizer`. Please use `AutoTokenizer` or `BertTokenizerFast`.
+
+2. Demo site:
+You can experiment with the model in a graphical interface here: https://dicta-bert-demo.netlify.app/?genre=rabbinic
+- The main part of the GUI consists of word buttons that visualize the tokenization of the sentence. Clicking a button masks that word, and BEREL's top three predictions for it appear in a bubble. Clicking the bubble expands it to 10 predictions; ctrl-clicking it instead expands it to 30 predictions.
+- Ctrl-clicking adjacent word buttons combines them into a single token for the mask.
+- The edit box at the top contains the input sentence; it can be modified at will, and the word buttons adjust accordingly.
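As a rough local counterpart to the demo's click-to-mask behaviour, the sketch below masks one word of a sentence and asks BEREL for its top predictions. It is not the demo's implementation. It assumes `torch` is installed alongside `transformers`; the helper names `mask_word` and `predict_masked`, the whitespace word split, and the default `k=3` are illustrative choices, not part of the BEREL release.

```python
def mask_word(sentence, word_index, mask_token="[MASK]"):
    """Replace the whitespace-delimited word at `word_index` with `mask_token`.

    The "[MASK]" default is an assumption; `predict_masked` below passes the
    tokenizer's own mask token instead of relying on it.
    """
    words = sentence.split()
    if not 0 <= word_index < len(words):
        raise IndexError(f"word_index {word_index} out of range for {len(words)} words")
    words[word_index] = mask_token
    return " ".join(words)


def predict_masked(sentence, word_index, k=3, model_name="dicta-il/BEREL"):
    """Return the k most probable (token, probability) pairs for one masked word."""
    # Imported lazily so that `mask_word` stays usable without these packages.
    import torch
    from transformers import AutoTokenizer, BertForMaskedLM

    # AutoTokenizer, as the README requires (not the slow BertTokenizer).
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = BertForMaskedLM.from_pretrained(model_name)
    model.eval()

    masked = mask_word(sentence, word_index, tokenizer.mask_token)
    inputs = tokenizer(masked, return_tensors="pt")

    # Locate the single mask position in the encoded input.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    positions = is_mask.nonzero(as_tuple=True)[0]
    if positions.numel() != 1:
        raise ValueError("expected exactly one mask token in the encoded input")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # Convert the mask position's logits to probabilities and keep the top k.
    probs = logits[0, positions[0]].softmax(dim=-1)
    top = torch.topk(probs, k)
    return [
        (tokenizer.convert_ids_to_tokens(idx.item()), prob.item())
        for prob, idx in zip(top.values, top.indices)
    ]
```

Calling `predict_masked(sentence, word_index)` on a rabbinic Hebrew sentence downloads the model on first use and returns a list of `(token, probability)` pairs; passing `k=10` or `k=30` gives longer lists, comparable to the expanded bubbles in the demo. Predictions are vocabulary tokens, so some of them may be WordPiece fragments rather than whole words.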