A simple code block that shows the usage of the model was added.
README.md
CHANGED
@@ -13,6 +13,21 @@ We introduce BERTurk-Legal which is a transformer-based language model to retrie
 
 Test dataset can be accessed from the following link: https://github.com/koc-lab/yargitay_retrieval_dataset
 
+The model can be loaded and used to create document embeddings as follows. The embeddings can then be used for retrieval.
+```
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+bert_model = "KocLab-Bilkent/BERTurk-Legal"
+
+model = AutoModelForSequenceClassification.from_pretrained(bert_model, output_hidden_states=True)
+tokenizer = AutoTokenizer.from_pretrained(bert_model)
+
+tokens = tokenizer("Örnek metin", return_tensors="pt")  # a dummy text is provided as input
+
+output = model(**tokens)  # unpack input_ids, attention_mask, etc. as keyword arguments
+docEmbeddings = output.hidden_states[-1]  # last-layer hidden states, shape (batch, seq_len, hidden_size)
+```
+
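The added README text says the embeddings can be used for retrieval but does not show how. The sketch below is one possible way to do it, not necessarily the authors' method: it assumes mean pooling over non-padding tokens to get one vector per document, followed by cosine-similarity ranking. The function names `mean_pool` and `rank_by_cosine` and the toy arrays are hypothetical and stand in for real model outputs.

```python
import numpy as np


def mean_pool(hidden_states, attention_mask):
    """Average token vectors per document, ignoring padded positions.

    hidden_states: (batch, seq_len, hidden_size)
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    """
    hidden_states = np.asarray(hidden_states, dtype=np.float64)
    mask = np.asarray(attention_mask, dtype=np.float64)[..., None]
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # avoid division by zero
    return summed / counts


def rank_by_cosine(query_vec, doc_vecs):
    """Return document indices from most to least similar, plus the raw scores."""
    query = np.asarray(query_vec, dtype=np.float64)
    docs = np.asarray(doc_vecs, dtype=np.float64)
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ query
    order = np.argsort(-scores)
    return order, scores


# Toy stand-in for last-layer hidden states: 2 documents, 3 tokens, 2 dimensions.
toy_hidden = np.array([
    [[1.0, 0.0], [1.0, 0.0], [9.0, 9.0]],  # third token is padding
    [[0.0, 1.0], [0.0, 3.0], [0.0, 2.0]],
])
toy_mask = np.array([[1, 1, 0], [1, 1, 1]])

doc_vecs = mean_pool(toy_hidden, toy_mask)
order, scores = rank_by_cosine(np.array([0.0, 1.0]), doc_vecs)
```

With the real model, the last-layer hidden states and the tokenizer's attention mask, converted to NumPy arrays, would take the place of the toy inputs. Other pooling choices (for example, using only the first token's vector) are equally plausible; the commit does not say which one the authors use.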
 ## Citation
 If you use the model, please cite the following conference paper.
 ```