It was trained from scratch on a larger training dataset: 6.6 GB of civil and criminal cases.

We used the [CamemBERT](https://huggingface.co/docs/transformers/main/en/model_doc/camembert) architecture with a language modeling head on top, the AdamW optimizer, an initial learning rate of 2e-5 with linear decay, a sequence length of 512, a batch size of 18, and 1 million training steps on 8 NVIDIA A100 40GB GPUs with distributed data parallel (so each step processes 8 batches). It uses a SentencePiece tokenizer trained from scratch on a subset of the training set (5 million sentences) with a vocabulary size of 32,000.
|
<h2> Usage </h2>

The ITALIAN-LEGAL-BERT model can be loaded as follows:

```python
from transformers import AutoModel, AutoTokenizer

model_name = "dlicari/Italian-Legal-BERT-SC"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
```
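Once loaded this way, the encoder can also be used as a feature extractor. The pooling strategy below is an assumption for illustration (the model card does not prescribe one); it mean-pools the last hidden states into a single sentence vector.

```python
# Sketch (assumption, not from the model card): deriving a sentence
# embedding from the encoder via mask-aware mean pooling.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "dlicari/Italian-Legal-BERT-SC"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

sentence = "Il ricorrente ha chiesto revocarsi l'obbligo di pagamento"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Average token embeddings, ignoring padding positions
mask = inputs["attention_mask"].unsqueeze(-1).float()
embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)  # (1, hidden_size)
```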
You can use the Transformers fill-mask pipeline to run inference with ITALIAN-LEGAL-BERT:

```python
# %pip install sentencepiece
# %pip install transformers

from transformers import pipeline

model_name = "dlicari/Italian-Legal-BERT-SC"
fill_mask = pipeline("fill-mask", model_name)
fill_mask("Il <mask> ha chiesto revocarsi l'obbligo di pagamento")
# "The <mask> asked for the payment obligation to be revoked"
# [{'score': 0.6529251933097839, 'token_str': 'ricorrente', ...},
#  {'score': 0.0380014143884182, 'token_str': 'convenuto', ...},
#  {'score': 0.0360226035118103, 'token_str': 'richiedente', ...},
#  {'score': 0.023908283561468124, 'token_str': 'Condominio', ...},
#  {'score': 0.020863816142082214, 'token_str': 'lavoratore', ...}]
```