Commit 081e10c (parent: 19ba203): Update README.md

README.md CHANGED
@@ -53,7 +53,7 @@ widget:
 
 ## Model description
 
-
+The **roberta-base-ca-v2** is a transformer-based masked language model for the Catalan language.
 It is based on the [RoBERTA](https://github.com/pytorch/fairseq/tree/master/examples/roberta) base model
 and has been trained on a medium-size corpus collected from publicly available corpora and crawlers.
 
@@ -74,10 +74,11 @@ tokenizer_hf = AutoTokenizer.from_pretrained('projecte-aina/roberta-base-ca-v2')
 model = AutoModelForMaskedLM.from_pretrained('projecte-aina/roberta-base-ca-v2')
 model.eval()
 pipeline = FillMaskPipeline(model, tokenizer_hf)
-text = f"Em dic <mask>."
+text = f"Em dic <mask>."
 res_hf = pipeline(text)
 pprint([r['token_str'] for r in res_hf])
 ```
+
 ## Training
 
 ### Training data
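For context on the snippet touched by the second hunk: a `transformers` fill-mask pipeline returns a list of candidate completions, each a dict with `score`, `token`, `token_str`, and `sequence` keys, and the `pprint` line extracts only the predicted token strings. A minimal sketch of that post-processing, using a hand-written stand-in for the pipeline output (the candidate values below are illustrative, not real model predictions):

```python
from pprint import pprint

# Stand-in for FillMaskPipeline output on "Em dic <mask>." ("My name
# is <mask>." in Catalan). Each entry mirrors the real output schema;
# the scores, ids, and names are made up for illustration.
res_hf = [
    {"score": 0.05, "token": 1234, "token_str": " Maria",
     "sequence": "Em dic Maria."},
    {"score": 0.04, "token": 5678, "token_str": " Joan",
     "sequence": "Em dic Joan."},
]

# Same extraction as the README snippet: keep only the token strings.
tokens = [r["token_str"] for r in res_hf]
pprint(tokens)
```

With the real model, `res_hf` would contain the top-k most probable fillers for the `<mask>` position, ranked by `score`.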