Commit 081e10c (parent: 19ba203): Update README.md

README.md CHANGED
@@ -53,7 +53,7 @@ widget:
 
 ## Model description
 
-
+The **roberta-base-ca-v2** is a transformer-based masked language model for the Catalan language.
 It is based on the [RoBERTA](https://github.com/pytorch/fairseq/tree/master/examples/roberta) base model
 and has been trained on a medium-size corpus collected from publicly available corpora and crawlers.
 
@@ -74,10 +74,11 @@ tokenizer_hf = AutoTokenizer.from_pretrained('projecte-aina/roberta-base-ca-v2')
 model = AutoModelForMaskedLM.from_pretrained('projecte-aina/roberta-base-ca-v2')
 model.eval()
 pipeline = FillMaskPipeline(model, tokenizer_hf)
-text = f"Em dic <mask>."
+text = f"Em dic <mask>."
 res_hf = pipeline(text)
 pprint([r['token_str'] for r in res_hf])
 ```
+
 ## Training
 
 ### Training data
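For context on the snippet touched by the second hunk: a `transformers` fill-mask pipeline returns a list of candidate completions, each a dict with `score`, `token`, `token_str`, and `sequence` keys, and the `pprint` line extracts only the predicted token strings. A minimal sketch of that post-processing, using a hand-written stand-in for the pipeline output (the candidate values below are illustrative, not real model predictions):

```python
from pprint import pprint

# Stand-in for FillMaskPipeline output on "Em dic <mask>." ("My name
# is <mask>." in Catalan). Each entry mirrors the real output schema;
# the scores, ids, and names are made up for illustration.
res_hf = [
    {"score": 0.05, "token": 1234, "token_str": " Maria",
     "sequence": "Em dic Maria."},
    {"score": 0.04, "token": 5678, "token_str": " Joan",
     "sequence": "Em dic Joan."},
]

# Same extraction as the README snippet: keep only the token strings.
tokens = [r["token_str"] for r in res_hf]
pprint(tokens)
```

With the real model, `res_hf` would contain the top-k most probable fillers for the `<mask>` position, ranked by `score`.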