melll-uff/bertweetbr

Fernando Carneiro committed on Sep 11, 2022

Commit 966dd59 · 1 parent: ecc3515

Readme

Files changed (1)

README.md +20 -1

@@ -5,4 +5,23 @@ license: apache-2.0
 # <a name="introduction"></a> BERTweet.BR: A Pre-Trained Language Model for Tweets in Portuguese
-Having the same architecture of [BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet) we trained our model from scratch following [RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta) pre-training procedure on a corpus of approximately 9GB containing 100M Portuguese Tweets.

+Having the same architecture of [BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet) we trained our model from scratch following [RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta) pre-training procedure on a corpus of approximately 9GB containing 100M Portuguese Tweets.
+## Usage
+```python
+import torch
+from transformers import AutoModel, AutoTokenizer
+model = AutoModel.from_pretrained('melll-uff/bertweetbr')
+tokenizer = AutoTokenizer.from_pretrained('melll-uff/bertweetbr')
+# INPUT TWEET IS ALREADY NORMALIZED!
+line = "Tem vídeo novo no canal do @USER :rosto_sorridente_com_olhos_de_coração: Passem por lá e confiram : HTTPURL"
+inputs = tokenizer(line, return_tensors="pt")  # dict-like: input_ids, attention_mask, ...
+with torch.no_grad():
+    features = model(**inputs)  # Model outputs are tuple-like ModelOutput objects
+```
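
The usage snippet above assumes the tweet has already been normalized, with user mentions shown as `@USER` and links as `HTTPURL`. The following is a minimal sketch of how such a normalization step might look. It is not part of this commit and not the authors' preprocessing code: the helper name `normalize_tweet` and both regular expressions are assumptions made for illustration, and the emoji-to-text conversion seen in the example tweet is not covered.

```python
import re

# Hypothetical helper, NOT taken from the melll-uff/bertweetbr repository.
# Both patterns are illustrative assumptions about what counts as a URL or mention.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")


def normalize_tweet(text: str) -> str:
    """Replace links with HTTPURL and user mentions with @USER."""
    text = URL_RE.sub("HTTPURL", text)      # mask links first
    text = MENTION_RE.sub("@USER", text)    # then mask user handles
    return " ".join(text.split())           # collapse repeated whitespace


raw = "Tem vídeo novo no canal do @fulano   Passem por lá e confiram : https://example.com/v"
print(normalize_tweet(raw))
```

The normalized string can then be passed to the tokenizer in place of `line`. Emoji handling would need an extra step that this sketch leaves out.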