Fernando Carneiro committed on
Commit
966dd59
·
1 Parent(s): ecc3515
Files changed (1)
  1. README.md +20 -1
README.md CHANGED
@@ -5,4 +5,23 @@ license: apache-2.0
  # <a name="introduction"></a> BERTweet.BR: A Pre-Trained Language Model for Tweets in Portuguese
 
- Using the same architecture as [BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet), we trained our model from scratch following the [RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta) pre-training procedure on a corpus of approximately 9 GB containing 100M Portuguese tweets.
+ Using the same architecture as [BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet), we trained our model from scratch following the [RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta) pre-training procedure on a corpus of approximately 9 GB containing 100M Portuguese tweets.
+
+ ## Usage
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ model = AutoModel.from_pretrained('melll-uff/bertweetbr')
+
+ tokenizer = AutoTokenizer.from_pretrained('melll-uff/bertweetbr')
+
+ # INPUT TWEET IS ALREADY NORMALIZED!
+ line = "Tem vídeo novo no canal do @USER :rosto_sorridente_com_olhos_de_coração: Passem por lá e confiram : HTTPURL"
+
+ input_ids = tokenizer(line, return_tensors="pt")
+
+ with torch.no_grad():
+     features = model(**input_ids)  # Model outputs are now tuples
+ ```
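
The usage example above expects a tweet that is already normalized, with user mentions shown as `@USER` and links as `HTTPURL`. The commit does not say how that normalization is done, so the following is only a minimal sketch of one way to produce those two placeholders. The function name `normalize_tweet` and both regular expressions are hypothetical and not part of the model's code, the example handle and URL are made up, and converting emoji to text aliases such as `:rosto_sorridente_com_olhos_de_coração:` is not covered here.

```python
import re

# Hypothetical patterns; the model authors' actual normalization rules may differ.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")
MENTION_PATTERN = re.compile(r"(?<!\w)@\w+")


def normalize_tweet(text: str) -> str:
    """Replace links and user mentions with the placeholders used in the example."""
    # Substitute links first so that an '@' inside a URL is not treated as a mention.
    text = URL_PATTERN.sub("HTTPURL", text)
    # The lookbehind skips '@' preceded by a word character (e.g. e-mail addresses).
    text = MENTION_PATTERN.sub("@USER", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


raw = "Tem vídeo novo no canal do @fulano Passem por lá e confiram : https://example.com/v"
print(normalize_tweet(raw))
```

Applying the function to text that is already normalized leaves it unchanged, so it should be safe to run over a mixed corpus before tokenization.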