Fernando Carneiro committed on
Commit · 966dd59 · 1 Parent(s): ecc3515
Readme
README.md CHANGED
@@ -5,4 +5,23 @@ license: apache-2.0
# <a name="introduction"></a> BERTweet.BR: A Pre-Trained Language Model for Tweets in Portuguese

-Having the same architecture of [BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet) we trained our model from scratch following [RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta) pre-training procedure on a corpus of approximately 9GB containing 100M Portuguese Tweets.
+Having the same architecture of [BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet) we trained our model from scratch following [RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta) pre-training procedure on a corpus of approximately 9GB containing 100M Portuguese Tweets.
+
+## Usage
+
+```python
+import torch
+from transformers import AutoModel, AutoTokenizer
+
+model = AutoModel.from_pretrained('melll-uff/bertweetbr')
+
+tokenizer = AutoTokenizer.from_pretrained('melll-uff/bertweetbr')
+
+# INPUT TWEET IS ALREADY NORMALIZED!
+line = "Tem vídeo novo no canal do @USER :rosto_sorridente_com_olhos_de_coração: Passem por lá e confiram : HTTPURL"
+
+input_ids = tokenizer(line, return_tensors="pt")
+
+with torch.no_grad():
+    features = model(**input_ids)  # Models outputs are now tuples
+```
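The usage snippet added in this commit notes that the input tweet is already normalized, but the README does not show how a raw tweet gets into that form. Judging only from the sample line, mentions appear as `@USER` and links as `HTTPURL`. The sketch below is one possible way to produce those two placeholders with the standard library; `normalize_tweet`, the regular expressions, and the example handle and URL are hypothetical and are not taken from the BERTweet.BR repository. Emoji-to-text conversion (the `:rosto_sorridente_com_olhos_de_coração:` token) is not covered here.

```python
import re

# Hypothetical helper, not part of the BERTweet.BR repository.
# Assumption: the placeholders @USER and HTTPURL are inferred from the sample tweet.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")


def normalize_tweet(text: str) -> str:
    """Replace links with HTTPURL and user mentions with @USER."""
    # Links are replaced first so that an '@' inside a URL is not treated as a mention.
    text = URL_RE.sub("HTTPURL", text)
    text = MENTION_RE.sub("@USER", text)
    # Collapse repeated whitespace into single spaces.
    return " ".join(text.split())


raw = "Tem vídeo novo no canal do @fulano Passem por lá e confiram : https://example.com/v/123"
normalized = normalize_tweet(raw)
print(normalized)
```

The resulting string could then be passed to the tokenizer in place of `line`. Note that this simple mention pattern would also rewrite the domain part of an e-mail address, so real preprocessing may need stricter rules.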