Commit · 114aef1
1 Parent(s): de3a998
sample code
README.md CHANGED
@@ -14,6 +14,27 @@ Intended Examples:
 
 People's names, gender pronouns, gendered words (father, mother), and many other values are currently unchanged by this model. Future versions may be trained on more data.
 
+## Sample Code
+
+```
+import torch
+from transformers import AutoTokenizer, EncoderDecoderModel
+
+model = EncoderDecoderModel.from_encoder_decoder_pretrained(
+    "monsoon-nlp/ar-seq2seq-gender-encoder",
+    "monsoon-nlp/ar-seq2seq-gender-decoder",
+    min_length=40
+)
+tokenizer = AutoTokenizer.from_pretrained('monsoon-nlp/ar-seq2seq-gender-decoder') # same as MARBERT original
+
+input_ids = torch.tensor(tokenizer.encode("أنا سعيدة")).unsqueeze(0)
+generated = model.generate(input_ids, decoder_start_token_id=model.config.decoder.pad_token_id)
+tokenizer.decode(generated.tolist()[0][1 : len(input_ids[0]) - 1])
+# > 'انا سعيد'
+```
+
+https://colab.research.google.com/drive/1S0kE_2WiV82JkqKik_sBW-0TUtzUVmrV?usp=sharing
+
 ## Training
 
 I originally developed