Commit 6760505 (parent c8540f5): Update README.md
- 143 Million Tokens (1GB of text data)
- Tokenizer Vocabulary Size: 70,000 tokens
## Intended uses & limitations

`afriteva_small` is a pre-trained model primarily aimed at being fine-tuned on multilingual sequence-to-sequence tasks.
```python
>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriteva_small")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("castorini/afriteva_small")

>>> src_text = "Ó hùn ọ́ láti di ara wa bí?"
>>> tgt_text = "Would you like to be?"

>>> model_inputs = tokenizer(src_text, return_tensors="pt")
>>> with tokenizer.as_target_tokenizer():
...     labels = tokenizer(tgt_text, return_tensors="pt").input_ids

>>> model(**model_inputs, labels=labels)  # forward pass
```
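One detail worth noting when preparing `labels` for seq2seq fine-tuning: padding positions are conventionally replaced with `-100` so that PyTorch's cross-entropy loss ignores them, and the tokenizer call above does not do this on its own. A minimal sketch in plain Python, assuming a T5-style pad token id of 0 (the token ids below are made up for illustration):

```python
PAD_ID = 0  # T5-style tokenizers typically use 0 as the padding id (an assumption here)

def mask_padding(label_ids, pad_id=PAD_ID):
    """Replace padding ids with -100 so the loss skips those positions."""
    return [tok if tok != pad_id else -100 for tok in label_ids]

# Hypothetical label ids: two content tokens, an end-of-sequence token (1), then two pads.
labels = [214, 87, 1, 0, 0]
print(mask_padding(labels))  # [214, 87, 1, -100, -100]
```

Without this masking, the model would be penalized for its predictions at padded positions, which skews the loss on batches of uneven-length targets.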
## Training Procedure

For information on training procedures, please refer to the AfriTeVa [paper](#) or [repository](https://github.com/castorini/afriteva).