Commit 6760505 (parent c8540f5): Update README.md
- 143 Million Tokens (1GB of text data)
- Tokenizer Vocabulary Size: 70,000 tokens
## Intended uses & limitations

`afriteva_small` is a pre-trained model primarily aimed at being fine-tuned on multilingual sequence-to-sequence tasks.
```python
>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriteva_small")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("castorini/afriteva_small")

>>> src_text = "Ó hùn ọ́ láti di ara wa bí?"
>>> tgt_text = "Would you like to be?"

>>> model_inputs = tokenizer(src_text, return_tensors="pt")
>>> with tokenizer.as_target_tokenizer():
...     labels = tokenizer(tgt_text, return_tensors="pt").input_ids

>>> model(**model_inputs, labels=labels)  # forward pass
```
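One detail worth noting when preparing `labels` for seq2seq fine-tuning: padding positions are conventionally replaced with `-100` so that PyTorch's cross-entropy loss ignores them, and the tokenizer call above does not do this on its own. A minimal sketch in plain Python, assuming a T5-style pad token id of 0 (the token ids below are made up for illustration):

```python
PAD_ID = 0  # T5-style tokenizers typically use 0 as the padding id (an assumption here)

def mask_padding(label_ids, pad_id=PAD_ID):
    """Replace padding ids with -100 so the loss skips those positions."""
    return [tok if tok != pad_id else -100 for tok in label_ids]

# Hypothetical label ids: two content tokens, an end-of-sequence token (1), then two pads.
labels = [214, 87, 1, 0, 0]
print(mask_padding(labels))  # [214, 87, 1, -100, -100]
```

Without this masking, the model would be penalized for its predictions at padded positions, which skews the loss on batches of uneven-length targets.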
## Training Procedure

For information on training procedures, please refer to the AfriTeVa [paper](#) or [repository](https://github.com/castorini/afriteva).