Migrate model card from transformers-repo
Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/MoseliMotsoehli/zuBERTa/README.md
README.md
ADDED
@@ -0,0 +1,56 @@
---
language: zu
---

# zuBERTa
zuBERTa is a RoBERTa-style transformer language model trained on Zulu text.

## Intended uses & limitations
The model can be used to obtain embeddings for downstream tasks such as question answering.

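One way to get sentence embeddings from the model is sketched below. This is a minimal sketch, not the author's documented method: mean pooling over the last hidden layer is an assumption, and `mean_pool` and `embed_sentences` are hypothetical helper names. The loading function needs `torch` and `transformers` installed and downloads the checkpoint on first use.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the vectors of real tokens (mask == 1), skipping padding."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def embed_sentences(texts: List[str],
                    model_name: str = "MoseliMotsoehli/zuBERTa") -> List[List[float]]:
    """Return one mean-pooled vector per sentence (assumed pooling strategy)."""
    # Imported lazily so the pooling helper can be used without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)  # encoder only, no LM head
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch)[0]  # shape: (batch, tokens, hidden_size)

    return [
        mean_pool(vectors, mask)
        for vectors, mask in zip(hidden.tolist(), batch["attention_mask"].tolist())
    ]
```

Calling `embed_sentences(["Abafika eNkandla bafika sebeholwa khona uMpongo kaZingelwayo."])` would return a list with one vector, which can then be passed to a downstream classifier or similarity search.
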
#### How to use

```python
>>> from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

>>> # Load the tokenizer and the masked-language-model weights from the Hub
>>> tokenizer = AutoTokenizer.from_pretrained("MoseliMotsoehli/zuBERTa")
>>> model = AutoModelForMaskedLM.from_pretrained("MoseliMotsoehli/zuBERTa")
>>> # Predict the most likely tokens for the <mask> position
>>> unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
>>> unmasker("Abafika eNkandla bafika sebeholwa <mask> uMpongo kaZingelwayo.")

[
  {
    "sequence": "<s>Abafika eNkandla bafika sebeholwa khona uMpongo kaZingelwayo.</s>",
    "score": 0.050459690392017365,
    "token": 555,
    "token_str": "Ġkhona"
  },
  {
    "sequence": "<s>Abafika eNkandla bafika sebeholwa inkosi uMpongo kaZingelwayo.</s>",
    "score": 0.03668094798922539,
    "token": 2321,
    "token_str": "Ġinkosi"
  },
  {
    "sequence": "<s>Abafika eNkandla bafika sebeholwa ubukhosi uMpongo kaZingelwayo.</s>",
    "score": 0.028774697333574295,
    "token": 5101,
    "token_str": "Ġubukhosi"
  }
]
```

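The `token_str` values in the example output above start with `Ġ`, which is how byte-level BPE tokenizers in the RoBERTa family usually mark a leading space. The sketch below shows one way to turn such predictions into plain (token, score) pairs. It assumes the tokenizer follows that convention, and `clean_predictions` is a hypothetical helper, not part of `transformers`.

```python
from typing import Dict, List, Tuple


def clean_predictions(predictions: List[Dict]) -> List[Tuple[str, float]]:
    """Return (token, score) pairs, best first, with the "Ġ" space marker removed."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].replace("Ġ", " ").strip(), p["score"]) for p in ranked]


# Predictions copied from the example output above ("sequence" field omitted).
example = [
    {"score": 0.050459690392017365, "token": 555, "token_str": "Ġkhona"},
    {"score": 0.03668094798922539, "token": 2321, "token_str": "Ġinkosi"},
    {"score": 0.028774697333574295, "token": 5101, "token_str": "Ġubukhosi"},
]

for token, score in clean_predictions(example):
    print(f"{token}\t{score:.4f}")
```

The same helper can be applied directly to the list returned by `unmasker(...)`, since it only reads the `score` and `token_str` keys.
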
## Training data

1. 30k sentences from the 2018 Zulu portion of the [Leipzig Corpora Collection](https://wortschatz.uni-leipzig.de/en/download), collected from news articles and creative writings.
2. ~7,500 articles of human-generated translations scraped from the Zulu [Wikipedia](https://zu.wikipedia.org/wiki/Special:AllPages).

### BibTeX entry and citation info

```bibtex
@inproceedings{motsoehli2020,
  author = {Moseli Motsoehli},
  title = {Towards transformation of Southern African language models through transformers},
  year = {2020}
}
```