updated model card
README.md
CHANGED
@@ -15,7 +15,7 @@ language:
 - multilingual
 
 ---
-#
+# afriberta_small
 ## Model description
 AfriBERTa small is a pretrained multilingual language model with around 97 million parameters.
 The model has 4 layers, 6 attention heads, 768 hidden units and 3072 feed forward size.
@@ -33,13 +33,13 @@ For example, assuming we want to finetune this model on a token classification t
 >>> from transformers import AutoTokenizer, AutoModelForTokenClassification
 >>> model = AutoModelForTokenClassification.from_pretrained("castorini/afriberta_small")
 >>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriberta_small")
-# we have to manually set the model max length because it is an imported sentencepiece model which
+# we have to manually set the model max length because it is an imported trained sentencepiece model, which huggingface does not properly support right now
 >>> tokenizer.model_max_length = 512
 ```
 
 #### Limitations and bias
-This model is possibly limited by its training dataset which are majorly obtained from news articles from a specific span of time.
-
+- This model is possibly limited by its training dataset which are majorly obtained from news articles from a specific span of time. Thus, it may not generalize well.
+- This model is trained on very little data (less than 1 GB), hence it may not have seen enough data to learn very complex linguistic relations.
 
 ## Training data
 The model was trained on an aggregation of datasets from the BBC news website and Common Crawl.
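The comment on README line 36 says the tokenizer's maximum length has to be set by hand to 512. As a rough illustration of what that cap means for long inputs, here is a minimal pure-Python sketch. It is not the library's implementation: the helper `truncate_ids` is hypothetical, and the figure of two special tokens per sequence is an assumption made for the example.

```python
from typing import List

MAX_LENGTH = 512        # value assigned to tokenizer.model_max_length in the README
NUM_SPECIAL_TOKENS = 2  # assumption: one start and one end token per sequence


def truncate_ids(token_ids: List[int],
                 max_length: int = MAX_LENGTH,
                 num_special: int = NUM_SPECIAL_TOKENS) -> List[int]:
    """Hypothetical helper: keep only the content tokens that still fit
    within max_length once the special tokens are added."""
    budget = max_length - num_special
    if budget <= 0:
        raise ValueError("max_length must exceed the number of special tokens")
    return token_ids[:budget]


long_input = list(range(1000))   # stand-in for 1000 token ids
kept = truncate_ids(long_input)
print(len(kept))                 # number of content tokens kept under the cap
```

With the real tokenizer, the corresponding step would be to call it with `truncation=True` after setting `model_max_length`, so that inputs are cut to the configured limit.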