Add multilingual to the language tag
#2 opened by lbourdois
README.md CHANGED

@@ -1,6 +1,4 @@
 ---
-annotations_creators:
-- crowdsourced
 language:
 - amh
 - orm
@@ -25,17 +23,9 @@ language:
 - twi
 - xho
 - zul
-language_creators:
-- crowdsourced
+- multilingual
 license:
 - cc-by-4.0
-multilinguality:
-- monolingual
-pretty_name: afrolm-dataset
-size_categories:
-- 1M<n<10M
-source_datasets:
-- original
 tags:
 - afrolm
 - active learning
@@ -43,6 +33,17 @@ tags:
 - research papers
 - natural language processing
 - self-active learning
+annotations_creators:
+- crowdsourced
+language_creators:
+- crowdsourced
+multilinguality:
+- monolingual
+pretty_name: afrolm-dataset
+size_categories:
+- 1M<n<10M
+source_datasets:
+- original
 task_categories:
 - fill-mask
 task_ids:
@@ -57,7 +58,7 @@ This repository contains the model for our paper [`AfroLM: A Self-Active Learnin
 
 ## Languages Covered
-AfroLM has been pretrained from scratch on 23 African Languages: Amharic, Afan Oromo, Bambara, Ghomalá, Éwé, Fon, Hausa, Ìgbò, Kinyarwanda, Lingala, Luganda, Luo, Mooré, Chewa, Naija, Shona, Swahili, Setswana, Twi, Wolof, Xhosa, Yorùbá, and Zulu.
+AfroLM has been pretrained from scratch on 23 African Languages: Amharic, Afan Oromo, Bambara, Ghomalá, Éwé, Fon, Hausa, Ìgbò, Kinyarwanda, Lingala, Luganda, Luo, Mooré, Chewa, Naija, Shona, Swahili, Setswana, Twi, Wolof, Xhosa, Yorùbá, and Zulu.
 
 ## Evaluation Results
 AfroLM was evaluated on MasakhaNER1.0 (10 African Languages) and MasakhaNER2.0 (21 African Languages) datasets; on text classification and sentiment analysis. AfroLM outperformed AfriBERTa, mBERT, and XLMR-base, and was very competitive with AfroXLMR. AfroLM is also very data efficient because it was pretrained on a dataset 14x+ smaller than its competitors' datasets. Below are the average F1-score performances of various models, across various datasets. Please consult our paper for more language-level performance.
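The practical effect of this PR is that the Hub's "multilingual" filter keys off the `language:` list in the README's YAML frontmatter, so the repo only surfaces under that filter once `- multilingual` is present. A minimal stdlib-only sketch of the kind of check a reviewer might run (the `yaml_list` helper is hypothetical and handles only flat lists; real tooling would use an actual YAML parser such as PyYAML):

```python
# Abridged version of the frontmatter as restructured by this PR.
frontmatter = """\
language:
- amh
- orm
- twi
- xho
- zul
- multilingual
license:
- cc-by-4.0
"""

def yaml_list(block: str, key: str) -> list[str]:
    """Collect the `- item` entries directly under `key:` in a flat YAML block."""
    items: list[str] = []
    in_key = False
    for line in block.splitlines():
        if line == f"{key}:":
            in_key = True
        elif in_key and line.startswith("- "):
            items.append(line[2:])
        else:
            in_key = False
    return items

# The tag this PR adds is now part of the language list.
assert "multilingual" in yaml_list(frontmatter, "language")
print(yaml_list(frontmatter, "language"))
# → ['amh', 'orm', 'twi', 'xho', 'zul', 'multilingual']
```

Against the pre-PR frontmatter, which lacked the `- multilingual` entry, the same assertion would fail, which is exactly the gap this change closes.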