Add multilingual to the language tag
#2 opened by lbourdois
README.md CHANGED

@@ -1,6 +1,4 @@
 ---
-annotations_creators:
-- crowdsourced
 language:
 - amh
 - orm
@@ -25,17 +23,9 @@ language:
 - twi
 - xho
 - zul
-language_creators:
-- crowdsourced
+- multilingual
 license:
 - cc-by-4.0
-multilinguality:
-- monolingual
-pretty_name: afrolm-dataset
-size_categories:
-- 1M<n<10M
-source_datasets:
-- original
 tags:
 - afrolm
 - active learning
@@ -43,6 +33,17 @@ tags:
 - research papers
 - natural language processing
 - self-active learning
+annotations_creators:
+- crowdsourced
+language_creators:
+- crowdsourced
+multilinguality:
+- monolingual
+pretty_name: afrolm-dataset
+size_categories:
+- 1M<n<10M
+source_datasets:
+- original
 task_categories:
 - fill-mask
 task_ids:
@@ -57,7 +58,7 @@ This repository contains the model for our paper [`AfroLM: A Self-Active Learnin
 
 ## Languages Covered
-AfroLM has been pretrained from scratch on 23 African Languages: Amharic, Afan Oromo, Bambara, Ghomalá, Éwé, Fon, Hausa, Ìgbò, Kinyarwanda, Lingala, Luganda, Luo, Mooré, Chewa, Naija, Shona, Swahili, Setswana, Twi, Wolof, Xhosa, Yorùbá, and Zulu.
+AfroLM has been pretrained from scratch on 23 African Languages: Amharic, Afan Oromo, Bambara, Ghomalá, Éwé, Fon, Hausa, Ìgbò, Kinyarwanda, Lingala, Luganda, Luo, Mooré, Chewa, Naija, Shona, Swahili, Setswana, Twi, Wolof, Xhosa, Yorùbá, and Zulu.
 
 ## Evaluation Results
 AfroLM was evaluated on MasakhaNER1.0 (10 African Languages) and MasakhaNER2.0 (21 African Languages) datasets; on text classification and sentiment analysis. AfroLM outperformed AfriBERTa, mBERT, and XLMR-base, and was very competitive with AfroXLMR. AfroLM is also very data efficient because it was pretrained on a dataset 14x+ smaller than its competitors' datasets. Below are the average F1-score performances of various models, across various datasets. Please consult our paper for more language-level performance.
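The practical effect of this PR is that the Hub's "multilingual" filter keys off the `language:` list in the README's YAML frontmatter, so the repo only surfaces under that filter once `- multilingual` is present. A minimal stdlib-only sketch of the kind of check a reviewer might run (the `yaml_list` helper is hypothetical and handles only flat lists; real tooling would use an actual YAML parser such as PyYAML):

```python
# Abridged version of the frontmatter as restructured by this PR.
frontmatter = """\
language:
- amh
- orm
- twi
- xho
- zul
- multilingual
license:
- cc-by-4.0
"""

def yaml_list(block: str, key: str) -> list[str]:
    """Collect the `- item` entries directly under `key:` in a flat YAML block."""
    items: list[str] = []
    in_key = False
    for line in block.splitlines():
        if line == f"{key}:":
            in_key = True
        elif in_key and line.startswith("- "):
            items.append(line[2:])
        else:
            in_key = False
    return items

# The tag this PR adds is now part of the language list.
assert "multilingual" in yaml_list(frontmatter, "language")
print(yaml_list(frontmatter, "language"))
# → ['amh', 'orm', 'twi', 'xho', 'zul', 'multilingual']
```

Against the pre-PR frontmatter, which lacked the `- multilingual` entry, the same assertion would fail, which is exactly the gap this change closes.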