AfrimBert-QA
Model Description
AfrimBert-QA is a fine-tuned version of amidblue/mBertKE trained on the amidblue/AfriQuAD dataset. It is designed for extractive question answering — both monolingual and cross-lingual — across African languages.
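"Extractive" means the answer is always a contiguous span copied from the context, not generated text. As a rough illustration only (not the exact decoding used by the Transformers pipeline), a QA head scores each token as a possible start and end, and the best-scoring valid pair is returned. The tokens and logits below are made-up values, and `best_span` is a hypothetical helper.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) of the highest-scoring span with start <= end."""
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Hypothetical tokens and logits, chosen only for illustration.
tokens = ["chama", "mar", "ODM", "iriembo"]
start_logits = [0.1, 0.2, 3.0, 0.3]
end_logits = [0.0, 0.1, 2.5, 0.4]

s, e, score = best_span(start_logits, end_logits)
print(tokens[s:e + 1])  # ['ODM']
```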
Supported Languages
The model covers 15 languages (14 African + English for cross-lingual QA) across East, West, Central and Southern Africa. Counts below are estimated from stratified sampling of the full 14,400-row AfriQuAD dataset.
East Africa
| Language | ISO | Region | Est. Examples |
|---|---|---|---|
| Swahili (Kiswahili) | sw | Kenya, Tanzania, Uganda | ~5,674 |
| Luo (Dholuo) | luo | Kenya, Uganda, Tanzania | ~1,285 |
| Kinyarwanda | kin | Rwanda | ~1,231 |
| Kikuyu (Gĩkũyũ) | kik | Kenya | ~268 |
| Luganda | lug | Uganda | ~107 |
| Maasai (Maa) | mas | Kenya, Tanzania | ~54 |
West Africa
| Language | ISO | Region | Est. Examples |
|---|---|---|---|
| Igbo | ibo | Nigeria | ~1,338 |
| Twi (Akan) | twi | Ghana | ~1,178 |
| Fon | fon | Benin | ~1,124 |
| Hausa | hau | Nigeria, Niger | ~589 |
| Yoruba | yor | Nigeria | ~268 |
Southern / Central Africa
| Language | ISO | Region | Est. Examples |
|---|---|---|---|
| Zulu (isiZulu) | zul | South Africa | ~857 |
| Bemba | bem | Zambia, DRC | ~321 |
| Lingala | lin | DRC, Congo, CAR | ~54 |
Cross-Lingual
| Language | ISO | Notes | Est. Examples |
|---|---|---|---|
| English | en | Cross-lingual QA pairs | ~54 |
Summary
| Metric | Value |
|---|---|
| Total languages | 15 |
| Total QA examples | ~14,400 |
| Share of dominant language (Swahili) | ~39% |
| Share of largest non-Swahili language (Igbo) | ~9% |
Note: Luhya (luy), Kalenjin (kln), and Gusii (guz) appear in the dataset's HF metadata tags but were not observed in the sampled rows. They may be present in very small quantities or as part of cross-lingual pairs.
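The percentages in the summary follow directly from the estimated per-language counts. A small sketch that recomputes them, using only the numbers from the tables above (the dictionary layout is just for illustration):

```python
# Estimated per-language example counts, copied from the tables above.
counts = {
    "sw": 5674, "luo": 1285, "kin": 1231, "kik": 268, "lug": 107, "mas": 54,
    "ibo": 1338, "twi": 1178, "fon": 1124, "hau": 589, "yor": 268,
    "zul": 857, "bem": 321, "lin": 54,
    "en": 54,
}

total = sum(counts.values())
shares = {iso: n / total for iso, n in counts.items()}

print(f"languages: {len(counts)}, total examples: {total:,}")
# Show the three largest languages with their share of the dataset.
for iso, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{iso:<4} {n:>6,} {shares[iso]:6.1%}")
```

The estimated counts sum to 14,402, which is where the rounded ~14,400 figure comes from.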
Training Data
The model was trained on a combination of the following datasets:
- KENSQUAD — Kenyan extractive QA dataset
- AFRIQA — Pan-African QA benchmark
- Custom data — Additional data collected for languages not covered by AFRIQA and KENSQUAD
AfriQuAD Dataset Stats
| Split | Rows |
|---|---|
| Train | ~11,500 |
| Validation | ~1,400 |
| Test | ~1,400 |
| Total | ~14,300 |
Cross-lingual QA Dataset Stats
| Type | Approximate Size |
|---|---|
| Generated cross-lingual QA pairs | ~800 examples |
| Translated cross-lingual QA pairs | ~800 examples |
Usage
Note: The model is gated on Hugging Face. Request access at amidblue/AfrimBert-QA, then authenticate locally:
pip install transformers torch
huggingface-cli login
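In scripts or notebooks where an interactive prompt is impractical, the token can be supplied programmatically instead. The sketch below assumes the token is stored in an `HF_TOKEN` environment variable and that `huggingface_hub` is installed; neither is stated by this model card, and `login_from_env` is a hypothetical helper.

```python
import os


def login_from_env(var_name="HF_TOKEN"):
    """Log in to the Hugging Face Hub if a token is set; return True on login."""
    token = os.environ.get(var_name)
    if not token:
        print(f"{var_name} is not set; run `huggingface-cli login` instead.")
        return False
    # Imported lazily so the check above works without extra packages.
    from huggingface_hub import login
    login(token=token)
    return True
```

Calling `login_from_env()` once before creating the pipeline should be enough for the gated download to succeed, provided access has already been granted.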
Quick start
from transformers import pipeline
qa = pipeline("question-answering", model="amidblue/AfrimBert-QA")
# Luo (monolingual)
context = "Ji mang'eny ok winjre gi kaka chama mar ODM iriembo. Tinde nitie koko mang'eny e chama no."
question = "Chama mane ema ji oko hero kaka iriembo?"
result = qa(question=question, context=context)
print(f"Question : {question}")
print(f"Answer : {result['answer']}")
print(f"Score : {result['score']:.4f} | span [{result['start']}:{result['end']}]")
Output1
Question : Chama mane ema ji oko hero kaka iriembo?
Answer : ODM
Score : 0.7412 | span [40:43]
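The `start` and `end` values are character offsets into the context string, so slicing the context recovers the answer. The check below needs no model; it reuses the Luo context and the span reported above.

```python
context = "Ji mang'eny ok winjre gi kaka chama mar ODM iriembo. Tinde nitie koko mang'eny e chama no."
# Values copied from the quick-start output above.
result = {"answer": "ODM", "start": 40, "end": 43}

# The end offset is exclusive, so a plain slice gives the answer text.
span = context[result["start"]:result["end"]]
print(span)  # ODM

# Mark the answer inside its context, e.g. for display in a UI.
highlighted = context[:result["start"]] + "[" + span + "]" + context[result["end"]:]
print(highlighted)
```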
Multi-language inference script
"""
AfrimBert-QA Inference Script
-------------------------------
Runs extractive QA across multiple African languages using amidblue/AfrimBert-QA.
Covers monolingual and cross-lingual examples.
Usage:
    python run_afrimbert_qa.py

Requirements:
    pip install transformers torch

Notes:
    The model is gated on Hugging Face. Request access at:
    https://huggingface.co/amidblue/AfrimBert-QA
    Then authenticate: huggingface-cli login
"""
from transformers import pipeline
# Load model
MODEL_ID = "amidblue/AfrimBert-QA"
print(f"Loading model: {MODEL_ID} ...")
qa = pipeline("question-answering", model=MODEL_ID)
print("Model loaded.\n")
print("=" * 70)
# Test examples per language
EXAMPLES = [
    {
        "lang": "Luo (Dholuo)", "iso": "luo",
        "context": "Ji mang'eny ok winjre gi kaka chama mar ODM iriembo. Tinde nitie koko mang'eny e chama no.",
        "question": "Chama mane ema ji oko hero kaka iriembo?",
    },
    {
        "lang": "Swahili (Kiswahili)", "iso": "sw",
        "context": "Wangari Maathai alikuwa mwanamke wa kwanza wa Kiafrika kutuzwa Tuzo la Amani la Nobel mwaka 2004. Alianzisha Harakati ya Ukanda wa Kijani nchini Kenya.",
        "question": "Wangari Maathai alipewa tuzo gani?",
    },
    {
        "lang": "Kikuyu (Gĩkũyũ)", "iso": "kik",
        "context": "Terebiceni ni mūtambo ūhũthĩkaga harī gūtūma ndūmīrīri cia mbica irathiī. Mītambo īno yambirie kũhũthĩka mīaka-inī ya 1920s.",
        "question": "Terebiceni yambirie kũhũthĩka rĩarĩ?",
    },
]
# Run inference on each example and keep the inputs alongside the outputs
results = []
for ex in EXAMPLES:
    out = qa(question=ex["question"], context=ex["context"])
    results.append({**ex, **out})
    print(f"Language : {ex['lang']} [{ex['iso']}]")
    print(f"Context : {ex['context']}")
    print(f"Question : {ex['question']}")
    print(f"Answer : {out['answer']}")
    print(f"Score : {out['score']:.4f} | span [{out['start']}:{out['end']}]")
    print("-" * 70)
# Summary
print("\nSummary")
print("=" * 70)
print(f"{'Language':<46} {'Answer':<22} {'Score':>7}")
print("-" * 70)
for r in results:
    # Truncate long answers so the table columns stay aligned
    ans = (r["answer"][:20] + "…") if len(r["answer"]) > 21 else r["answer"]
    print(f"{r['lang']:<46} {ans:<22} {r['score']:>7.4f}")
print("=" * 70)
Output2
You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
Language : Luo (Dholuo) [luo]
Context : Ji mang'eny ok winjre gi kaka chama mar ODM iriembo. Tinde nitie koko mang'eny e chama no.
Question : Chama mane ema ji oko hero kaka iriembo?
Answer : ODM
Score : 0.7412 | span [40:43]
----------------------------------------------------------------------
Language : Swahili (Kiswahili) [sw]
Context : Wangari Maathai alikuwa mwanamke wa kwanza wa Kiafrika kutuzwa Tuzo la Amani la Nobel mwaka 2004. Alianzisha Harakati ya Ukanda wa Kijani nchini Kenya.
Question : Wangari Maathai alipewa tuzo gani?
Answer : Amani la Nobel
Score : 0.5133 | span [71:85]
----------------------------------------------------------------------
Language : Kikuyu (Gĩkũyũ) [kik]
Context : Terebiceni ni mūtambo ūhũthĩkaga harī gūtūma ndūmīrīri cia mbica irathiī. Mītambo īno yambirie kũhũthĩka mīaka-inī ya 1920s.
Question : Terebiceni yambirie kũhũthĩka rĩarĩ?
Answer : ya 1920s
Score : 0.2473 | span [115:123]
----------------------------------------------------------------------
Summary
======================================================================
Language                                       Answer                   Score
----------------------------------------------------------------------
Luo (Dholuo)                                   ODM                     0.7412
Swahili (Kiswahili)                            Amani la Nobel          0.5133
Kikuyu (Gĩkũyũ)                                ya 1920s                0.2473
======================================================================
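The warning at the top of this output is emitted when the pipeline is called repeatedly, one example at a time, on a GPU. One way to reduce the number of calls is to send all examples at once. This sketch assumes the question-answering pipeline accepts parallel lists of questions and contexts plus a `batch_size` argument, as current Transformers releases do. `run_batched` is a hypothetical helper, and `fake_qa` is a stand-in used only so the sketch runs without downloading the gated model.

```python
def run_batched(qa, examples, batch_size=8):
    """Run a QA pipeline over all examples in one call and pair results with inputs."""
    outputs = qa(
        question=[ex["question"] for ex in examples],
        context=[ex["context"] for ex in examples],
        batch_size=batch_size,
    )
    # The pipeline returns a bare dict instead of a list for a single example.
    if isinstance(outputs, dict):
        outputs = [outputs]
    return [{**ex, **out} for ex, out in zip(examples, outputs)]


def fake_qa(question, context, batch_size=1):
    """Stand-in for the real pipeline: 'answers' with the first word of each context."""
    return [
        {"answer": c.split()[0], "score": 0.0, "start": 0, "end": len(c.split()[0])}
        for c in context
    ]


examples = [
    {"lang": "Luo (Dholuo)",
     "question": "Chama mane ema ji oko hero kaka iriembo?",
     "context": "Ji mang'eny ok winjre gi kaka chama mar ODM iriembo."},
    {"lang": "Swahili (Kiswahili)",
     "question": "Wangari Maathai alipewa tuzo gani?",
     "context": "Wangari Maathai alikuwa mwanamke wa kwanza wa Kiafrika kutuzwa Tuzo la Amani la Nobel mwaka 2004."},
]

results = run_batched(fake_qa, examples)
print([r["answer"] for r in results])
```

With the real model, the same helper would be called as `run_batched(qa, EXAMPLES)`, where `qa` and `EXAMPLES` are the objects defined in the script above.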
Citation
If you use this model or its associated dataset, please cite:
@misc{afrimbert-qa,
  author    = {Theophilus Lincoln Owiti and Alukwe Jones Terah},
  title     = {AfrimBert-QA: Extractive Question Answering for African Languages},
  year      = {2026},
  publisher = {Hugging Face},
  note      = {Carnegie Mellon University, Amidblue},
  url       = {https://huggingface.co/amidblue/AfrimBert-QA}
}
Authors:
- Theophilus Lincoln Owiti (Carnegie Mellon University / Amidblue)
- Alukwe Jones Terah (Amidblue)
Model Card Authors
Theophilus Lincoln Owiti & Alukwe Jones Terah