BF_NER: Burkina Faso Named Entity Recognition
Fine-tuned CamemBERT model for extracting geographic entities from French text, specialized for the Burkina Faso administrative hierarchy.
Model Description
This model is a fine-tuned version of camembert-base for Named Entity Recognition (NER) of geographic locations in French news articles. It recognizes five administrative levels specific to Burkina Faso:
- Country: Burkina Faso and references to other countries in the region
- Region: 13 regions (e.g., Centre, Hauts-Bassins, Sahel)
- Province: 45 provinces (e.g., Kadiogo, Houet, Soum)
- Department: 351 departments (e.g., Ouagadougou, Bobo-Dioulasso, Koudougou)
- Village: 7,936 villages (e.g., Pabre, Koubri, Sya)
Model Details
- Developed by: Charles Abdoulaye Ngom, Landy Rajaonarivo, Sarah Valentin, Maguelonne Teisseire
- Model type: Token Classification (NER)
- Language: French
- Base model: camembert-base
- License: MIT
- Paper: Spatio-Temporal Knowledge Graph from Unstructured Texts: A Multi-Scale Approach for Food Security Monitoring (AGILE 2026)
- DOI (model): 10.57967/hf/7766
- DOI (datasets): 10.57967/hf/7767
Intended Use
Primary Use Cases
- Food security monitoring: Extract location mentions from news articles to track food security events
- Geographic information extraction: Identify and classify locations in French West African texts
- Multi-scale spatial analysis: Enable analysis from village to country level
- Crisis mapping: Support humanitarian and development organizations in monitoring regional events
Out-of-Scope Use
This model is NOT suitable for:
- Named entity recognition in other countries (limited to Burkina Faso administrative entities)
- Non-French languages
- Person, organization, or other non-location entity types
- Real-time applications without additional validation
Training Data
The model was trained using distant supervision on 15,000 French news articles from 2009:
| Split | Sentences | Description |
|---|---|---|
| Train | 59,900 | Sentences containing administrative place names |
| Validation | 14,758 | Used for hyperparameter tuning |
| Test | 11,594 | Held-out set with ~20% unseen entities per level |
Data Source: Official gazetteer from the 2022 Statistical Yearbook of Territorial Administration, Burkina Faso Ministry of Territorial Administration.
Annotation Scheme: BIO tagging (Begin-Inside-Outside)
- B-{type}: Beginning of an entity
- I-{type}: Inside (continuation) of an entity
- O: Outside any entity
Entity types: country, region, province, departement, village
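As a rough illustration of how distant supervision can turn a gazetteer into BIO labels, here is a minimal sketch. It is not the authors' labeling pipeline: the whitespace tokenization, the greedy longest-match strategy, the `GAZETTEER` dictionary and the `bio_tag` helper are all assumptions made for this example, and the handful of entries are taken from the place names quoted in this card.

```python
# Toy gazetteer: token tuple -> entity type. A real gazetteer would hold
# thousands of entries; these few are examples quoted in this card.
GAZETTEER = {
    ("Burkina", "Faso"): "country",
    ("Hauts-Bassins",): "region",
    ("Houet",): "province",
    ("Bobo-Dioulasso",): "departement",
    ("Sya",): "village",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def bio_tag(tokens):
    """Assign BIO tags by greedy longest match against GAZETTEER."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-token names win.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            etype = GAZETTEER.get(tuple(tokens[i:i + n]))
            if etype:
                tags[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{etype}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


tokens = "La province du Houet au Burkina Faso".split()
tags = bio_tag(tokens)
for token, tag in zip(tokens, tags):
    print(f"{token:10s} {tag}")
```

Exact matching like this is noisy in practice (homonyms, punctuation attached to tokens, spelling variants), which is one reason distantly supervised labels are usually treated as approximate.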
Training Procedure
Training Hyperparameters
| Parameter | Value |
|---|---|
| Base model | camembert-base |
| Learning rate | 5e-5 |
| Batch size | 32 |
| Epochs | 70 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Frozen layers | Embedding layers only |
| Trainable parameters | 85,062,923 / 110,039,819 (77.3%) |
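The trainable-parameter figure can be sanity-checked with a little arithmetic. The sketch below assumes the standard camembert-base dimensions (vocabulary of 32,005, hidden size 768, 514 position embeddings, 12 layers, feed-forward size 3,072), which are not listed in this card, and a classification head over the 11 BIO labels. Under those assumptions, freezing only the embedding block reproduces the numbers in the table.

```python
# Assumed camembert-base dimensions (not listed in this card).
VOCAB, HIDDEN, MAX_POS, TYPE_VOCAB = 32005, 768, 514, 1
LAYERS, INTERMEDIATE, NUM_LABELS = 12, 3072, 11


def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


layer_norm = 2 * HIDDEN  # scale and shift vectors

# Embedding block: word, position and token-type tables plus one LayerNorm.
embeddings = (VOCAB + MAX_POS + TYPE_VOCAB) * HIDDEN + layer_norm

# One encoder layer: Q, K, V and output projections, feed-forward, two LayerNorms.
encoder_layer = (
    4 * linear(HIDDEN, HIDDEN)
    + linear(HIDDEN, INTERMEDIATE)
    + linear(INTERMEDIATE, HIDDEN)
    + 2 * layer_norm
)

classifier = linear(HIDDEN, NUM_LABELS)  # token-classification head

trainable = LAYERS * encoder_layer + classifier  # everything except embeddings
total = embeddings + trainable
print(f"{trainable:,} / {total:,} ({trainable / total:.1%})")
# prints: 85,062,923 / 110,039,819 (77.3%)
```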
Training Environment
- Hardware: NVIDIA RTX 3090 GPU
- Training time: ~2-3 hours
- Framework: Transformers 4.45.2, PyTorch 2.5.1
Evaluation
Test Set Performance
Evaluated on a held-out test set containing ~20% unseen entities at each hierarchical level:
| Entity Type | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Country | 0.99 | 0.99 | 0.99 | 4,648 |
| Region | 1.00 | 0.99 | 0.99 | 1,433 |
| Province | 0.99 | 0.98 | 0.99 | 541 |
| Department | 0.99 | 0.99 | 0.99 | 6,744 |
| Village | 0.94 | 0.93 | 0.94 | 3,236 |
| Micro avg | 0.98 | 0.98 | 0.98 | 16,602 |
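To show how the micro-averaged row relates to the per-type rows, the sketch below rebuilds approximate true-positive and prediction counts from the rounded precision, recall and support values in the table. Because the inputs are rounded to two decimals, the result is only an approximation of the reported figures, not a recomputation from the raw predictions.

```python
# (precision, recall, support) per entity type, copied from the table above.
rows = {
    "Country": (0.99, 0.99, 4648),
    "Region": (1.00, 0.99, 1433),
    "Province": (0.99, 0.98, 541),
    "Department": (0.99, 0.99, 6744),
    "Village": (0.94, 0.93, 3236),
}

# Approximate counts: TP = recall * support, predictions = TP / precision.
true_pos = sum(r * n for _, r, n in rows.values())
predicted = sum(r * n / p for p, r, n in rows.values())
support = sum(n for _, _, n in rows.values())

# Micro averaging pools the counts over all types before taking ratios.
micro_p = true_pos / predicted
micro_r = true_pos / support
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
print(support, round(micro_p, 2), round(micro_r, 2), round(micro_f1, 2))
# prints: 16602 0.98 0.98 0.98
```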
Comparison with Baselines
Tested on 1,000 manually annotated news articles:
| Model | Precision | Recall | F1-Score |
|---|---|---|---|
| Baseline CamemBERT (no fine-tuning) | 0.41 | 0.81 | 0.55 |
| GLiNER (zero-shot) | 0.66 | 0.63 | 0.65 |
Usage
Installation
pip install transformers torch
Basic Usage
from transformers import CamembertTokenizerFast, CamembertForTokenClassification
import torch

# Load model and tokenizer
tokenizer = CamembertTokenizerFast.from_pretrained("CharlesAbdoulaye/BF_NER")
model = CamembertForTokenClassification.from_pretrained("CharlesAbdoulaye/BF_NER")
model.eval()

# Entity labels
label_list = [
    "O",
    "B-country", "I-country",
    "B-region", "I-region",
    "B-departement", "I-departement",
    "B-province", "I-province",
    "B-village", "I-village"
]
id2label = {i: label for i, label in enumerate(label_list)}

# Example text with all 5 entity types
text = "La crise alimentaire au Burkina Faso a frappe la region des Hauts-Bassins et la province du Houet. La ville de Bobo-Dioulasso et le village de Sya sont particulierement touches."

# Tokenize with offset mapping for span extraction
inputs = tokenizer(text, return_tensors="pt", truncation=True, return_offsets_mapping=True)
offset_mapping = inputs.pop("offset_mapping")[0].tolist()

with torch.no_grad():
    outputs = model(**inputs)
preds = torch.argmax(outputs.logits, dim=2)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Reconstruct entities with character spans
entities, current = [], None
for idx, (pred_id, (start, end)) in enumerate(zip(preds, offset_mapping)):
    # Special tokens have an empty (0, 0) offset: close any open entity
    if start == 0 and end == 0:
        if current:
            entities.append(current)
            current = None
        continue
    label = id2label[pred_id]
    # SentencePiece marks word-initial pieces with "\u2581"
    is_subword = not tokens[idx].startswith("\u2581")
    if label.startswith("B-"):
        if current:
            entities.append(current)
        current = {"type": label[2:], "start": start, "end": end}
    elif label.startswith("I-") or (is_subword and current):
        if current:
            current["end"] = end
    else:
        if current:
            entities.append(current)
            current = None
if current:
    entities.append(current)

# Merge consecutive same-type entities (handles hyphenated names)
merged = []
for ent in entities:
    if merged and merged[-1]["type"] == ent["type"]:
        gap = text[merged[-1]["end"]:ent["start"]]
        if gap in ("", "-", " -", "- "):
            merged[-1]["end"] = ent["end"]
            continue
    merged.append(dict(ent))

for ent in merged:
    ent["text"] = text[ent["start"]:ent["end"]].rstrip(".,;:!?")
    print(f'{ent["text"]:20s} | {ent["type"]:15s} | span=({ent["start"]}, {ent["end"]})')
Expected output:
Burkina Faso         | country         | span=(24, 36)
Hauts-Bassins        | region          | span=(60, 73)
Houet                | province        | span=(92, 97)
Bobo-Dioulasso       | departement     | span=(111, 125)
Sya                  | village         | span=(143, 146)
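Note that with `truncation=True` the tokenizer cuts the input at the model's maximum length (512 tokens for camembert-base), so entities late in a long article would be silently dropped. One possible workaround, sketched below, is to run the model sentence by sentence and shift the spans back into document coordinates. The helpers `split_sentences` and `shift_entities` are hypothetical names introduced for this example, the regex splitter is deliberately naive, and `extract_entities` is a stand-in for the tokenize/predict/merge steps shown above, not a real model call.

```python
import re


def split_sentences(text):
    """Yield (char_offset, sentence) pairs, splitting after ., ! or ?."""
    for match in re.finditer(r"[^.!?]+[.!?]*", text):
        sentence = match.group()
        stripped = sentence.lstrip()
        if stripped:
            # Skip leading whitespace so the offset points at the first character.
            yield match.start() + len(sentence) - len(stripped), stripped


def shift_entities(entities, offset):
    """Move sentence-relative spans back into document coordinates."""
    return [
        {**ent, "start": ent["start"] + offset, "end": ent["end"] + offset}
        for ent in entities
    ]


def extract_entities(sentence):
    """Hypothetical stand-in for the model inference shown in Basic Usage."""
    known = {"Houet": "province", "Sya": "village"}
    return [
        {"type": known[m.group()], "start": m.start(), "end": m.end()}
        for m in re.finditer("|".join(known), sentence)
    ]


doc = "La province du Houet est touchee. Le village de Sya aussi."
all_entities = []
for offset, sentence in split_sentences(doc):
    all_entities.extend(shift_entities(extract_entities(sentence), offset))

for ent in all_entities:
    print(doc[ent["start"]:ent["end"]], ent["type"], ent["start"], ent["end"])
```

A naive splitter like this will break on abbreviations and decimal numbers; a proper sentence segmenter would be a safer choice for production text.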
Limitations
Geographic scope: The model is trained exclusively on Burkina Faso administrative entities. It will not recognize locations from other countries with the same accuracy.
Temporal coverage: Training data is from 2009. Administrative boundaries and place names may have changed since then.
Homonyms: Village names that exist in multiple provinces may be ambiguous. The model does not perform disambiguation based on context.
Spelling variations: West African toponyms exhibit significant spelling variability (e.g., "Ouagadougou" vs "Ouaga"). The model handles common variations but may miss rare spellings not present in training data.
Language: Only French text is supported. The model will not work on texts in local languages (Mooré, Dioula, Fulfulde, etc.).
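For the spelling variations noted above, downstream code that links extracted mentions to a gazetteer may benefit from a light normalization step. The sketch below is one possible approach and is not part of the model: `normalize`, `canonical` and the `ALIASES` table are hypothetical names for this example, and the small gazetteer only contains names quoted in this card.

```python
import unicodedata


def normalize(name):
    """Lowercase, strip accents, and treat hyphens as spaces for matching."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().replace("-", " ").split())


# Hypothetical alias table for short forms that normalization cannot recover.
ALIASES = {normalize("Ouaga"): "Ouagadougou"}


def canonical(name, gazetteer):
    """Return the canonical gazetteer name for a mention, or None if unknown."""
    key = normalize(name)
    if key in ALIASES:
        return ALIASES[key]
    return gazetteer.get(key)


names = ["Ouagadougou", "Bobo-Dioulasso", "Hauts-Bassins", "Pabre"]
gazetteer = {normalize(n): n for n in names}

print(canonical("bobo dioulasso", gazetteer))
print(canonical("Ouaga", gazetteer))
print(canonical("Pabré", gazetteer))
```

Normalization of this kind only covers case, accent and hyphen differences; genuinely different transliterations would still need an alias list or fuzzy matching.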
Ethical Considerations
Potential Biases
- Media coverage bias: Urban areas (especially Ouagadougou and Bobo-Dioulasso) are overrepresented in news articles compared to rural villages.
- Administrative changes: Administrative boundaries and names may have changed since the 2022 gazetteer was published.
- Language bias: French-language bias excludes indigenous language place names and local toponyms.
Responsible Use
This model is intended for research and humanitarian applications:
- ✅ Food security monitoring and early warning systems
- ✅ Geographic information extraction for development organizations
- ✅ Academic research on crisis mapping and NLP
Citation
If you use this model in your research, please cite:
@article{ngom2026stkgfs,
  title={Spatio-Temporal Knowledge Graph from Unstructured Texts: A Multi-Scale Approach for Food Security Monitoring},
  author={Ngom, Charles Abdoulaye and Rajaonarivo, Landy and Valentin, Sarah and Teisseire, Maguelonne},
  journal={AGILE: GIScience Series},
  year={2026},
}
Contact
For questions about this model:
- Charles Abdoulaye Ngom: charles.ngom@inrae.fr
- Landy Rajaonarivo: landy.rajaonarivo@inrae.fr
- Sarah Valentin: sarah.valentin@cirad.fr
- Maguelonne Teisseire: maguelonne.teisseire@inrae.fr
Acknowledgments
- Administrative hierarchy data: 2022 Statistical Yearbook, Burkina Faso Ministry of Territorial Administration
- Base model: CamemBERT (Martin et al., 2020)
- Geographic enrichment: Wikidata
License
This model is released under the MIT License. See the LICENSE file for details.