# JuaKazi Swahili Gender Bias Classifier v3
Fine-tuned afro-xlmr-base for binary gender bias detection in Swahili text.
Part of the JuaKazi Gender Sensitization Engine — the only tool in East Africa that detects, corrects, and explains gender bias in African-language text.
## Validation Metrics (v3)
| Metric | Score |
|---|---|
| F1 | 0.673 |
| Precision | 0.672 |
| Recall | 0.674 |
For comparison, v1 (3 epochs): F1 = 0.854 · Precision = 0.938 · Recall = 0.784.
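As a quick sanity check, the reported v3 F1 is consistent with the precision and recall above, since F1 is their harmonic mean:

```python
# F1 = 2PR / (P + R), using the v3 validation scores.
precision, recall = 0.672, 0.674
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.673
```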
## Labels
| ID | Label |
|---|---|
| 0 | NEUTRAL |
| 1 | BIAS |
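In code, this table corresponds to an `id2label` mapping like the sketch below (the authoritative mapping ships in the model's config):

```python
# Label mapping matching the table above.
id2label = {0: "NEUTRAL", 1: "BIAS"}
label2id = {name: idx for idx, name in id2label.items()}
print(label2id["BIAS"])  # 1
```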
## Training Details
- Base model: Davlan/afro-xlmr-base (XLM-RoBERTa fine-tuned on African languages)
- Training data: 64,723 Swahili ground truth rows (JuaKazi dataset)
- Oversampling: Minority class (BIAS) oversampled to 25% of training set
- Class weighting: WeightedTrainer with CrossEntropyLoss
- Epochs: 5
- Learning rate: 1e-5
- Batch size: 16/32
- Hardware: Kaggle T4 ×2
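The oversampling step can be sketched as follows. This is a minimal illustration of duplicating minority-class (BIAS) rows with replacement until they make up 25% of the training set; the function name and the row counts in the example are invented for illustration, not the real class split:

```python
import math
import random

def oversample_minority(majority, minority, target_ratio=0.25, seed=0):
    """Duplicate minority rows until they form target_ratio of the set.

    Solves (m + k) / (M + m + k) >= target_ratio for the number of
    extra copies k, then samples that many duplicates with replacement.
    """
    rng = random.Random(seed)
    M, m = len(majority), len(minority)
    needed = max(0, math.ceil(target_ratio * M / (1 - target_ratio)) - m)
    extras = [rng.choice(minority) for _ in range(needed)]
    return list(majority) + list(minority) + extras

# Invented counts for illustration: 900 NEUTRAL rows, 100 BIAS rows.
train = oversample_minority(["NEUTRAL"] * 900, ["BIAS"] * 100)
print(train.count("BIAS") / len(train))  # 0.25
```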
## Usage in JuaKazi
Used as a Stage 2 fallback for Swahili text only, in warn-only mode: the model surfaces warnings but never sets `has_bias_detected=True` directly.
Set in the Hugging Face Space secrets:

```text
JUAKAZI_ML_MODEL = juakazike/sw-bias-classifier-v3
JUAKAZI_ML_THRESHOLD = 0.75
```
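A sketch of how a Stage 2 caller might apply the threshold in warn-only mode. Function and variable names here are illustrative, not the actual JuaKazi API:

```python
import os

# Falls back to the documented default when the secret is unset.
THRESHOLD = float(os.environ.get("JUAKAZI_ML_THRESHOLD", "0.75"))

def ml_bias_warning(prediction, threshold=THRESHOLD):
    """Turn one pipeline prediction into a warning string, or None.

    Warn-only: this never sets has_bias_detected; it only surfaces
    a warning when the model predicts BIAS with enough confidence.
    """
    if prediction["label"] == "BIAS" and prediction["score"] >= threshold:
        return f"Possible gender bias (ML score {prediction['score']:.2f})"
    return None

print(ml_bias_warning({"label": "BIAS", "score": 0.89}))
# Possible gender bias (ML score 0.89)
```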
## Quick Start
```python
from transformers import pipeline

classifier = pipeline("text-classification", model="juakazike/sw-bias-classifier-v3")
classifier("Daktari wa kiume alipima mgonjwa.")  # "The male doctor examined the patient."
# [{'label': 'BIAS', 'score': 0.89}]
```
## Limitations
- Swahili only (Kenya + Tanzania). No Sheng/Uganda coverage.
- Cohen's kappa not yet measured — a second annotator has not yet been recruited.
- ~33% false-negative rate (recall 0.674: roughly one in three biased sentences is missed).