JuaKazi Swahili Gender Bias Classifier v3

Fine-tuned afro-xlmr-base for binary gender bias detection in Swahili text.

Part of the JuaKazi Gender Sensitization Engine — the only tool in East Africa that detects, corrects, and explains gender bias in African-language text.

Validation Metrics (v3)

Metric Score
F1 0.673
Precision 0.672
Recall 0.674

For comparison, v1 (3 epochs): F1=0.854 · P=0.938 · R=0.784.
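F1 is the harmonic mean of precision and recall, so the scores above can be sanity-checked directly (values taken from the table and the v1 line):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# v3 validation numbers
print(round(f1_score(0.672, 0.674), 3))  # 0.673
# v1 numbers for comparison
print(round(f1_score(0.938, 0.784), 3))  # 0.854
```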

Labels

ID Label
0 NEUTRAL
1 BIAS
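The label table corresponds to the `id2label` mapping stored in the model config; a minimal sketch of using it:

```python
# Label mapping from the table above
id2label = {0: "NEUTRAL", 1: "BIAS"}
label2id = {name: idx for idx, name in id2label.items()}

print(id2label[1], label2id["NEUTRAL"])  # BIAS 0
```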

Training Details

  • Base model: Davlan/afro-xlmr-base (XLM-RoBERTa fine-tuned on African languages)
  • Training data: 64,723 Swahili ground truth rows (JuaKazi dataset)
  • Oversampling: Minority class (BIAS) oversampled to 25% of training set
  • Class weighting: WeightedTrainer with CrossEntropyLoss
  • Epochs: 5 · Learning rate: 1e-5 · Batch size: 16/32 · Hardware: Kaggle T4 x2
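The 25% oversampling target can be computed from the majority-class count; a minimal sketch (the counts below are illustrative, not the real class split of the 64,723 rows):

```python
import math

def minority_target(majority_count: int, minority_share: float = 0.25) -> int:
    """Number of minority (BIAS) rows needed so the minority makes up
    `minority_share` of the final training set, with the majority fixed.
    minority / (majority + minority) = share  =>  minority = majority * share / (1 - share)
    """
    return math.ceil(majority_count * minority_share / (1 - minority_share))

# Illustrative counts only
majority = 48_000
print(minority_target(majority))  # 16000  (16000 / 64000 == 0.25)
```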

Usage in JuaKazi

Stage 2 fallback for Swahili text only. Runs in warn-only mode: it may attach warnings but never sets has_bias_detected=True directly.
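The warn-only behaviour described above can be sketched as follows (the function and field names here are assumptions for illustration, not the actual JuaKazi API):

```python
def stage2_fallback(text: str, classify, threshold: float = 0.75) -> dict:
    """Swahili-only Stage 2 fallback sketch: attach a warning above
    `threshold`, but never flip `has_bias_detected` itself."""
    result = {"has_bias_detected": False, "warnings": []}  # owned by Stage 1, never set here
    pred = classify(text)  # e.g. {'label': 'BIAS', 'score': 0.89}
    if pred["label"] == "BIAS" and pred["score"] >= threshold:
        result["warnings"].append(
            f"ML fallback flagged possible gender bias (score={pred['score']:.2f})"
        )
    return result

# Stub classifier for illustration
out = stage2_fallback("mfano", lambda t: {"label": "BIAS", "score": 0.89})
print(out["has_bias_detected"], len(out["warnings"]))  # False 1
```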

Set in HuggingFace Space secrets:

JUAKAZI_ML_MODEL = juakazike/sw-bias-classifier-v3
JUAKAZI_ML_THRESHOLD = 0.75
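Inside the Space these secrets surface as environment variables; a minimal sketch of reading them (variable names from above; the fallback defaults are assumptions):

```python
import os

# Defaults here are assumed, not guaranteed by the JuaKazi codebase
MODEL_ID = os.environ.get("JUAKAZI_ML_MODEL", "juakazike/sw-bias-classifier-v3")
THRESHOLD = float(os.environ.get("JUAKAZI_ML_THRESHOLD", "0.75"))

print(MODEL_ID, THRESHOLD)
```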

Quick Start

from transformers import pipeline
classifier = pipeline("text-classification", model="juakazike/sw-bias-classifier-v3")
classifier("Daktari wa kiume alipima mgonjwa.")
# [{'label': 'BIAS', 'score': 0.89}]

Limitations

  • Swahili only (Kenyan and Tanzanian usage); no coverage of Sheng or Ugandan Swahili.
  • Cohen's kappa (inter-annotator agreement) not yet measured; a second annotator has not yet been recruited.
  • ~33% false negative rate (recall 0.674).
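The ~33% false negative rate follows directly from the validation recall:

```python
recall = 0.674  # from the validation metrics table
false_negative_rate = 1 - recall  # share of true BIAS cases the model misses
print(round(false_negative_rate, 3))  # 0.326, i.e. ~33%
```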
Model size: 0.3B params · Safetensors · F32
