---
language:
- en
- ny
- bem
tags:
- text-classification
- multilingual
- transformer
- zambia
- lusaka
- code-switching
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
base_model:
- Kelvinmbewe/mbert_Lusaka_Language_Analysis
- google-bert/bert-base-multilingual-cased
metrics:
- accuracy
- precision
- recall
- macro_f1
- micro_f1
- validation_loss
- confusion_matrix
model-index:
- name: LusakaLang
  results:
  - task:
      type: text-classification
      name: Topic Classification
    dataset:
      name: LusakaLang Topic Dataset
      type: lusakalang
      config: default
      split: validation
    metrics:
    - type: accuracy
      value: 0.99259
      name: accuracy
    - type: precision
      value: 0.98730
      name: precision
    - type: recall
      value: 0.99128
      name: recall
    - type: f1
      value: 0.98926
      name: macro_f1
    - type: f1
      value: 0.99259
      name: micro_f1
    - type: loss
      value: 0.05233
      name: validation_loss
---

# **LusakaLang Topic Analysis Model**

This topic classifier was fine-tuned from its sister model, `mbert_LusakaLang_Sentiment_Analysis`, which was itself fine-tuned on sentiment data spanning English, Bemba, Nyanja, Zambian slang, and the mixed Zambian language varieties commonly used in everyday communication.

## Training Details

- Base model: `mbert_LusakaLang_Sentiment_Analysis`
- Epochs: 20
- Class weights: enabled (to correct class imbalance)
- Optimizer: AdamW
- Loss: weighted cross-entropy
- Temperature scaling: T = 2.3 (applied at inference time)

## **Why Temperature Scaling?**

Class-weighted training sharpens logits, which can make predictions overconfident. Dividing the logits by T = 2.3 at inference time improves:

- Confidence calibration
- Noise robustness
- Handling of positive/neutral text
- Foreign-language generalization

It also reduces overconfident misclassifications.

## Training Data

The dataset is primarily synthetic, generated to simulate realistic ride-hailing feedback in Zambia. To ensure authenticity:

- All samples were reviewed by a native Zambian speaker
- Mixed-language and slang patterns were corrected
- Local idioms and slang were added
- Unnatural AI-generated phrasing was removed
- Bemba and Nyanja grammar and tone were validated

This hybrid approach ensures that the dataset reflects real Zambian communication styles.

## Train and Validation Loss

![image](https://cdn-uploads.huggingface.co/production/uploads/674ed988f86d2ca07fa23abe/OnagZY8nhxv-bOejq2m0B.png)

## Confusion Matrix

![image](https://cdn-uploads.huggingface.co/production/uploads/674ed988f86d2ca07fa23abe/Qk6rvSrTyeWHl90BrpNQZ.png)

## Word Cloud

![image](https://cdn-uploads.huggingface.co/production/uploads/674ed988f86d2ca07fa23abe/dZb3Tq2FBAKztlIp9asCs.png)
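
## Temperature Scaling Sketch

The card states that temperature scaling with T = 2.3 is applied at inference time. The following is a minimal sketch of what that step could look like, using only the Python standard library. The function name `softmax_with_temperature` and the example logits are illustrative assumptions, not part of the released model or its actual inference code.

```python
import math

# Temperature reported in this model card; everything else here is illustrative.
TEMPERATURE = 2.3


def softmax_with_temperature(logits, temperature=TEMPERATURE):
    """Convert raw logits to probabilities after dividing by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a three-class example (not produced by the model).
example_logits = [6.0, 1.0, -2.0]

raw_probs = softmax_with_temperature(example_logits, temperature=1.0)
calibrated_probs = softmax_with_temperature(example_logits)

# The predicted class is unchanged; only the confidence is softened.
assert raw_probs.index(max(raw_probs)) == calibrated_probs.index(max(calibrated_probs))
print(f"top probability at T=1.0: {max(raw_probs):.3f}")
print(f"top probability at T={TEMPERATURE}: {max(calibrated_probs):.3f}")
```

Because dividing by a positive constant preserves the ordering of the logits, the predicted topic stays the same and only the reported confidence changes. With the `transformers` library, the logits would typically come from `model(**inputs).logits` on a sequence-classification model.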
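
## Weighted Cross-Entropy Sketch

The training details list class weights and a weighted cross-entropy loss, but the card does not say how the weights were derived. The sketch below assumes a common inverse-frequency heuristic, weight = n_samples / (n_classes * class_count), purely for illustration. The function names and toy labels are hypothetical and may differ from what was used in training.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(logits_batch, labels, weights):
    """Weighted mean of per-example cross-entropy, normalised by total weight."""
    total_loss = 0.0
    total_weight = 0.0
    for logits, label in zip(logits_batch, labels):
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        negative_log_likelihood = log_norm - logits[label]
        total_loss += weights[label] * negative_log_likelihood
        total_weight += weights[label]
    return total_loss / total_weight


# Toy, imbalanced labels for three hypothetical topic classes.
toy_labels = [0, 0, 0, 0, 0, 1, 1, 2]
toy_weights = inverse_frequency_weights(toy_labels)

# Hypothetical logits: every example slightly favours class 0.
toy_logits = [[1.0, 0.0, 0.0] for _ in toy_labels]
print("class weights:", toy_weights)
print("weighted loss:", weighted_cross_entropy(toy_logits, toy_labels, toy_weights))
```

Rare classes receive larger weights, so their errors contribute more to the loss. Dividing by the sum of the weights mirrors the default mean reduction of `torch.nn.CrossEntropyLoss` when a `weight` tensor is supplied.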