---
language:
- en
- ny
- bem
tags:
- text-classification
- multilingual
- transformer
- zambia
- lusaka
- code-switching
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
base_model:
- Kelvinmbewe/mbert_Lusaka_Language_Analysis
- google-bert/bert-base-multilingual-cased
metrics:
- accuracy
- precision
- recall
- macro_f1
- micro_f1
- validation_loss
- confusion_matrix
model-index:
- name: LusakaLang
  results:
  - task:
      type: text-classification
      name: Topic Classification
    dataset:
      name: LusakaLang Topic Dataset
      type: lusakalang
      config: default
      split: validation
    metrics:
    - type: accuracy
      value: 0.99259
      name: accuracy
    - type: precision
      value: 0.98730
      name: precision
    - type: recall
      value: 0.99128
      name: recall
    - type: f1
      value: 0.98926
      name: macro_f1
    - type: f1
      value: 0.99259
      name: micro_f1
    - type: loss
      value: 0.05233
      name: validation_loss
---

# **LusakaLang Topic Analysis Model**


This model was fine-tuned from its sister model, `mbert_LusakaLang_Sentiment_Analysis`, which was itself fine-tuned on sentiment data
spanning English, Bemba, Nyanja, Zambian slang, and the mixed-language varieties common in everyday Zambian communication.


## Training Details

- Base model: `mbert_LusakaLang_Sentiment_Analysis`
- Epochs: 20
- Class weights: enabled (to correct class imbalance)
- Optimizer: AdamW
- Loss: weighted cross-entropy
- Temperature scaling: T = 2.3 (applied at inference time)
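The class-weighted loss listed above can be illustrated with a small, dependency-free sketch. The weighting scheme used for this model is not documented here, so the inverse-frequency ("balanced") formula below is an assumption, and the function names and toy labels are hypothetical. The loss is averaged by the sum of the target weights, which is how `torch.nn.CrossEntropyLoss` behaves when given a `weight` tensor with mean reduction.

```python
import math
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights: N / (K * n_c). Assumed scheme, not confirmed by the card."""
    counts = Counter(labels)
    total = len(labels)
    # Classes that never occur get weight 0.0 to avoid dividing by zero.
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(logits_batch, labels, weights):
    """Weighted mean of per-sample negative log-likelihoods."""
    weighted_sum = 0.0
    weight_total = 0.0
    for logits, label in zip(logits_batch, labels):
        # Numerically stable log-sum-exp of the logits.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        nll = log_norm - logits[label]
        weighted_sum += weights[label] * nll
        weight_total += weights[label]
    return weighted_sum / weight_total


# Toy, imbalanced two-class example (hypothetical data).
labels = [0, 0, 0, 1]
weights = balanced_class_weights(labels, num_classes=2)
logits_batch = [[0.0, 0.0]] * len(labels)  # uniform predictions
loss = weighted_cross_entropy(logits_batch, labels, weights)
```

With these weights the single minority-class sample counts three times as much as each majority-class sample, so errors on rare topics are penalized more heavily during training.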

## **Why Temperature Scaling?**
Class-weighted training tends to sharpen the logits, which makes the softmax output overconfident.
Dividing the logits by T = 2.3 before the softmax improves:

- Confidence calibration
- Noise robustness
- Handling of positive/neutral text
- Foreign-language generalization
- Reduction of overconfident misclassifications
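As a minimal sketch of what inference-time temperature scaling does, the snippet below divides the logits by T before applying a softmax. Only the value T = 2.3 comes from this card; the function name and the example logits are hypothetical and are not outputs of the model.

```python
import math


def softmax_with_temperature(logits, temperature=2.3):
    """Turn raw logits into probabilities after dividing by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a three-class example.
logits = [6.0, 1.5, 0.5]
raw = softmax_with_temperature(logits, temperature=1.0)
calibrated = softmax_with_temperature(logits, temperature=2.3)
```

Because every logit is divided by the same positive constant, the predicted class does not change; only the confidence assigned to it is reduced, so accuracy is unaffected while the probabilities become less extreme.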

## Training Data
The dataset was primarily synthetic, generated to simulate realistic ride-hailing feedback in Zambia.
To ensure authenticity:

- All samples were reviewed by a native Zambian speaker
- Mixed-language and slang patterns were corrected
- Local idioms and slang were added
- Unnatural AI-generated phrasing was removed
- Bemba and Nyanja grammar and tone were validated

This hybrid approach ensures that the dataset reflects real Zambian communication style.


## Train and Validation Loss
![image](https://cdn-uploads.huggingface.co/production/uploads/674ed988f86d2ca07fa23abe/OnagZY8nhxv-bOejq2m0B.png)

## Confusion Matrix
![image](https://cdn-uploads.huggingface.co/production/uploads/674ed988f86d2ca07fa23abe/Qk6rvSrTyeWHl90BrpNQZ.png)

## Word Cloud
![image](https://cdn-uploads.huggingface.co/production/uploads/674ed988f86d2ca07fa23abe/dZb3Tq2FBAKztlIp9asCs.png)