LusakaLang Topic Analysis Model

This model was fine‑tuned from its sister model, `mbert_LusakaLang_Sentiment_Analysis`, which was itself fine‑tuned on sentiment data spanning English, Bemba, Nyanja, Zambian slang, and the mixed‑language varieties common in everyday Zambian communication.

Training Details

- Base model: `mbert_LusakaLang_Sentiment_Analysis`
- Epochs: 20  
- Class weights: enabled (to correct class imbalance)  
- Optimizer: AdamW  
- Loss: Weighted cross‑entropy  
- Temperature scaling: T = 2.3 (applied at inference time)
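
As a rough illustration of the class‑weighted loss listed above, the following is a minimal, dependency‑free sketch. The weighting rule (`total / (num_classes * count)`) and the function names are assumptions chosen for illustration; the card does not state how the actual class weights were computed. The weighted mean used here matches the usual behaviour of weighted cross‑entropy with mean reduction (divide by the sum of the weights of the targets).

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Assumed heuristic: weight_c = N / (K * count_c), so rare classes weigh more."""
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_cross_entropy(batch_logits, targets, class_weights):
    """Weighted mean of per-example negative log-likelihoods."""
    weighted_loss = 0.0
    weight_sum = 0.0
    for logits, target in zip(batch_logits, targets):
        w = class_weights[target]
        weighted_loss += -w * log_softmax(logits)[target]
        weight_sum += w
    return weighted_loss / weight_sum


# Hypothetical, imbalanced label set: class 0 is three times as frequent as class 1.
example_labels = [0, 0, 0, 1]
example_weights = inverse_frequency_weights(example_labels, num_classes=2)
example_loss = weighted_cross_entropy(
    [[2.0, 0.1], [1.5, 0.3], [0.2, 0.4], [0.1, 1.2]],  # made-up logits
    example_labels,
    example_weights,
)
```

In a real training loop the same idea is typically expressed by passing a weight tensor to the framework's cross‑entropy loss rather than computing it by hand.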

Why Temperature Scaling?

Class‑weighted training tends to sharpen the logits, which makes the model overconfident.  
Dividing the logits by T = 2.3 before the softmax improves:

- Confidence calibration  
- Noise robustness  
- Handling of positive/neutral text  
- Foreign‑language generalization  
- Reduction of overconfident misclassifications
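
The effect of temperature scaling can be sketched in a few lines of plain Python. The function name and the example logits are made up for illustration; only the value T = 2.3 comes from this card.

```python
import math


def softmax_with_temperature(logits, temperature=2.3):
    """Divide logits by T, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for a single input with three classes.
logits = [4.0, 1.0, 0.5]

raw_probs = softmax_with_temperature(logits, temperature=1.0)  # plain softmax
scaled_probs = softmax_with_temperature(logits, temperature=2.3)

# The winning class is the same; only its confidence is lower after scaling.
predicted_class = scaled_probs.index(max(scaled_probs))
```

Because dividing by a positive T preserves the ordering of the logits, the predicted class does not change; the probabilities are flattened, so the top‑class confidence drops.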

Training Data

The dataset was primarily synthetic, generated to simulate realistic ride‑hailing feedback in Zambia.  
To ensure authenticity:

- All samples were reviewed by a native Zambian speaker  
- Mixed‑language and slang patterns were corrected  
- Local idioms and slang were added  
- Unnatural AI‑generated phrasing was removed  
- Bemba and Nyanja grammar and tone were validated  

This hybrid approach ensures that the dataset reflects real Zambian communication styles.