# LusakaLang — Multilingual Sentiment Classification Model (English, Bemba, Nyanja)

## Model Description
LusakaLang is a fine‑tuned version of bert-base-multilingual-cased for multilingual sentiment analysis across Zambia’s linguistic landscape. It is optimized for Zambian English, Bemba, Nyanja, and the code‑switching patterns common in Lusaka and other urban regions.
The model captures:
- Zambian English idioms
- Bemba and Nyanja sentiment cues
- Mixed‑language slang
- Urban Lusaka code‑switching
- Indirect emotional expressions common in Zambian communication
This makes LusakaLang highly effective for real‑world sentiment tasks such as customer feedback, social media monitoring, and conversational analysis.
## Training Performance (Epoch 30 — Final Model)
The model was trained for 30 epochs, with epoch 30 selected as the optimal checkpoint based on macro‑F1 performance and generalization stability.
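Checkpoint selection of this kind amounts to keeping the epoch whose checkpoint maximizes validation macro‑F1. A minimal sketch, using hypothetical per‑epoch scores (these are not the actual training logs):

```python
# Hypothetical per-epoch validation macro-F1 scores, for illustration only.
val_macro_f1 = {28: 0.9101, 29: 0.9188, 30: 0.9216}

# Keep the checkpoint from the epoch with the highest validation macro-F1.
best_epoch = max(val_macro_f1, key=val_macro_f1.get)
print(best_epoch)  # 30
```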
### Final Test Results (Epoch 30)
| Metric | Score |
|---|---|
| Accuracy | 0.9322 |
| Macro Precision | 0.9216 |
| Macro Recall | 0.9216 |
| Macro F1 | 0.9216 |
| Test Loss | 0.4025 |
### Per‑Class Performance
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Negative | 0.8649 | 0.8649 | 0.8649 |
| Neutral | 0.9500 | 0.9500 | 0.9500 |
| Positive | 0.9500 | 0.9500 | 0.9500 |
These results show strong generalization, excellent balance across classes, and robust performance on the hardest class (negative).
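The macro scores above are the unweighted mean of the per‑class values, which can be checked directly:

```python
# Per-class F1 scores from the table above.
per_class_f1 = {"negative": 0.8649, "neutral": 0.9500, "positive": 0.9500}

# Macro-F1 is the unweighted mean over classes.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(round(macro_f1, 4))  # 0.9216
```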
## Training Data
The model was trained using a multilingual dataset combining:
- Zambian English
- Bemba
- Nyanja
- Code‑switched text
- Social media‑style expressions
- Local idioms and sentiment cues
## Why LusakaLang Performs Better

### 1. Understanding Zambian English Nuances
Examples:
- “I’m just there” → Neutral
- “I’m not fine but I’m okay” → Neutral
- “I’m feeling somehow” → Neutral
- “Believe you me” → Neutral
- “It’s fine” → Negative (Zambian tone)
### 2. Handling Bemba/Nyanja Idioms
Examples:
- “Nimvela bwino” → Positive
- “Nimvelako bwino but…” → Neutral
- “Nima one boi” → Negative
- “Niba kalijo baja naiwe” → Negative
### 3. Code‑Switching Awareness
The model handles:
- English + Bemba
- English + Nyanja
- English + slang
- Mixed 3‑language expressions
### 4. Sarcasm Detection (Zambian Style)
Examples:
- “Wow, great service” → Negative
- “Nice, just what I needed” → Negative
- “Perfect timing!” → Negative
## Bias, Risks, and Limitations
- Optimized for Zambia; may not generalize to other African regions.
- Sarcasm and indirect expressions can still be ambiguous.
- Not suitable for high‑risk decision‑making without human review.
- Best suited for short conversational text rather than long documents.
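For longer inputs, one possible workaround (a sketch, not part of the released model) is to split the text into sentences, classify each, and aggregate by majority vote. The `classify` argument below is a stand‑in for a wrapper around the classification call shown in the usage section:

```python
from collections import Counter

def aggregate_sentiment(text, classify):
    """Split a long text into rough sentences, classify each one with the
    supplied `classify` callable, and return the majority label."""
    sentences = [s.strip() for s in text.replace("!", ".").split(".") if s.strip()]
    labels = [classify(s) for s in sentences]
    return Counter(labels).most_common(1)[0][0]

# Stub classifier for illustration only; in practice pass a function that
# calls the model and returns "negative", "neutral", or "positive".
stub = lambda s: "negative" if "late" in s.lower() else "positive"
print(aggregate_sentiment("The driver was polite. The car was late. Great music!", stub))
```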
## How to Use This Model

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_ckpt = "Kelvinmbewe/LusakaLang"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForSequenceClassification.from_pretrained(model_ckpt)

# Reuse the loaded model and tokenizer instead of downloading them again.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
classifier("Driver was very professional and polite.")

def label_text(text):
    """Return (label_id, label_name) for a piece of text."""
    result = classifier(text)[0]
    sentiment = result["label"].lower()
    mapping = {"negative": 0, "neutral": 1, "positive": 2}
    return mapping[sentiment], sentiment

print(label_text("Umufyashi ailetelela bwino no mutende."))  # Bemba
print(label_text("Galimoto inachedwa koma woyendetsa anali wabwino."))  # Nyanja
print(label_text("The ride was okay, but the driver was over speeding."))  # English
```