Multilingual Sentiment Classification Model (3 Languages)
Model Description
LusakaLang is a fine-tuned version of bert-base-multilingual-cased designed for multilingual sentiment analysis. It leverages diverse datasets to deliver robust performance across multiple languages and cultural contexts.
LusakaLang—short for _Lusaka Languages_—is a multilingual transformer model optimized for sentiment analysis in Zambian linguistic environments. The model is trained to understand Zambian English, particularly the variety spoken in Lusaka, where English is often blended with Bemba and Nyanja. It captures the unique way these languages mix in everyday speech and social media posts, enabling accurate sentiment detection in real-world, culturally nuanced text.
Training data
Advantages of LusakaLang Model
Unlike generic multilingual models, LusakaLang incorporates cultural and contextual nuances, improving accuracy in sentiment analysis for Zambia’s diverse language landscape. Whether the input is social media posts, customer feedback, or informal text, it delivers reliable sentiment predictions in mixed-language environments and improves on generic models in the following areas.
Better Understanding of Zambian English
Expressions like:
- “I’m just there”
- “I’m not fine but I’m okay”
- “I’m feeling somehow”
- "Believe you me"
- "Me I tell you the truth"
- "Its just temporal"
Better Handling of Bemba/Nyanja idioms
Examples:
- “Nimvela bwino” → positive
- “Nimvelako bwino pangono pangono but lelo not sure mwandi” → neutral
- “Nima one boi" → negative
- "Nima one naiwe" → negative
- "Sima one naiwe" → positive
Better handling of code‑switching
Social media comments often mix:
- English + Bemba
- English + Nyanja
- English + slang
- English + Bemba + Nyanja
Note: Lusaka has a distinct expression of English, Bemba and Nyanja which is significantly different from other provinces of Zambia, and this linguistic uniqueness is central to LusakaLang’s design.
Accuracy
Use Cases
Handle code‑switching explicitly
Mixed‑language comments often contain sentiment cues in one language and neutral filler in another.
Example:
“I am coming naiwe, ndefika nombaline I’m tired”
→ English part is neutral
→ Bemba part is positive
→ Combined meaning is neutral
Rule: If sentiment cues conflict across languages:
- positive + negative → neutral
- positive + neutral → positive
- negative + neutral → negative
This prevents over‑confident misclassification.
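The conflict rule can be applied as a small post-processing step over per-language cue labels. Below is a minimal sketch; the function name and the idea of first splitting a comment into per-language cues are illustrative assumptions, not part of the released model:
def combine_language_cues(cue_labels):
    # cue_labels: sentiment labels ("positive", "negative", "neutral"),
    # one per language segment detected in a code-switched comment.
    labels = set(cue_labels)
    if "positive" in labels and "negative" in labels:
        return "neutral"   # positive + negative -> neutral
    if "positive" in labels:
        return "positive"  # positive + neutral -> positive
    if "negative" in labels:
        return "negative"  # negative + neutral -> negative
    return "neutral"

print(combine_language_cues(["positive", "negative"]))  # neutral
print(combine_language_cues(["negative", "neutral"]))   # negative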
Detect “Zambian English Neutralizers”
Certain expressions carry a different polarity in Zambian usage than their literal English reading suggests: many that sound negative are actually neutral, while some polite-sounding phrases imply dissatisfaction. Examples:
- “I’m just there” → Neutral (often means “I’m okay, nothing special happening”)
- “I’m okay I guess” → Neutral (not necessarily negative, often casual)
- “I’m feeling somehow” → Neutral (expresses uncertainty, not sadness)
- “It’s fine” → Negative (commonly used to imply dissatisfaction or passive disagreement)
- “It’s okay, but…” → Negative (signals hidden disapproval or reservation)
- “We’ll see” → Neutral (often means “maybe later,” not indecisive)
- “No worries” → Neutral (usually genuine reassurance, not dismissive)
- “It’s not bad” → Neutral or Positive (often means “it’s good enough”)
- “I’m managing” → Neutral (indicates coping, not necessarily struggling)
Rule: If the text contains a recognized neutralizer phrase, override the model’s predicted polarity to neutral, unless strong negative cues appear in the surrounding context.
Why this matters:
Standard sentiment models often misclassify these phrases as negative because they resemble negative English expressions. LusakaLang applies this rule to ensure culturally accurate sentiment interpretation.
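As a rough sketch, the neutralizer rule can be written as a phrase lookup applied on top of the model prediction. The phrase list is copied from the examples above; the strong-negative cue words are an assumption added only for illustration:
NEUTRALIZERS = [
    "i'm just there",
    "i'm okay i guess",
    "i'm feeling somehow",
    "we'll see",
    "no worries",
    "it's not bad",
    "i'm managing",
]
STRONG_NEGATIVE_CUES = ["angry", "hate", "terrible", "worst"]  # assumed cue words

def apply_neutralizer_rule(text, predicted_label):
    # Override to neutral when a neutralizer phrase appears, unless strong
    # negative cues are also present in the surrounding context.
    lowered = text.lower().replace("’", "'")
    if any(phrase in lowered for phrase in NEUTRALIZERS):
        if not any(cue in lowered for cue in STRONG_NEGATIVE_CUES):
            return "neutral"
    return predicted_label

print(apply_neutralizer_rule("I’m just there", "negative"))  # neutral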
Handle indirect negativity (very common in Zambia)
In the Zambian communication style, people rarely express emotions directly; instead of saying “I’m angry”, they will use softer, indirect phrases that imply dissatisfaction.
Examples:
- “I’m not happy” → Negative
- “I’m not fine” → Negative
- “I’m not okay” → Negative
- “I’m not feeling myself” → Negative
- “I’m not sure how I feel” → Neutral
Rule: If the text contains negated emotion verbs, treat the sentiment as negative, unless the verb is ambiguous (e.g., “sure,” “certain,” “myself”), in which case classify as neutral.
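A minimal sketch of the negated-emotion rule using a regular expression; the word lists mirror the examples above and are not exhaustive:
import re

NEGATED_EMOTION = re.compile(r"\bnot\s+(happy|fine|okay|well|sure|certain)\b", re.IGNORECASE)
AMBIGUOUS_TERMS = {"sure", "certain"}

def indirect_negativity_label(text):
    # Returns a label when the rule applies, otherwise None.
    match = NEGATED_EMOTION.search(text)
    if match is None:
        return None
    term = match.group(1).lower()
    return "neutral" if term in AMBIGUOUS_TERMS else "negative"

print(indirect_negativity_label("I’m not happy"))            # negative
print(indirect_negativity_label("I’m not sure how I feel"))  # neutral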
Handle Bemba/Nyanja idioms
Some phrases have fixed sentiment polarity:
Bemba
- “Ndeumfwa bwino” → positive
- “Ndeumfwako bwino but…” → neutral
- “Ndeumfwako bwino lelo” → positive
- “Ndeumfwako bwino lelo but…” → neutral
- “Ndeumfwa bwino sana” → positive
- “Ukumfwako bwino te sana” → negative
- “Ndeumfwa bwino but I’m tired” → neutral
- “Ndeumfwa bwino but I’m not happy” → negative
- “Ndeumfwa bwino but…” → neutral
Nyanja
Nyanja expressions often carry strong sentiment cues that override the literal meaning of individual words. LusakaLang is trained to detect these idioms and adjust sentiment accordingly.
- “Nimvela bwino” → positive
- “Nimvelako bwino but…” → neutral
- “Nimvela Ma one ” → negative
- “Ine nili che” → positive
- “Ine nili che so so” → neutral
- “Niba kalijo baja naiwe” → negative
Rule: If a strong idiom appears, override the model’s prediction based on the idiom’s cultural meaning, regardless of surrounding context.
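The idiom override can be sketched as a lookup table built from the lists above; the entries and the longest-match strategy are illustrative, not a lexicon shipped with the model:
IDIOM_POLARITY = {
    "ndeumfwa bwino sana": "positive",
    "ukumfwako bwino te sana": "negative",
    "ndeumfwa bwino": "positive",
    "nimvela bwino": "positive",
    "ine nili che so so": "neutral",
    "ine nili che": "positive",
    "niba kalijo baja naiwe": "negative",
}

def apply_idiom_rule(text, predicted_label):
    # Check longer idioms first so that more specific phrases win.
    lowered = text.lower()
    for idiom in sorted(IDIOM_POLARITY, key=len, reverse=True):
        if idiom in lowered:
            return IDIOM_POLARITY[idiom]
    return predicted_label

print(apply_idiom_rule("Ine nili che so so mwandi", "positive"))  # neutral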
Handle Zambian Sarcasm
Sarcasm in Zambian English often uses positive words or polite phrases to express frustration, annoyance, or criticism. Standard sentiment models misclassify these as positive because they rely on literal word meaning. LusakaLang applies contextual and tone-aware rules to detect sarcasm accurately.
Examples:
- “Wow, great service ” → Negative (sarcastic tone, emoji reinforces negativity)
- “Nice, just what I needed” (after a complaint) → Negative (context flips meaning)
- “Thanks, I guess” → Neutral or Negative (depends on preceding context)
- “Good for you” → Negative in Zambian tone (often dismissive or mocking)
- “Perfect timing!” (after a delay) → Negative (sarcasm about lateness)
- “Lovely, just lovely” (after bad news) → Negative (tone overrides literal meaning)
Rule: If positive sentiment words appear alongside negative context cues or sarcasm, override the polarity to negative or neutral based on context strength.
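A very rough sketch of this override; the positive-word and context-cue sets are assumptions used only to illustrate the rule, and sarcasm detection in the model itself is learned from data rather than hand-coded:
POSITIVE_WORDS = {"great", "nice", "perfect", "lovely", "thanks", "good"}
NEGATIVE_CONTEXT_CUES = {"delay", "delayed", "late", "complaint", "broken", "waiting"}  # assumed cues

def apply_sarcasm_rule(text, context, predicted_label):
    # Flip an apparently positive prediction when positive words co-occur
    # with negative cues in the surrounding context.
    text_words = set(text.lower().replace("!", "").split())
    context_words = set(context.lower().split())
    if predicted_label == "positive" and text_words & POSITIVE_WORDS and context_words & NEGATIVE_CONTEXT_CUES:
        return "negative"
    return predicted_label

print(apply_sarcasm_rule("Perfect timing!", "the bus arrived after a two hour delay", "positive"))  # negative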
Bias, Risks and Limitations
- The dataset reflects Zambian linguistic patterns, so the model may not generalize well to other regions or dialects.
- Mixed-language slang and code-switching may introduce linguistic bias, especially in informal contexts.
- Sentiment labels rely on human annotation, which can include subjective interpretations and cultural assumptions.
- The model may misinterpret sarcasm, idioms, or culturally specific expressions, particularly in ambiguous cases.
- Not suitable for high-risk applications (e.g., healthcare, legal decisions) without human review.
- Performance may degrade on long documents or highly formal text, as the model is optimized for short, conversational content.
How to Use This Model
!pip install transformers torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model_ckpt = "Kelvinmbewe/LusakaLang"
bert_tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForSequenceClassification.from_pretrained(model_ckpt)
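Once the checkpoint is loaded, a single text can be scored directly with PyTorch. The snippet below is a minimal sketch; it reads label names from model.config.id2label instead of assuming a particular class ordering:
import torch

text = "Nimvela bwino lelo"
inputs = bert_tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
pred_id = logits.argmax(dim=-1).item()
print(model.config.id2label[pred_id])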
Multilingual Examples
Adjust the label mapping below to match your training labels:
from transformers import pipeline
classifier = pipeline(
    "text-classification",
    model="Kelvinmbewe/LusakaLang",
    return_all_scores=False
)

def label_text(text):
    result = classifier(text)[0]
    sentiment = result['label'].lower()
    mapping = {
        "negative": 0,
        "neutral": 1,
        "positive": 2
    }
    return mapping[sentiment], sentiment
# Examples across languages
print(label_text("Umufyashi ailetelela bwino no mutende.")) # Bemba
print(label_text("Galimoto inachedwa koma woyendetsa anali wabwino.")) # Nyanja
print(label_text("The ride was okay, but the driver was over speeding.")) # English
Applied to a set of example texts, the predictions follow this structure (text, label, label_name):
text,label,label_name
Ndimvela bwino lelo,2,positive
Ndikumva chisoni,0,negative
I am not happy with this service,0,negative
This is okay I guess,1,neutral
Ndikumva bwino koma sindikudziwa,1,neutral
I’m feeling great today,2,positive
Ndimvela bwino but I’m tired,1,neutral
I don’t like how this turned out,0,negative
Ndikumva bwino kwambiri lelo,2,positive
I’m just there,1,neutral
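To reproduce rows in that structure, a small loop over label_text can write predictions to a CSV file. This is a usage sketch (the file name and example texts are arbitrary), not part of the model repository:
import csv

examples = [
    "Ndimvela bwino lelo",
    "I am not happy with this service",
    "This is okay I guess",
]

with open("predictions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "label", "label_name"])
    for text in examples:
        label_id, label_name = label_text(text)
        writer.writerow([text, label_id, label_name])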
Using the HuggingFace Pipeline
from transformers import pipeline
classifier = pipeline("text-classification", model="Kelvinmbewe/LusakaLang")
classifier("Driver was very professional and polite.")
Nyanja Sources
- Chichewa Speech2Text Dataset: https://github.com/dmatekenya/Chichewa-Speech2Text
- English–Chichewa Sentence Pairs (MT560): https://huggingface.co/datasets/michsethowusu/english-chichewa_sentence-pairs_mt560
- Masakhane EN–NYA JW300 Benchmark: https://github.com/masakhane-io/masakhane-mt/blob/master/benchmarks/en-nya/jw-300-baseline/en_nya_starter_notebook.ipynb
Bemba Sources
- Code‑170k‑Bemba: https://huggingface.co/datasets/michsethowusu/Code-170k-bemba
- BEMBA_big_c: https://huggingface.co/datasets/Beijuka/BEMBA_big_c
English Sources
- English–Chichewa Sentence Pairs (MT560): https://huggingface.co/datasets/michsethowusu/english-chichewa_sentence-pairs_mt560
These datasets were used to enhance multilingual understanding, improve cross‑lingual transfer, and ensure the model performs well on Zambia‑specific linguistic structures. By leveraging diverse sources, LusakaLang captures unique patterns of Zambian English, Bemba and Nyanja, including code‑switching and culturally nuanced expressions common in Lusaka and other urban contexts.
Evaluation results
- Accuracy on the LusakaLang training data test set (self-reported): 0.990
- F1 on the LusakaLang training data test set (self-reported): 0.990

