Multilingual Sentiment Classification Model (3 Languages)
Model Description
LusakaLang is a fine-tuned version of bert-base-multilingual-cased designed for multilingual sentiment analysis. It leverages diverse datasets to deliver robust performance across multiple languages and cultural contexts.
LusakaLang—short for _Lusaka Languages_—is a multilingual transformer model optimized for sentiment analysis in Zambian linguistic environments. The model is trained to understand Zambian English, particularly the variety spoken in Lusaka, where English is often blended with Bemba and Nyanja. It captures the unique way these languages mix in everyday speech and social media posts, enabling accurate sentiment detection in real-world, culturally nuanced text.
Training data
Advantages of LusakaLang Model
Unlike generic multilingual models, LusakaLang incorporates cultural and contextual nuances, improving accuracy in sentiment analysis for Zambia’s diverse language landscape. Whether the input is social media posts, customer feedback, or informal text, it delivers reliable sentiment predictions in mixed-language environments and improves on generic models in the following areas.
Better Understanding of Zambian English
Expressions like:
- “I’m just there”
- “I’m not fine but I’m okay”
- “I’m feeling somehow”
- "Believe you me"
- "Me I tell you the truth"
- "Its just temporal"
Better Handling of Bemba/Nyanja idioms
Examples:
- “Nimvela bwino” → positive
- “Nimvelako bwino pangono pangono but lelo not sure mwandi” → neutral
- “Nima one boi" → negative
- "Nima one naiwe" → negative
- "Sima one naiwe" → positive
Better handling of code‑switching
Social media comments often mix:
- English + Bemba
- English + Nyanja
- English + slang
- English + Bemba + Nyanja
Note: Lusaka has a distinct expression of English, Bemba and Nyanja which is significantly different from other provinces of Zambia, and this linguistic uniqueness is central to LusakaLang’s design.
Accuracy
Use Cases
Handle code‑switching explicitly
Mixed‑language comments often contain sentiment cues in one language and neutral filler in another.
Example:
“I am coming naiwe, ndefika nombaline I’m tired”
→ English part is neutral
→ Bemba part is positive
→ Combined meaning is neutral
Rule: If sentiment cues conflict across languages:
- positive + negative → neutral
- positive + neutral → positive
- negative + neutral → negative
This prevents over‑confident misclassification.
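The conflict rule can be applied as a small post-processing step over per-language cue labels. Below is a minimal sketch; the function name and the idea of first splitting a comment into per-language cues are illustrative assumptions, not part of the released model:
def combine_language_cues(cue_labels):
    # cue_labels: sentiment labels ("positive", "negative", "neutral"),
    # one per language segment detected in a code-switched comment.
    labels = set(cue_labels)
    if "positive" in labels and "negative" in labels:
        return "neutral"   # positive + negative -> neutral
    if "positive" in labels:
        return "positive"  # positive + neutral -> positive
    if "negative" in labels:
        return "negative"  # negative + neutral -> negative
    return "neutral"

print(combine_language_cues(["positive", "negative"]))  # neutral
print(combine_language_cues(["negative", "neutral"]))   # negative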
Detect “Zambian English Neutralizers”
Certain expressions carry a different polarity in Zambian usage than their literal English reading suggests: many that sound negative are actually neutral, while some polite-sounding phrases imply dissatisfaction. Examples:
- “I’m just there” → Neutral (often means “I’m okay, nothing special happening”)
- “I’m okay I guess” → Neutral (not necessarily negative, often casual)
- “I’m feeling somehow” → Neutral (expresses uncertainty, not sadness)
- “It’s fine” → Negative (commonly used to imply dissatisfaction or passive disagreement)
- “It’s okay, but…” → Negative (signals hidden disapproval or reservation)
- “We’ll see” → Neutral (often means “maybe later,” not indecisive)
- “No worries” → Neutral (usually genuine reassurance, not dismissive)
- “It’s not bad” → Neutral or Positive (often means “it’s good enough”)
- “I’m managing” → Neutral (indicates coping, not necessarily struggling)
Rule: If the text contains a recognized neutralizer phrase, override the model’s predicted polarity to neutral, unless strong negative cues appear in the surrounding context.
Why this matters:
Standard sentiment models often misclassify these phrases as negative because they resemble negative English expressions. LusakaLang applies this rule to ensure culturally accurate sentiment interpretation.
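As a rough sketch, the neutralizer rule can be written as a phrase lookup applied on top of the model prediction. The phrase list is copied from the examples above; the strong-negative cue words are an assumption added only for illustration:
NEUTRALIZERS = [
    "i'm just there",
    "i'm okay i guess",
    "i'm feeling somehow",
    "we'll see",
    "no worries",
    "it's not bad",
    "i'm managing",
]
STRONG_NEGATIVE_CUES = ["angry", "hate", "terrible", "worst"]  # assumed cue words

def apply_neutralizer_rule(text, predicted_label):
    # Override to neutral when a neutralizer phrase appears, unless strong
    # negative cues are also present in the surrounding context.
    lowered = text.lower().replace("’", "'")
    if any(phrase in lowered for phrase in NEUTRALIZERS):
        if not any(cue in lowered for cue in STRONG_NEGATIVE_CUES):
            return "neutral"
    return predicted_label

print(apply_neutralizer_rule("I’m just there", "negative"))  # neutral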
Handle indirect negativity (very common in Zambia)
In the Zambian communication style, people rarely express emotions directly; instead of saying “I’m angry”, they will use softer, indirect phrases that imply dissatisfaction.
Examples:
- “I’m not happy” → Negative
- “I’m not fine” → Negative
- “I’m not okay” → Negative
- “I’m not feeling myself” → Negative
- “I’m not sure how I feel” → Neutral
Rule: If the text contains negated emotion verbs, treat the sentiment as negative, unless the verb is ambiguous (e.g., “sure,” “certain,” “myself”), in which case classify as neutral.
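A minimal sketch of the negated-emotion rule using a regular expression; the word lists mirror the examples above and are not exhaustive:
import re

NEGATED_EMOTION = re.compile(r"\bnot\s+(happy|fine|okay|well|sure|certain)\b", re.IGNORECASE)
AMBIGUOUS_TERMS = {"sure", "certain"}

def indirect_negativity_label(text):
    # Returns a label when the rule applies, otherwise None.
    match = NEGATED_EMOTION.search(text)
    if match is None:
        return None
    term = match.group(1).lower()
    return "neutral" if term in AMBIGUOUS_TERMS else "negative"

print(indirect_negativity_label("I’m not happy"))            # negative
print(indirect_negativity_label("I’m not sure how I feel"))  # neutral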
Handle Bemba/Nyanja idioms
Some phrases have fixed sentiment polarity:
Bemba
- “Ndeumfwa bwino” → positive
- “Ndeumfwako bwino but…” → neutral
- “Ndeumfwako bwino lelo” → positive
- “Ndeumfwako bwino lelo but…” → neutral
- “Ndeumfwa bwino sana” → positive
- “Ukumfwako bwino te sana” → negative
- “Ndeumfwa bwino but I’m tired” → neutral
- “Ndeumfwa bwino but I’m not happy” → negative
- “Ndeumfwa bwino but…” → neutral
Nyanja
Nyanja expressions often carry strong sentiment cues that override the literal meaning of individual words. LusakaLang is trained to detect these idioms and adjust sentiment accordingly.
- “Nimvela bwino” → positive
- “Nimvelako bwino but…” → neutral
- “Nimvela Ma one ” → negative
- “Ine nili che” → positive
- “Ine nili che so so” → neutral
- “Niba kalijo baja naiwe” → negative
Rule: If a strong idiom appears, override the model’s prediction based on the idiom’s cultural meaning, regardless of surrounding context.
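The idiom override can be sketched as a lookup table built from the lists above; the entries and the longest-match strategy are illustrative, not a lexicon shipped with the model:
IDIOM_POLARITY = {
    "ndeumfwa bwino sana": "positive",
    "ukumfwako bwino te sana": "negative",
    "ndeumfwa bwino": "positive",
    "nimvela bwino": "positive",
    "ine nili che so so": "neutral",
    "ine nili che": "positive",
    "niba kalijo baja naiwe": "negative",
}

def apply_idiom_rule(text, predicted_label):
    # Check longer idioms first so that more specific phrases win.
    lowered = text.lower()
    for idiom in sorted(IDIOM_POLARITY, key=len, reverse=True):
        if idiom in lowered:
            return IDIOM_POLARITY[idiom]
    return predicted_label

print(apply_idiom_rule("Ine nili che so so mwandi", "positive"))  # neutral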
Handle Zambian Sarcasm
Sarcasm in Zambian English often uses positive words or polite phrases to express frustration, annoyance, or criticism. Standard sentiment models misclassify these as positive because they rely on literal word meaning. LusakaLang applies contextual and tone-aware rules to detect sarcasm accurately.
Examples:
- “Wow, great service ” → Negative (sarcastic tone, emoji reinforces negativity)
- “Nice, just what I needed” (after a complaint) → Negative (context flips meaning)
- “Thanks, I guess” → Neutral or Negative (depends on preceding context)
- “Good for you” → Negative in Zambian tone (often dismissive or mocking)
- “Perfect timing!” (after a delay) → Negative (sarcasm about lateness)
- “Lovely, just lovely” (after bad news) → Negative (tone overrides literal meaning)
Rule: If positive sentiment words appear alongside negative context cues or sarcasm, override the polarity to negative or neutral based on context strength.
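A very rough sketch of this override; the positive-word and context-cue sets are assumptions used only to illustrate the rule, and sarcasm detection in the model itself is learned from data rather than hand-coded:
POSITIVE_WORDS = {"great", "nice", "perfect", "lovely", "thanks", "good"}
NEGATIVE_CONTEXT_CUES = {"delay", "delayed", "late", "complaint", "broken", "waiting"}  # assumed cues

def apply_sarcasm_rule(text, context, predicted_label):
    # Flip an apparently positive prediction when positive words co-occur
    # with negative cues in the surrounding context.
    text_words = set(text.lower().replace("!", "").split())
    context_words = set(context.lower().split())
    if predicted_label == "positive" and text_words & POSITIVE_WORDS and context_words & NEGATIVE_CONTEXT_CUES:
        return "negative"
    return predicted_label

print(apply_sarcasm_rule("Perfect timing!", "the bus arrived after a two hour delay", "positive"))  # negative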
Bias, Risks and Limitations
- The dataset reflects Zambian linguistic patterns, so the model may not generalize well to other regions or dialects.
- Mixed-language slang and code-switching may introduce linguistic bias, especially in informal contexts.
- Sentiment labels rely on human annotation, which can include subjective interpretations and cultural assumptions.
- The model may misinterpret sarcasm, idioms, or culturally specific expressions, particularly in ambiguous cases.
- Not suitable for high-risk applications (e.g., healthcare, legal decisions) without human review.
- Performance may degrade on long documents or highly formal text, as the model is optimized for short, conversational content.
How to Use This Model
!pip install transformers torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model_ckpt = "Kelvinmbewe/LusakaLang"
bert_tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForSequenceClassification.from_pretrained(model_ckpt)
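Once the checkpoint is loaded, a single text can be scored directly with PyTorch. The snippet below is a minimal sketch; it reads label names from model.config.id2label instead of assuming a particular class ordering:
import torch

text = "Nimvela bwino lelo"
inputs = bert_tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
pred_id = logits.argmax(dim=-1).item()
print(model.config.id2label[pred_id])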
Multilingual Examples
Adjust the label mapping below to match your training labels:
from transformers import pipeline
classifier = pipeline(
    "text-classification",
    model="Kelvinmbewe/LusakaLang",
    return_all_scores=False
)

def label_text(text):
    result = classifier(text)[0]
    sentiment = result['label'].lower()
    mapping = {
        "negative": 0,
        "neutral": 1,
        "positive": 2
    }
    return mapping[sentiment], sentiment
# Examples across languages
print(label_text("Umufyashi ailetelela bwino no mutende.")) # Bemba
print(label_text("Galimoto inachedwa koma woyendetsa anali wabwino.")) # Nyanja
print(label_text("The ride was okay, but the driver was over speeding.")) # English
Applied to a set of example texts, the predictions follow this structure (text, label, label_name):
text,label,label_name
Ndimvela bwino lelo,2,positive
Ndikumva chisoni,0,negative
I am not happy with this service,0,negative
This is okay I guess,1,neutral
Ndikumva bwino koma sindikudziwa,1,neutral
I’m feeling great today,2,positive
Ndimvela bwino but I’m tired,1,neutral
I don’t like how this turned out,0,negative
Ndikumva bwino kwambiri lelo,2,positive
I’m just there,1,neutral
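To reproduce rows in that structure, a small loop over label_text can write predictions to a CSV file. This is a usage sketch (the file name and example texts are arbitrary), not part of the model repository:
import csv

examples = [
    "Ndimvela bwino lelo",
    "I am not happy with this service",
    "This is okay I guess",
]

with open("predictions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "label", "label_name"])
    for text in examples:
        label_id, label_name = label_text(text)
        writer.writerow([text, label_id, label_name])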
Using the HuggingFace Pipeline
from transformers import pipeline
classifier = pipeline("text-classification", model="Kelvinmbewe/LusakaLang")
classifier("Driver was very professional and polite.")
Nyanja Sources
- Chichewa Speech2Text Dataset: https://github.com/dmatekenya/Chichewa-Speech2Text
- English–Chichewa Sentence Pairs (MT560): https://huggingface.co/datasets/michsethowusu/english-chichewa_sentence-pairs_mt560
- Masakhane EN–NYA JW300 Benchmark: https://github.com/masakhane-io/masakhane-mt/blob/master/benchmarks/en-nya/jw-300-baseline/en_nya_starter_notebook.ipynb
Bemba Sources
- Code‑170k‑Bemba: https://huggingface.co/datasets/michsethowusu/Code-170k-bemba
- BEMBA_big_c: https://huggingface.co/datasets/Beijuka/BEMBA_big_c
English Sources
- English–Chichewa Sentence Pairs (MT560): https://huggingface.co/datasets/michsethowusu/english-chichewa_sentence-pairs_mt560
These datasets were used to enhance multilingual understanding, improve cross‑lingual transfer, and ensure the model performs well on Zambia‑specific linguistic structures. By leveraging diverse sources, LusakaLang captures unique patterns of Zambian English, Bemba and Nyanja, including code‑switching and culturally nuanced expressions common in Lusaka and other urban contexts.
Evaluation results
- Accuracy on the LusakaLang training data test set (self-reported): 0.990
- F1 on the LusakaLang training data test set (self-reported): 0.990

