metadata
language:
  - en
  - ny
  - bem
tags:
  - sentiment-analysis
  - multilingual
  - transformer
  - zambia
  - lusaka
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
base_model:
  - google-bert/bert-base-multilingual-cased
datasets:
  - michsethowusu/english-chichewa_sentence-pairs_mt560
  - michsethowusu/Code-170k-bemba
  - Beijuka/BEMBA_big_c
metrics:
  - accuracy
  - precision
  - recall
  - f1
  - confusion_matrix
  - validation_loss
model-index:
  - name: LusakaLang
    results:
      - task:
          type: text-classification
          name: Sentiment Analysis
        dataset:
          name: LusakaLang Test Set
          type: lusakalang
          config: default
          split: test
        metrics:
          - type: accuracy
            value: 0.9973
            name: accuracy
          - type: precision
            value: 0.9973
            name: precision
          - type: recall
            value: 0.9973
            name: recall
          - type: f1
            value: 0.9978
            name: f1

Lusaka Language Analysis Model

Lusaka Language Analysis is a multilingual sentiment classification model fine-tuned from google-bert/bert-base-multilingual-cased (mBERT). It is built specifically for Zambian linguistic contexts, with a focus on:

  • Zambian English (Lusaka variety)
  • Bemba
  • Nyanja (Chichewa)

The model is optimized to recognize mixed-language usage, local idioms, indirect expressions, and contextual sarcasm commonly found in everyday Zambian communication and social media discourse.


Task

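from transformers import pipeline

# Placeholder: replace with this model's actual Hugging Face Hub repository ID
# or a local checkpoint path.
MODEL_ID = "path/to/LusakaLang"

# Build the text-classification pipeline once and reuse it for every call.
classifier = pipeline("text-classification", model=MODEL_ID)


def classify_text(text):
    """
    Run inference on a single text input using the fine-tuned LusakaLang model.
    Returns the predicted label and confidence score.
    """
    result = classifier(text)[0]  # top prediction for the single input
    label = result["label"]
    score = round(result["score"], 4)
    return label, score


samples = [
    "Muli shani bane, nalishiba bwino.",
    "How are you doing today?",
    "Tili bwino, zikomo kwambiri.",
]
for s in samples:
    label, score = classify_text(s)
    print(f"Text: {s}\nPrediction: {label} (confidence={score})\n")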

Sample Output

Text: Muli shani bane, nalishiba bwino.
Prediction: Bemba (confidence=0.9821)

Text: How are you doing today?
Prediction: English (confidence=0.9954)

Text: Tili bwino, zikomo kwambiri.
Prediction: Nyanja (confidence=0.9736)
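
When many texts need labelling at once, the per-text loop can be replaced by a single batched call followed by a tally of the predicted labels. The sketch below assumes `classifier` behaves like a Hugging Face `text-classification` pipeline, that is, it accepts a list of strings and returns one dict with `label` and `score` keys per input. `classify_batch`, `tally_labels`, and `fake_classifier` are hypothetical names used only for illustration. The stub stands in for the real model so the example runs without downloading weights, and its output is not real model output.

```python
from collections import Counter


def classify_batch(classifier, texts):
    """Classify several texts in one call; return (text, label, score) tuples."""
    texts = list(texts)  # materialize so the inputs can be zipped with results
    results = classifier(texts)
    return [(t, r["label"], round(r["score"], 4)) for t, r in zip(texts, results)]


def tally_labels(predictions):
    """Count how often each label appears in a list of predictions."""
    return Counter(label for _, label, _ in predictions)


def fake_classifier(texts):
    # Stand-in for the real pipeline (assumption): always returns a dummy
    # label and score, one dict per input text.
    return [{"label": "LABEL_0", "score": 0.5} for _ in texts]


predictions = classify_batch(fake_classifier, ["first text", "second text"])
for text, label, score in predictions:
    print(f"{text!r}: {label} (confidence={score})")
print(tally_labels(predictions))
```

To use it with the model, pass the real `classifier` object in place of `fake_classifier`.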

Language Graph

[Figure: language graph]

Note: The "unknown" language label here represents mixed-language text that combines English, Bemba, and Nyanja in varying degrees, e.g. "GPS yenze nama issues so it made me delay my journey kwati nibamudala."

Classification Report

[Figure: classification report]

Confusion Matrix

[Figure: confusion matrix]
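
Accuracy, precision, recall, and F1 can all be derived from a confusion matrix. The following is a minimal sketch, assuming rows hold the true labels and columns the predicted labels, and using macro averaging (this card does not state which averaging was used). `metrics_from_confusion` is a hypothetical helper, and the 3x3 matrix is made up for illustration; it is not this model's confusion matrix.

```python
def metrics_from_confusion(cm):
    """Return accuracy and macro-averaged precision, recall, and F1.

    cm[i][j] is the number of samples with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))

    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Illustrative matrix only (made-up counts for three classes).
toy_cm = [
    [8, 1, 1],
    [0, 9, 1],
    [1, 0, 9],
]
for name, value in metrics_from_confusion(toy_cm).items():
    print(f"{name}: {value:.4f}")
```

Macro averaging weights every class equally; with imbalanced classes, a weighted average would give different precision, recall, and F1 values.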

Word Cloud

[Figure: word cloud]