---
language:
- en
- ny
- bem
tags:
- sentiment-analysis
- multilingual
- transformer
- zambia
- lusaka
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
base_model:
- google-bert/bert-base-multilingual-cased
datasets:
- michsethowusu/english-chichewa_sentence-pairs_mt560
- michsethowusu/Code-170k-bemba
- Beijuka/BEMBA_big_c
metrics:
- accuracy
- precision
- recall
- f1
- confusion_matrix
- validation_loss
model-index:
- name: LusakaLang
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    dataset:
      name: LusakaLang Test Set
      type: lusakalang
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 0.9973
      name: accuracy
    - type: precision
      value: 0.9973
      name: precision
    - type: recall
      value: 0.9973
      name: recall
    - type: f1
      value: 0.9978
      name: f1
---


## **Lusaka Language Analysis Model**

The Lusaka Language Analysis model is a multilingual sentiment classification model fine-tuned from `google-bert/bert-base-multilingual-cased` (mBERT).
It is built specifically for Zambian linguistic contexts, with a focus on:
- Zambian English (Lusaka variety)
- Bemba
- Nyanja (Chichewa)

The model is optimized to recognize mixed-language usage, local idioms, indirect expressions, and contextual sarcasm commonly found in everyday 
Zambian communication and social media discourse.

---

## Task
```python
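from transformers import pipeline

# Placeholder (assumption): replace with this model's Hugging Face Hub repo ID.
MODEL_ID = "<hub-repo-id-of-this-model>"
classifier = pipeline("text-classification", model=MODEL_ID)


def classify_text(text):
    """
    Run inference on a single text input using the fine-tuned LusakaLang model.
    Returns the predicted label and confidence score.
    """
    result = classifier(text)[0]  # the pipeline returns a list with one dict per input
    label = result["label"]
    score = round(result["score"], 4)
    return label, score


samples = [
    "Muli shani bane, nalishiba bwino.",
    "How are you doing today?",
    "Tili bwino, zikomo kwambiri.",
]

for s in samples:
    label, score = classify_text(s)
    print(f"Text: {s}\nPrediction: {label} (confidence={score})\n")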
```

## Sample Output

```text
Text: Muli shani bane, nalishiba bwino.
Prediction: Bemba (confidence=0.9821)

Text: How are you doing today?
Prediction: English (confidence=0.9954)

Text: Tili bwino, zikomo kwambiri.
Prediction: Nyanja (confidence=0.9736)
```
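For more than a handful of texts, a `transformers` text-classification pipeline can also be called with a list of strings, returning one result dict per input. The sketch below wraps that pattern in a hypothetical helper, `classify_batch`, which is not part of this model's code. To keep it runnable without downloading the model, it uses `stub_classifier`, a made-up stand-in that only mimics the pipeline's output shape; its labels and scores are not real predictions.

```python
from typing import Callable, Dict, List, Tuple


def classify_batch(
    classifier: Callable[[List[str]], List[Dict]],
    texts: List[str],
) -> List[Tuple[str, str, float]]:
    """Classify a list of texts in one call; return (text, label, score) triples."""
    results = classifier(texts)  # one {"label", "score"} dict per input text
    return [(text, r["label"], round(r["score"], 4)) for text, r in zip(texts, results)]


def stub_classifier(texts: List[str]) -> List[Dict]:
    """Hypothetical stand-in with the pipeline's output shape (not the real model)."""
    return [{"label": "English", "score": 0.99123} for _ in texts]


# Swap stub_classifier for a real pipeline object to get actual predictions.
rows = classify_batch(stub_classifier, ["How are you?", "Good morning."])
for text, label, score in rows:
    print(f"{text} -> {label} ({score})")
```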
---

## Language Graph
![image](https://cdn-uploads.huggingface.co/production/uploads/674ed988f86d2ca07fa23abe/OTroxtjtYgvijaMcv4Tpn.png)
> Note: The "unknown" language here represents mixed-language text that combines English, Bemba, and Nyanja to varying degrees, e.g. "GPS yenze nama issues so it made me delay my journey kwati nibamudala."

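Assuming the graph shows how many texts fall into each language class, a distribution of that kind could be tallied from a list of labels roughly as follows. The helper name `label_distribution` and the toy label list are illustrative assumptions, not the data behind the figure.

```python
from collections import Counter


def label_distribution(labels):
    """Map each label to (count, percentage share), most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.most_common()}


# Toy labels for illustration only (not the model's real outputs).
predicted = ["English", "Bemba", "Nyanja", "English", "Unknown", "English", "Bemba", "Unknown"]
dist = label_distribution(predicted)
for label, (count, share) in dist.items():
    print(f"{label}: {count} texts ({share}%)")
```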

## Classification Report
![image](https://cdn-uploads.huggingface.co/production/uploads/674ed988f86d2ca07fa23abe/v5eLxfxuKDJ7Sd8uX2P9s.png)

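For readers who want to see how the per-class figures in such a report are defined, here is a minimal pure-Python sketch of precision, recall, and F1. The function name `per_class_report` and the toy labels are assumptions for illustration; the report above may well have been produced with a library such as scikit-learn instead.

```python
def per_class_report(y_true, y_pred):
    """Compute precision, recall, and F1 for every label seen in either list."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted
        fp = sum(t != label and p == label for t, p in pairs)  # predicted but wrong
        fn = sum(t == label and p != label for t, p in pairs)  # missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Toy gold labels and predictions (illustrative only).
y_true = ["Bemba", "Bemba", "English", "Nyanja", "Nyanja", "English"]
y_pred = ["Bemba", "Nyanja", "English", "Nyanja", "Nyanja", "English"]
report = per_class_report(y_true, y_pred)
for label, scores in report.items():
    print(label, {name: round(value, 4) for name, value in scores.items()})
```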
## Confusion Matrix
![image](https://cdn-uploads.huggingface.co/production/uploads/674ed988f86d2ca07fa23abe/mxnDjRmAX-XLHzMfcWnfr.png)

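A confusion matrix like the one above counts, for each true class (row), how often each class was predicted (column). The following is a small sketch under that convention, using toy labels and a hypothetical `confusion_matrix` helper rather than the evaluation code actually used for this model.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in the order of `labels`."""
    pair_counts = Counter(zip(y_true, y_pred))
    return [[pair_counts[(t, p)] for p in labels] for t in labels]


# Toy gold labels and predictions (illustrative only).
labels = ["Bemba", "English", "Nyanja"]
y_true = ["Bemba", "Bemba", "English", "Nyanja", "Nyanja", "English"]
y_pred = ["Bemba", "Nyanja", "English", "Nyanja", "Nyanja", "English"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>8}: {row}")

# Accuracy is the sum of the diagonal divided by the number of examples.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)
```

Off-diagonal cells show which classes get confused with each other; in the toy data, one Bemba text is predicted as Nyanja.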
## Word Cloud
![image](https://cdn-uploads.huggingface.co/production/uploads/674ed988f86d2ca07fa23abe/J-atqadjfCh7xUKRSRSnL.png)
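
Assuming the word cloud reflects how often each token occurs in the texts, the underlying counts could be computed along these lines. The helper `word_frequencies` is a hypothetical sketch; the input reuses the three sample sentences from the Task section, not the corpus behind the image.

```python
import re
from collections import Counter


def word_frequencies(texts, top_n=5):
    """Lowercase the texts, split them into alphabetic tokens, and count the tokens."""
    tokens = []
    for text in texts:
        # [^\W\d_]+ matches runs of letters, including non-ASCII ones.
        tokens.extend(re.findall(r"[^\W\d_]+", text.lower()))
    return Counter(tokens).most_common(top_n)


texts = [
    "Muli shani bane, nalishiba bwino.",
    "How are you doing today?",
    "Tili bwino, zikomo kwambiri.",
]
top_words = word_frequencies(texts, top_n=1)
print(top_words)
```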