|
|
--- |
|
|
language: |
|
|
- en |
|
|
- ny |
|
|
- bem |
|
|
tags: |
|
|
- sentiment-analysis |
|
|
- multilingual |
|
|
- transformer |
|
|
- zambia |
|
|
- lusaka |
|
|
license: apache-2.0 |
|
|
library_name: transformers |
|
|
pipeline_tag: text-classification |
|
|
base_model: |
|
|
- google-bert/bert-base-multilingual-cased |
|
|
datasets: |
|
|
- michsethowusu/english-chichewa_sentence-pairs_mt560 |
|
|
- michsethowusu/Code-170k-bemba |
|
|
- Beijuka/BEMBA_big_c |
|
|
metrics: |
|
|
- accuracy |
|
|
- precision |
|
|
- recall |
|
|
- f1 |
|
|
- confusion_matrix |
|
|
- validation_loss |
|
|
model-index: |
|
|
- name: LusakaLang |
|
|
results: |
|
|
- task: |
|
|
type: text-classification |
|
|
name: Sentiment Analysis |
|
|
dataset: |
|
|
name: LusakaLang Test Set |
|
|
type: lusakalang |
|
|
config: default |
|
|
split: test |
|
|
metrics: |
|
|
- type: accuracy |
|
|
value: 0.9973 |
|
|
name: accuracy |
|
|
- type: precision |
|
|
value: 0.9973 |
|
|
name: precision |
|
|
- type: recall |
|
|
value: 0.9973 |
|
|
name: recall |
|
|
- type: f1 |
|
|
value: 0.9978 |
|
|
name: f1 |
|
|
--- |
|
|
|
|
|
|
|
|
## **Lusaka Language Analysis Model** |
|
|
|
|
|
The Lusaka Language Analysis is a multilingual sentiment classification model fine‑tuned from `google-bert/bert-base-multilingual-cased (mBERT)`. |
|
|
and it is built specifically for Zambian linguistic contexts with a focus on: |
|
|
- Zambian English (Lusaka variety) |
|
|
- Bemba |
|
|
- Nyanja (Chichewa) |
|
|
|
|
|
The model is optimized to recognize mixed-language usage, local idioms, indirect expressions, and contextual sarcasm commonly found in everyday |
|
|
Zambian communication and social media discourse. |
|
|
|
|
|
--- |
|
|
|
|
|
## Task |
|
|
```python |
|
|
def classify_text(text): |
|
|
""" |
|
|
Run inference on a single text input using the fine‑tuned LusakaLang model. |
|
|
Returns the predicted label and confidence score. |
|
|
""" |
|
|
result = classifier(text)[0] |
|
|
label = result["label"] |
|
|
score = round(result["score"], 4) |
|
|
return label, score |
|
|
samples = [ |
|
|
"Muli shani bane, nalishiba bwino.", |
|
|
"How are you doing today?", |
|
|
"Tili bwino, zikomo kwambiri." |
|
|
] |
|
|
for s in samples: |
|
|
label, score = classify_text(s) |
|
|
print(f"Text: {s}\nPrediction: {label} (confidence={score})\n") |
|
|
``` |
|
|
|
|
|
## Sample Output |
|
|
|
|
|
```python |
|
|
Text: Muli shani bane, nalishiba bwino. |
|
|
Prediction: Bemba (confidence=0.9821) |
|
|
|
|
|
Text: How are you doing today? |
|
|
Prediction: English (confidence=0.9954) |
|
|
|
|
|
Text: Tili bwino, zikomo kwambiri. |
|
|
Prediction: Nyanja (confidence=0.9736) |
|
|
``` |
|
|
--- |
|
|
|
|
|
## Language Graph |
|
|
 |
|
|
> Note: The unknown langauge here represents a Mixed language of English, Bemba and Nyanja of varying degrees e.g GPS yenze nama issues so it made me delay my journey kwati nibamudala. |
|
|
|
|
|
|
|
|
## Classification Report |
|
|
 |
|
|
|
|
|
## Confusion Matrix |
|
|
 |
|
|
|
|
|
## Word Cloud |
|
|
 |
|
|
|
|
|
|