Kelvinmbewe/mbert_Lusaka_Language_Analysis

	---
	language:
	- en
	- ny
	- bem
	tags:
	- sentiment-analysis
	- multilingual
	- transformer
	- zambia
	- lusaka
	license: apache-2.0
	library_name: transformers
	pipeline_tag: text-classification
	base_model:
	- google-bert/bert-base-multilingual-cased
	datasets:
	- michsethowusu/english-chichewa_sentence-pairs_mt560
	- michsethowusu/Code-170k-bemba
	- Beijuka/BEMBA_big_c
	metrics:
	- accuracy
	- precision
	- recall
	- f1
	- confusion_matrix
	- validation_loss
	model-index:
	- name: LusakaLang
	  results:
	  - task:
	      type: text-classification
	      name: Sentiment Analysis
	    dataset:
	      name: LusakaLang Test Set
	      type: lusakalang
	      config: default
	      split: test
	    metrics:
	    - type: accuracy
	      value: 0.9973
	      name: accuracy
	    - type: precision
	      value: 0.9973
	      name: precision
	    - type: recall
	      value: 0.9973
	      name: recall
	    - type: f1
	      value: 0.9978
	      name: f1
	---


	## Lusaka Language Analysis Model

	Lusaka Language Analysis is a multilingual sentiment classification model fine-tuned from `google-bert/bert-base-multilingual-cased` (mBERT).
	It is built specifically for Zambian linguistic contexts, with a focus on:
	- Zambian English (Lusaka variety)
	- Bemba
	- Nyanja (Chichewa)

	The model is optimized to recognize mixed-language usage, local idioms, indirect expressions, and contextual sarcasm commonly found in everyday
	Zambian communication and social media discourse.

	---

	## Task
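	```python
	from transformers import pipeline

	# Load the fine-tuned checkpoint from the Hugging Face Hub.
	# The repository ID is taken from this model card's Hub path.
	classifier = pipeline(
	    "text-classification",
	    model="Kelvinmbewe/mbert_Lusaka_Language_Analysis",
	)

	def classify_text(text):
	    """
	    Run inference on a single text input using the fine-tuned LusakaLang model.
	    Returns the predicted label and confidence score.
	    """
	    result = classifier(text)[0]  # top prediction for this input
	    label = result["label"]
	    score = round(result["score"], 4)
	    return label, score

	samples = [
	    "Muli shani bane, nalishiba bwino.",
	    "How are you doing today?",
	    "Tili bwino, zikomo kwambiri.",
	]

	for s in samples:
	    label, score = classify_text(s)
	    print(f"Text: {s}\nPrediction: {label} (confidence={score})\n")
	```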

	## Sample Output

	```text
	Text: Muli shani bane, nalishiba bwino.
	Prediction: Bemba (confidence=0.9821)

	Text: How are you doing today?
	Prediction: English (confidence=0.9954)

	Text: Tili bwino, zikomo kwambiri.
	Prediction: Nyanja (confidence=0.9736)
	```
	---
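
The sample output above reports only the top label for each input. As a minimal sketch of how a full list of scores could be post-processed, the helper below picks the highest-scoring label and flags predictions that fall under a confidence threshold. The function name `top_prediction`, the `min_confidence` default of 0.6, and the example scores are all assumptions made for illustration; none of them come from the released model. The input is assumed to have the list-of-dicts shape (`label`, `score`) that the `transformers` text-classification pipeline produces when asked for all scores.

```python
from typing import List, Tuple


def top_prediction(scores: List[dict], min_confidence: float = 0.6) -> Tuple[str, float, bool]:
    """Return (label, rounded score, is_confident) for the best-scoring entry.

    `scores` is a list of {"label": str, "score": float} dicts.
    `min_confidence` is a hypothetical cut-off, not a value from the model card.
    """
    best = max(scores, key=lambda item: item["score"])
    is_confident = best["score"] >= min_confidence
    return best["label"], round(best["score"], 4), is_confident


# Made-up scores for a mixed-language sentence (illustrative only).
example_scores = [
    {"label": "Bemba", "score": 0.48},
    {"label": "Nyanja", "score": 0.41},
    {"label": "English", "score": 0.11},
]

label, score, is_confident = top_prediction(example_scores)
print(f"Prediction: {label} (confidence={score}, confident={is_confident})")
```

A low-confidence flag of this kind is one possible way to route mixed-language inputs to manual review; the threshold would need tuning on held-out data.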

	## Language Graph
	![image](https://cdn-uploads.huggingface.co/production/uploads/674ed988f86d2ca07fa23abe/OTroxtjtYgvijaMcv4Tpn.png)
	> Note: The "unknown" language here represents mixed-language text that combines English, Bemba, and Nyanja to varying degrees, e.g. "GPS yenze nama issues so it made me delay my journey kwati nibamudala."


	## Classification Report
	![image](https://cdn-uploads.huggingface.co/production/uploads/674ed988f86d2ca07fa23abe/v5eLxfxuKDJ7Sd8uX2P9s.png)

	## Confusion Matrix
	![image](https://cdn-uploads.huggingface.co/production/uploads/674ed988f86d2ca07fa23abe/mxnDjRmAX-XLHzMfcWnfr.png)
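
For readers who want to see how the figures in a classification report relate to a confusion matrix, the following is a minimal sketch in plain Python. It assumes rows are true classes and columns are predicted classes. The label names and the counts in `toy_matrix` are invented for illustration and are not the model's actual results; `per_class_metrics` and `accuracy` are hypothetical helper names.

```python
from typing import Dict, List


def per_class_metrics(matrix: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall and F1 per class (rows = true, columns = predicted)."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                          # correctly predicted as class i
        fn = sum(matrix[i]) - tp                   # class i predicted as something else
        fp = sum(row[i] for row in matrix) - tp    # other classes predicted as class i
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def accuracy(matrix: List[List[int]]) -> float:
    """Share of all examples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Invented counts, for illustration only.
labels = ["English", "Bemba", "Nyanja"]
toy_matrix = [
    [50, 0, 0],
    [1, 45, 4],
    [0, 5, 45],
]

report = per_class_metrics(toy_matrix, labels)
for name, m in report.items():
    print(f"{name}: precision={m['precision']:.4f} recall={m['recall']:.4f} f1={m['f1']:.4f}")
print(f"accuracy={accuracy(toy_matrix):.4f}")
```

The aggregate precision, recall, and F1 values in the metadata may be macro or weighted averages of such per-class values; the card does not say which, so the sketch stops at the per-class level.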

	## Word Cloud
	![image](https://cdn-uploads.huggingface.co/production/uploads/674ed988f86d2ca07fa23abe/J-atqadjfCh7xUKRSRSnL.png)