Update README.md

README.md

@@ -55,53 +55,36 @@ model-index:
   name: f1_macro
 ---
 
-- `Kelvinmbewe/mbert_Lusaka_Language_Analysis`
-- `Kelvinmbewe/mbert_LusakaLang_Sentiment_Analysis`
-- `Kelvinmbewe/mbert_LusakaLang_Topic`
-
-All tasks share a **single mBERT encoder**, with **three independent classifier heads**.
-This architecture improves efficiency, reduces memory footprint, and enables consistent predictions across tasks.
 
 ---
 
-Zambian communication is multilingual, fluid, and highly context‑dependent.
-A single message may include:
-
-- Slang
-- Code‑switching
-- Cultural idioms
-- Indirect emotional cues
-
-It excels at:
-
-- Identifying the **dominant language** or **code‑switching**
-- Detecting **sentiment polarity** in culturally nuanced text
-- Classifying **topics** such as:
-  - driver behaviour
-  - payment issues
-  - app performance
-  - customer support
-  - ride availability
 
 ---
@@ -118,34 +101,9 @@ This multi‑task setup improves generalization and reduces inference cost.
 
 ---
 
-# **Performance Summary**
-
-## **Language Identification**
-
-| Metric   | Score |
-|----------|-------|
-| Accuracy | 0.97  |
-| Macro‑F1 | 0.96  |
-
-## **Sentiment Analysis (Epoch 30 — Final Checkpoint)**
-
-| Metric      | Score  |
-|-------------|--------|
-| Accuracy    | 0.9322 |
-| Macro‑F1    | 0.9216 |
-| Negative F1 | 0.8649 |
-| Neutral F1  | 0.95   |
-| Positive F1 | 0.95   |
-
-## **Topic Classification**
-
-| Metric   | Score |
-|----------|-------|
-| Accuracy | 0.91  |
-| Macro‑F1 | 0.90  |
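Macro‑F1, the headline metric in these tables (and the `f1_macro` entry in the card's metadata), is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. A quick check with synthetic labels (the data here is made up purely for illustration, not drawn from the model's evaluation set):

```python
from sklearn.metrics import f1_score

# Synthetic labels for illustration only -- not the model's real predictions.
y_true = ["neg", "neu", "pos", "pos", "neu", "neg"]
y_pred = ["neg", "neu", "pos", "neu", "neu", "neg"]

# Per-class F1, then their unweighted mean (macro-F1).
per_class = f1_score(y_true, y_pred, average=None, labels=["neg", "neu", "pos"])
macro = f1_score(y_true, y_pred, average="macro")
```

Here `per_class` is `[1.0, 0.8, 0.667]` and `macro` is their mean, about `0.822`.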
-
-# **How to Use This Model**
-
-## **Load the Multi‑Task Model**
 
 ```python
 from transformers import AutoTokenizer
@@ -182,15 +140,6 @@ predict_topic([
 ```
 
 
-```bibtex
-@model{LusakaLangMultiTask,
-  author = {Kelvin Mbewe},
-  title  = {LusakaLang Multi-Task Model},
-  year   = 2025,
-  url    = {https://huggingface.co/Kelvinmbewe/LusakaLang-MultiTask}
-}
-```
-
 
   name: f1_macro
 ---
 
+## **LusakaLang Multi‑Task Model (Language + Sentiment + Topic)**
+
+This model is a unified transformer architecture built on top of **`bert-base-multilingual-cased`**, designed to perform **three tasks simultaneously**:
+
+1. **Language Identification**
+2. **Sentiment Analysis**
+3. **Topic Classification**
+
+The system integrates three fine‑tuned LusakaLang checkpoints:
+
+- **`Kelvinmbewe/mbert_Lusaka_Language_Analysis`**
+- **`Kelvinmbewe/mbert_LusakaLang_Sentiment_Analysis`**
+- **`Kelvinmbewe/mbert_LusakaLang_Topic`**
+
+All tasks share a single mBERT encoder, supported by three independent classifier heads. This architecture enhances computational efficiency, reduces memory overhead, and promotes consistent predictions across all tasks.
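The shared-encoder design described above can be sketched as a single module with three heads. This is an illustrative reconstruction, not the repository's actual code; the head sizes (3 languages, 3 sentiment classes, 5 topics) are assumptions based on the label lists in this card.

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class MultiTaskMBert(nn.Module):
    """One shared mBERT encoder feeding three independent classifier heads.

    Hypothetical sketch: head sizes are guesses from the model card,
    not the checkpoint's real configuration.
    """

    def __init__(self, encoder, n_langs=3, n_sentiments=3, n_topics=5):
        super().__init__()
        # e.g. encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
        self.encoder = encoder
        hidden = encoder.config.hidden_size
        self.lang_head = nn.Linear(hidden, n_langs)
        self.sent_head = nn.Linear(hidden, n_sentiments)
        self.topic_head = nn.Linear(hidden, n_topics)

    def forward(self, **inputs):
        # One encoder pass serves all three tasks: the [CLS] token
        # representation is shared by every head.
        cls = self.encoder(**inputs).last_hidden_state[:, 0]
        return {
            "language": self.lang_head(cls),
            "sentiment": self.sent_head(cls),
            "topic": self.topic_head(cls),
        }
```

Because the encoder runs once per input, adding a task costs only one extra linear layer rather than a full model, which is where the memory and latency savings come from.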
 
 ---
 
+## **Why This Model Matters**
+
+Zambian communication is inherently multilingual, fluid, and deeply shaped by context. A single message may blend English, Bemba, Nyanja, local slang, and frequent code‑switching, often expressed through culturally grounded idioms and subtle emotional cues. This model is designed specifically for that environment, where meaning depends not only on the words used but on how languages interact within a single utterance.
+
+It excels at identifying the dominant language or detecting when multiple languages are being used together, interpreting sentiment even when it is conveyed indirectly or through culturally specific phrasing, and classifying text into practical topics such as driver behaviour, payment issues, app performance, customer support, and ride availability. By capturing these nuances, the model provides a more accurate and context‑aware understanding of real Zambian communication.
 
 ---
 
+## **How to Use This Model**
 
 ```python
 from transformers import AutoTokenizer
 …
 ```
 
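The usage snippet above is truncated in this diff view; only its opening import and closing fence survive, though the surrounding hunk header shows a `predict_topic([` call. As a rough sketch of what such a helper might look like under the multi-head interface described earlier; every name, label, and signature here is hypothetical and not taken from the repository:

```python
import torch

# Hypothetical topic labels, copied from the topic list in this card;
# the real checkpoint's label order may differ.
TOPICS = ["driver behaviour", "payment issues", "app performance",
          "customer support", "ride availability"]


def predict_topic(texts, model, tokenizer):
    """Return the most likely topic label for each input text.

    Assumes `model(**inputs)` returns a dict of per-task logits,
    as in the shared-encoder sketch; this is an assumption, not
    the repository's documented API.
    """
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc)["topic"]
    return [TOPICS[i] for i in logits.argmax(dim=-1).tolist()]
```

In real use, `model` and `tokenizer` would come from the published checkpoints, e.g. `AutoTokenizer.from_pretrained("bert-base-multilingual-cased")` for the tokenizer.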