---
language:
- en
- ny
- bem
tags:
- text-classification
- multilingual
- transformer
- zambia
- lusaka
- code-switching
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
base_model:
- Kelvinmbewe/mbert_Lusaka_Language_Analysis
- google-bert/bert-base-multilingual-cased
metrics:
- accuracy
- precision
- recall
- macro_f1
- micro_f1
- validation_loss
- confusion_matrix
model-index:
- name: LusakaLang
results:
- task:
type: text-classification
name: Topic Classification
dataset:
name: LusakaLang Topic Dataset
type: lusakalang
config: default
split: validation
metrics:
- type: accuracy
value: 0.99259
name: accuracy
- type: precision
value: 0.98730
name: precision
- type: recall
value: 0.99128
name: recall
- type: f1
value: 0.98926
name: macro_f1
- type: f1
value: 0.99259
name: micro_f1
- type: loss
value: 0.05233
name: validation_loss
---
# **LusakaLang Topic Analysis Model**
This model was fine-tuned from its sister model, `mbert_LusakaLang_Sentiment_Analysis`, which was trained on sentiment data
spanning English, Bemba, Nyanja, Zambian slang, and mixed Zambian language varieties commonly used in everyday communication.
## Training Details
- Base model: `mbert_LusakaLang_Sentiment_Analysis`
- Epochs: 20
- Class weights: enabled (to correct class imbalance)
- Optimizer: AdamW
- Loss: weighted cross-entropy
- Temperature scaling: T = 2.3 (applied at inference time)
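The weighted cross-entropy loss above can be sketched in PyTorch. The class counts below are hypothetical, since the card does not list the topic labels or their frequencies; only the inverse-frequency weighting pattern is illustrated:

```python
import torch
import torch.nn as nn

# Hypothetical per-class sample counts for a 3-class topic dataset
# (the real labels and counts are not given in this card).
class_counts = torch.tensor([500.0, 120.0, 80.0])

# Inverse-frequency weights, normalized so they average to 1:
# rare classes receive a larger weight than frequent ones.
weights = class_counts.sum() / (len(class_counts) * class_counts)

# Weighted cross-entropy, as used during training.
loss_fn = nn.CrossEntropyLoss(weight=weights)

logits = torch.tensor([[2.0, 0.5, -1.0]])  # one example, 3 classes
target = torch.tensor([2])                  # true class is the rarest
loss = loss_fn(logits, target)
```

Note that `CrossEntropyLoss` with `reduction="mean"` normalizes by the sum of the sample weights, so the weighting only changes the relative contribution of classes within a batch.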
## **Why Temperature Scaling?**
Class-weighted training sharpens logits. Applying temperature scaling at T = 2.3 during inference improves:

- Confidence calibration
- Noise robustness
- Handling of positive/neutral text
- Foreign-language generalization
- Reduction of overconfident misclassifications
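Temperature scaling divides the logits by T before the softmax, flattening overconfident distributions without changing the predicted class. A minimal sketch with the card's T = 2.3 (the logit values here are illustrative, not taken from the model):

```python
import torch

T = 2.3  # temperature applied by this model at inference time

def scaled_probs(logits: torch.Tensor, temperature: float = T) -> torch.Tensor:
    """Softmax over temperature-divided logits."""
    return torch.softmax(logits / temperature, dim=-1)

logits = torch.tensor([[6.0, 1.0, -2.0]])   # hypothetical, overly sharp logits
p_raw = torch.softmax(logits, dim=-1)       # uncalibrated probabilities
p_cal = scaled_probs(logits)                # calibrated probabilities

# With T > 1 the top probability drops, but the argmax (and therefore
# the predicted label) is unchanged.
```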
## Training Data
The dataset was primarily synthetic, generated to simulate realistic ride-hailing feedback in Zambia. To ensure authenticity:

- All samples were reviewed by a native Zambian speaker
- Mixed-language and slang patterns were corrected
- Local idioms and slang were added
- Unnatural AI-generated phrasing was removed
- Bemba/Nyanja grammar and tone were validated

This hybrid approach ensures that the dataset reflects real Zambian communication styles.
## Train and Validation Loss

## Confusion Matrix

## Word Cloud
