---
language:
- en
- bem
- ny
tags:
- multi-task
- sentiment-analysis
- topic-classification
- language-identification
- multilingual
- transformer
- zambia
- lusaka
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
model-index:
- name: LusakaLang-MultiTask
  results:
  - task:
      type: text-classification
      name: Language Identification
    dataset:
      name: LusakaLang Language Data
      type: lusakalang
      split: test
    metrics:
    - type: accuracy
      value: 0.97
      name: language_accuracy
    - type: f1
      value: 0.96
      name: language_f1_macro
    - type: accuracy
      value: 0.9322
      name: sentiment_accuracy
    - type: f1
      value: 0.9216
      name: sentiment_f1_macro
    - type: f1
      value: 0.8649
      name: sentiment_f1_negative
    - type: f1
      value: 0.95
      name: sentiment_f1_neutral
    - type: f1
      value: 0.95
      name: sentiment_f1_positive
    - type: accuracy
      value: 0.91
      name: topic_accuracy
    - type: f1
      value: 0.9
      name: topic_f1_macro
base_model:
- Kelvinmbewe/mbert_Lusaka_Language_Analysis
- Kelvinmbewe/mbert_LusakaLang_Sentiment_Analysis
- Kelvinmbewe/mbert_LusakaLang_Topic
---

## **LusakaLang MultiTask Model**

This model is a single transformer built on `bert-base-multilingual-cased` (mBERT) that performs three tasks on the same input:

1. Language Identification
2. Sentiment Analysis
3. Topic Classification

The system integrates three fine-tuned LusakaLang checkpoints:

- `Kelvinmbewe/mbert_Lusaka_Language_Analysis`
- `Kelvinmbewe/mbert_LusakaLang_Sentiment_Analysis`
- `Kelvinmbewe/mbert_LusakaLang_Topic`

All three tasks share a single mBERT encoder, and each task has its own independent classifier head. The encoder runs once per input, so the model needs less compute and memory than three separate fine-tuned models. Because all three predictions are derived from the same text representation, they also tend to stay consistent with one another.
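The shared-encoder design described above can be sketched in PyTorch as follows. This is a minimal illustration, not the released implementation. The class name `SharedEncoderMultiTask`, the single linear head per task, the use of the `[CLS]` vector, and the label order (taken from the task lists in this card) are all assumptions.

```python
import torch
from torch import nn

# Label sets follow the task lists in this card; their order inside each
# head is an assumption, not read from the released checkpoint.
LABELS = {
    "language": ["Bemba", "Nyanja", "English", "Mixed"],
    "sentiment": ["Negative", "Neutral", "Positive"],
    "topic": ["Driver Behaviour", "Payment Issue", "Customer Support",
              "App Performance", "Ride Availability"],
}


class SharedEncoderMultiTask(nn.Module):
    """Hypothetical sketch: one shared encoder feeding three linear heads."""

    def __init__(self, encoder: nn.Module, hidden_size: int = 768, dropout: float = 0.1):
        super().__init__()
        # Any module whose output has .last_hidden_state, e.g. transformers.BertModel.
        self.encoder = encoder
        self.dropout = nn.Dropout(dropout)
        # One independent linear classifier per task, all reading the same [CLS] vector.
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden_size, len(labels)) for task, labels in LABELS.items()}
        )

    def forward(self, input_ids, attention_mask=None):
        # The encoder runs once per batch; the three heads share its output.
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        cls = self.dropout(hidden[:, 0])  # the [CLS] token sits at position 0
        return {task: head(cls) for task, head in self.heads.items()}
```

To attach the heads to the pretrained encoder, pass `BertModel.from_pretrained("bert-base-multilingual-cased")` from `transformers` as `encoder`. Its hidden size is 768, which matches the default. The heads start randomly initialised, so this sketch does not reproduce the released weights.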

---

## **Why This Model Matters**

Zambian communication is inherently multilingual, fluid, and deeply shaped by context. A single message may mix English, Bemba, Nyanja, and local slang, switching between them mid-sentence and relying on culturally grounded idioms and subtle emotional cues. This model is designed specifically for that
environment, where meaning depends not only on the words used but on how languages interact within a single utterance.

It is built to identify the dominant language of a message or flag when several languages are mixed together. It also interprets sentiment even when it is expressed indirectly or through culturally specific phrasing. Finally, it classifies text into practical ride-hailing topics: driver behaviour, payment issues,
app performance, customer support, and ride availability. By modelling these nuances, the model aims to give a more accurate, context-aware reading of real Zambian communication.


---

## **How to Use This Model**


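```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Label order per head is an assumption based on this card; verify it against the checkpoint.
LABELS = {
    "language": ["Bemba", "Nyanja", "English", "Mixed"],
    "sentiment": ["Negative", "Neutral", "Positive"],
    "topic": ["Driver Behaviour", "Payment Issue", "Customer Support",
              "App Performance", "Ride Availability"],
}

class LusakaLangMultiTask:
    def __init__(self, repo_id="Kelvinmbewe/LusakaLang-MultiTask", device="cpu"):
        self.device = device
        self.tokenizer = AutoTokenizer.from_pretrained(repo_id)
        # torch.load needs a local file, so fetch model.pt from the Hub first.
        path = hf_hub_download(repo_id=repo_id, filename="model.pt")
        # weights_only=False unpickles a full nn.Module: only use it for files you trust.
        self.model = torch.load(path, map_location=device, weights_only=False).eval()

    @torch.no_grad()
    def _predict(self, texts, task, key):
        enc = self.tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(self.device)
        # Assumption: the forward pass returns a dict of logits keyed by task name.
        logits = self.model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"])[task]
        conf, idx = logits.softmax(dim=-1).max(dim=-1)
        return [{key: LABELS[task][i], "conf": round(c, 2)} for i, c in zip(idx.tolist(), conf.tolist())]

    def predict_language(self, texts): return self._predict(texts, "language", "lang")
    def predict_sentiment(self, texts): return self._predict(texts, "sentiment", "sent")
    def predict_topic(self, texts): return self._predict(texts, "topic", "topic")

model = LusakaLangMultiTask()
texts = ["The driver was rude and overcharged me."]  # replace with your own messages

print(model.predict_language(texts))
print(model.predict_sentiment(texts))
print(model.predict_topic(texts))
```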

## Sample Output

```python
# Language Identification 🌍
[
  {"lang": "Bemba",  "conf": 0.96},
  {"lang": "Nyanja", "conf": 0.95},
  {"lang": "English","conf": 0.99}
]
# Sentiment ❤️
[
  {"sent": "Negative", "conf": 0.98},
  {"sent": "Positive", "conf": 0.95},
  {"sent": "Neutral",  "conf": 0.87}
]
# Topic 🗂️
[
  {"topic": "Payment Issue",     "conf": 0.97},
  {"topic": "Customer Support",  "conf": 0.95},
  {"topic": "Driver Behaviour",  "conf": 0.96}
]
```
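The sample output reports a `conf` score next to each label. The card does not say how this score is computed. A common convention, assumed here, is to use the highest softmax probability produced by the task head. Below is a minimal pure-Python sketch of that convention. The helpers `softmax` and `to_record` and the sentiment label order are hypothetical.

```python
import math

# Hypothetical label order for the sentiment head.
SENTIMENT_LABELS = ["Negative", "Neutral", "Positive"]


def softmax(logits):
    """Turn raw head scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_record(logits, labels, key):
    """Pick the most probable label and report its probability as `conf`."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {key: labels[best], "conf": round(probs[best], 2)}
```

For example, `to_record([2.0, 0.5, -1.0], SENTIMENT_LABELS, "sent")` returns `{"sent": "Negative", "conf": 0.79}`.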


```
=========================== Training Architecture ===========================

📥 Input               →  🧠 Core Engine              →  📈 Output
------------------------------------------------------------------------------------
Text (Any Language)    →  Tokenizer 🔤                →  Language 🌍
                       →  Shared mBERT Encoder 🧠     →  Bemba / Nyanja /
                       →  CLS Vector 🎯               →  English / Mixed
------------------------------------------------------------------------------------
User Feedback 💬       →  Tokenizer 🔤                →  Sentiment ❤️
                       →  Shared Encoder 🧠           →  Negative / Neutral /
                       →  CLS Vector 🎯               →  Positive
------------------------------------------------------------------------------------
Ride Context 🚗        →  Tokenizer 🔤                →  Topic 🗂️
                       →  Shared Encoder 🧠           →  Driver / Payment /
                       →  CLS Vector 🎯               →  Support / App / Availability
------------------------------------------------------------------------------------
```
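The diagram shows each task's data passing through the same tokenizer and encoder before reaching its own output. The card does not describe the training objective. One common multi-task setup, assumed here, sums the per-task cross-entropy losses so that every task updates the shared encoder. Below is a minimal PyTorch sketch. `multitask_step` is a hypothetical helper, and `model` is any module that returns a dict of logits keyed by task name.

```python
from torch.nn import functional as F


def multitask_step(model, optimizer, batches):
    """One hypothetical training step over batches from all three tasks.

    `batches` maps a task name ("language", "sentiment", "topic") to a tuple
    (input_ids, attention_mask, labels). Each task's labels only supervise its
    own head, but all gradients flow back into the shared encoder.
    """
    model.train()
    optimizer.zero_grad()
    losses = {}
    for task, (input_ids, attention_mask, labels) in batches.items():
        logits = model(input_ids=input_ids, attention_mask=attention_mask)[task]
        losses[task] = F.cross_entropy(logits, labels)
    total = sum(losses.values())  # equal task weights: an assumption
    total.backward()
    optimizer.step()
    # Per-task losses measured before this step's parameter update.
    return {task: loss.item() for task, loss in losses.items()}
```

Equal loss weights are the simplest choice. If one task's data dominates, per-task loss weights or sampling tasks in proportion to dataset size are common alternatives.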