Kelvinmbewe committed on
Commit 64df334 · verified · 1 Parent(s): 5d272c7

Update README.md

Files changed (1)
  1. README.md +27 -65
README.md CHANGED
@@ -59,19 +59,19 @@ base_model:
   - Kelvinmbewe/mbert_LusakaLang_Topic
  ---

- ## **LusakaLang Multi‑Task Model (Language + Sentiment + Topic)**
+ ## **LusakaLang MultiTask Model**

- This model is a unified transformer architecture built on top of **`bert-base-multilingual-cased`**, designed to perform **three tasks simultaneously**:
+ This model is a unified transformer architecture built on top of `bert-base-multilingual-cased`, designed to perform three tasks simultaneously:

- 1. **[Language Identification](guide://action?prefill=Tell%20me%20more%20about%3A%20Language%20Identification)**
- 2. **[Sentiment Analysis](guide://action?prefill=Tell%20me%20more%20about%3A%20Sentiment%20Analysis)**
- 3. **[Topic Classification](guide://action?prefill=Tell%20me%20more%20about%3A%20Topic%20Classification)**
+ 1. Language Identification
+ 2. Sentiment Analysis
+ 3. Topic Classification

  The system integrates three fine‑tuned LusakaLang checkpoints:

- - **[Kelvinmbewe/mbert_Lusaka_Language_Analysis](guide://action?prefill=Tell%20me%20more%20about%3A%20Kelvinmbewe%2Fmbert_Lusaka_Language_Analysis)**
- - **[Kelvinmbewe/mbert_LusakaLang_Sentiment_Analysis](guide://action?prefill=Tell%20me%20more%20about%3A%20Kelvinmbewe%2Fmbert_LusakaLang_Sentiment_Analysis)**
- - **[Kelvinmbewe/mbert_LusakaLang_Topic](guide://action?prefill=Tell%20me%20more%20about%3A%20Kelvinmbewe%2Fmbert_LusakaLang_Topic)**
+ - mbert_Lusaka_Language_Analysis
+ - mbert_LusakaLang_Sentiment_Analysis
+ - mbert_LusakaLang_Topic

  All tasks share a single mBERT encoder, supported by three independent classifier heads. This architecture enhances computational efficiency, reduces memory overhead
  and promotes consistent, harmonized predictions across all tasks.
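
The README describes a single shared mBERT encoder feeding three independent classifier heads, but the commit does not include the module code itself. As a rough, hypothetical sketch (the class name `LusakaLangMultiTaskModel` and the label counts are illustrative assumptions, not the repository's actual implementation), that layout could look like this in PyTorch:

```python
# Hypothetical sketch of a shared mBERT encoder with three classification heads.
# Label counts are illustrative only; the released checkpoints may differ.
import torch
import torch.nn as nn
from transformers import AutoModel

class LusakaLangMultiTaskModel(nn.Module):
    def __init__(self, encoder_name="bert-base-multilingual-cased",
                 num_languages=4, num_sentiments=3, num_topics=5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)   # shared mBERT encoder
        hidden = self.encoder.config.hidden_size
        self.language_head = nn.Linear(hidden, num_languages)    # Head 1: language
        self.sentiment_head = nn.Linear(hidden, num_sentiments)  # Head 2: sentiment
        self.topic_head = nn.Linear(hidden, num_topics)          # Head 3: topic

    def forward(self, input_ids, attention_mask=None):
        # Use the [CLS] token representation as the shared sentence vector.
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = outputs.last_hidden_state[:, 0]
        return {
            "language": self.language_head(cls),
            "sentiment": self.sentiment_head(cls),
            "topic": self.topic_head(cls),
        }
```

Because all three heads read the same [CLS] vector, one forward pass serves all tasks, which is where the efficiency and consistency claims above come from.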
@@ -92,20 +92,6 @@ understanding of real Zambian communication.

  ---

- # **Training Architecture**
-
- The model uses:
-
- - **Shared Encoder:** mBERT
- - **Head 1:** Language classifier
- - **Head 2:** Sentiment classifier
- - **Head 3:** Topic classifier
-
- This multi‑task setup improves generalization and reduces inference cost.
-
- ---
-
-
  ## **How to Use This Model**

@@ -114,46 +100,25 @@ from transformers import AutoTokenizer
  import torch

  class LusakaLangMultiTask:
-     def __init__(self, model_path="Kelvinmbewe/LusakaLang-MultiTask"):
-         self.tokenizer = AutoTokenizer.from_pretrained(model_path)
-         self.model = torch.load(f"{model_path}/model.pt")
-         self.model.eval()
-     def predict_language(self, texts):
-         # Your actual implementation goes here
-         pass
-     def predict_sentiment(self, texts):
-         # Your actual implementation goes here
-         pass
-     def predict_topic(self, texts):
-         # Your actual implementation goes here
-         pass
-
- # Instantiate model
+     def __init__(self, path="Kelvinmbewe/LusakaLang-MultiTask"):
+         self.tokenizer = AutoTokenizer.from_pretrained(path)
+         self.model = torch.load(f"{path}/model.pt").eval()
+
+     def predict_language(self, texts): pass
+     def predict_sentiment(self, texts): pass
+     def predict_topic(self, texts): pass
+
  llm = LusakaLangMultiTask()
- # Run predictions
- language_results = llm.predict_language([
-     "Ndeumfwa bwino lelo",
-     "Galimoto inachedwa koma driver anali bwino",
-     "The service was terrible today"
- ])
- sentiment_results = llm.predict_sentiment([
-     "Driver was rude and unprofessional",
-     "Ndimvela bwino lelo",
-     "The ride was okay, nothing special"
- ])
- topic_results = llm.predict_topic([
-     "Payment failed but money was deducted",
-     "Support siyankhapo, waited long",
-     "Driver was over speeding"
- ])
- print(language_results)
- print(sentiment_results)
- print(topic_results)
+
+ print(llm.predict_language([...]))
+ print(llm.predict_sentiment([...]))
+ print(llm.predict_topic([...]))
+
  ```

  ## Sample Output

- ```ansi
+ ```python
  # Language Identification 🌍
  [
    {"lang": "Bemba", "conf": 0.96},
@@ -175,24 +140,21 @@ print(topic_results)
  ```


-
-
  ```
- =========================== MULTI‑TASK PIPELINE ===========================
+ =========================== Training Architecture ===========================

- 📥 Input             →  🧠 Core Engine             →  📈 Output
+ 📥 Input             →  🧠 Core Engine             →  📈 Output
  ------------------------------------------------------------------------------------
- Text (Any Language)  →  Tokenizer 🔤               →  Language 🌍
+ Text (Any Language)  →  Tokenizer 🔤               →  Language 🌍
                       →  Shared mBERT Encoder 🧠    →  Bemba / Nyanja /
                       →  CLS Vector 🎯              →  English / Mixed
  ------------------------------------------------------------------------------------
- User Feedback 💬     →  Tokenizer 🔤               →  Sentiment ❤️
+ User Feedback 💬     →  Tokenizer 🔤               →  Sentiment ❤️
                       →  Shared Encoder 🧠          →  Negative / Neutral /
                       →  CLS Vector 🎯              →  Positive
  ------------------------------------------------------------------------------------
- Ride Context 🚗      →  Tokenizer 🔤               →  Topic 🗂️
+ Ride Context 🚗      →  Tokenizer 🔤               →  Topic 🗂️
                       →  Shared Encoder 🧠          →  Driver / Payment /
                       →  CLS Vector 🎯              →  Support / App / Availability
  ------------------------------------------------------------------------------------
-
  ```
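
One practical caveat about the updated usage snippet: `torch.load(f"{path}/model.pt")` expects a local file, so passing the Hub repo id `Kelvinmbewe/LusakaLang-MultiTask` directly only works if the repository has already been cloned. A hedged sketch of fetching the checkpoint first, assuming the repo really does host a `model.pt` file as the snippet implies:

```python
# Sketch: download the pickled checkpoint before torch.load(), since torch.load()
# needs a local path. Assumes the repo hosts "model.pt" as the README snippet implies.
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo_id = "Kelvinmbewe/LusakaLang-MultiTask"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
weights_path = hf_hub_download(repo_id=repo_id, filename="model.pt")
# weights_only=False is needed on newer PyTorch to unpickle a full nn.Module.
model = torch.load(weights_path, map_location="cpu", weights_only=False)
model.eval()
```

Saving and loading a `state_dict()` alongside the model class is generally more robust than pickling the whole module, but the sketch above follows the README's `model.pt` convention.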
 
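The `predict_*` methods remain stubs (`pass`) in the new README, while the Sample Output section shows per-text labels with confidences. Building on the hypothetical `LusakaLangMultiTaskModel` sketched earlier, one way `predict_language` could be filled in to produce that style of output is shown below; the label list and the dict-style model output are assumptions, not the published checkpoint's interface.

```python
# Hypothetical body for the stubbed predict_language(); label order and the
# dict-style model output follow the earlier sketch, not the released model.
import torch

LANGUAGE_LABELS = ["Bemba", "Nyanja", "English", "Mixed"]  # assumed label order

def predict_language(model, tokenizer, texts):
    # Tokenize the batch and run a single forward pass through the shared encoder.
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(enc["input_ids"], attention_mask=enc["attention_mask"])
    probs = torch.softmax(out["language"], dim=-1)   # per-class confidences
    conf, idx = probs.max(dim=-1)
    return [
        {"lang": LANGUAGE_LABELS[i], "conf": round(c, 2)}
        for i, c in zip(idx.tolist(), conf.tolist())
    ]
```

Run on the mixed Bemba/Nyanja/English examples from the earlier snippet, this returns a list shaped like the sample output above (e.g. `{"lang": "Bemba", "conf": 0.96}`), assuming the language head was trained on those four labels.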