---
library_name: transformers
tags:
- finance
license: apache-2.0
datasets:
- learn-abc/banking-intent-dataset
language:
- en
- bn
base_model:
- google/muril-base-cased
metrics:
- accuracy
pipeline_tag: text-classification
---

# Multilingual Banking Intent Classifier (EN + BN + Banglish)

## Overview

This model is a fine-tuned **MuRIL-based multilingual intent classifier** designed for production-grade banking chatbots.

- **Model Name:** Banking Multilingual Intent Classifier
- **Base Model:** google/muril-base-cased
- **Task:** Multilingual Intent Classification
- **Intents:** 14
- **Languages:** English, Bangla (Bengali script), Banglish (Romanized Bengali), Code-Mixed

The model performs 14-way intent classification for banking conversational systems.

---

## Base Model

`google/muril-base-cased`

MuRIL was selected for:

* Strong multilingual support
* Excellent performance on Indic languages
* Stable tokenization for Bangla + English
* Robust handling of code-mixed inputs

---

## Supported Intents (14)

```
ACCOUNT_INFO
ATM_SUPPORT
CARD_ISSUE
CARD_MANAGEMENT
CARD_REPLACEMENT
CHECK_BALANCE
EDIT_PERSONAL_DETAILS
FAILED_TRANSFER
FALLBACK
FEES
GREETING
LOST_OR_STOLEN_CARD
MINI_STATEMENT
TRANSFER
```
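
For convenience, the label set above can be mirrored in code. Note that the alphabetical index order below is an assumption; at runtime, prefer the authoritative mapping in `model.config.id2label`:

```python
# The 14 supported intents, listed alphabetically.
# NOTE: this ordering is an assumption; the released checkpoint's
# authoritative mapping is model.config.id2label.
INTENT_LABELS = [
    "ACCOUNT_INFO",
    "ATM_SUPPORT",
    "CARD_ISSUE",
    "CARD_MANAGEMENT",
    "CARD_REPLACEMENT",
    "CHECK_BALANCE",
    "EDIT_PERSONAL_DETAILS",
    "FAILED_TRANSFER",
    "FALLBACK",
    "FEES",
    "GREETING",
    "LOST_OR_STOLEN_CARD",
    "MINI_STATEMENT",
    "TRANSFER",
]

# Derived lookup tables in both directions.
label2id = {label: idx for idx, label in enumerate(INTENT_LABELS)}
id2label = {idx: label for label, idx in label2id.items()}
```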

---

## Dataset Summary

### Total Samples: 100,971

### Languages (Balanced)

| Language           | Count  |
| ------------------ | ------ |
| English (en)       | 33,657 |
| Bangla (bn)        | 33,657 |
| Banglish (bn-latn) | 33,657 |

An additional 500 code-mixed examples are included.

---

## Final Training Dataset

| Split | Samples |
| ----- | ------- |
| Train | 91,051  |
| Test  | 20,295  |

### Class Distribution (Final Train)

- All intents are within a safe 4–10% range.
- FALLBACK is controlled at ~9.4%, preventing dominance.
- This distribution avoids class collapse and overconfidence bias.

---

## Evaluation Metrics

### Overall Performance

* Accuracy: **99.12%**
* F1 Micro: **99.12%**
* F1 Macro: **99.08%**
* Validation Loss: 0.046

---

## Per-Intent Accuracy

| Intent                | Accuracy |
| --------------------- | -------- |
| ACCOUNT_INFO          | 99.14%   |
| ATM_SUPPORT           | 99.70%   |
| CARD_ISSUE            | 99.25%   |
| CARD_MANAGEMENT       | 99.43%   |
| CARD_REPLACEMENT      | 99.08%   |
| CHECK_BALANCE         | 99.05%   |
| EDIT_PERSONAL_DETAILS | 100.00%  |
| FAILED_TRANSFER       | 98.75%   |
| FALLBACK              | 97.86%   |
| FEES                  | 99.76%   |
| GREETING              | 97.41%   |
| LOST_OR_STOLEN_CARD   | 99.59%   |
| MINI_STATEMENT        | 98.80%   |
| TRANSFER              | 99.78%   |

---

## Strengths

* Strong multilingual support
* Balanced dataset distribution
* Robust fallback handling
* Stable across operational banking intents
* High macro F1 ensures no minority intent collapse
* Performs well on code-mixed queries

---

## Intended Use

* Banking chatbot intent routing
* Customer support automation
* Financial conversational AI
* Multilingual banking assistants

---

## Out of Scope

* Fraud detection
* Sentiment analysis
* Financial advisory decisions
* Regulatory or legal compliance automation

---

## Production Recommendations

* Apply confidence thresholding
* Route low-confidence predictions to human fallback
* Use softmax entropy monitoring
* Normalize numeric expressions before inference
* Log confusion pairs in production
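
The first three recommendations can be combined into a small routing wrapper. This is a minimal sketch, not part of the released model: the threshold values are assumptions to be tuned on a validation set, and the `HUMAN_FALLBACK` route name is hypothetical.

```python
import math

# Assumed thresholds -- tune both on held-out validation data.
CONFIDENCE_THRESHOLD = 0.70   # minimum softmax probability to accept
ENTROPY_THRESHOLD = 1.5       # maximum softmax entropy (nats) to accept

def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_prediction(logits, id2label):
    """Accept a confident prediction or route it to a human agent.

    Low top-1 probability OR high distribution entropy both trigger
    the fallback route, per the recommendations above.
    """
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[pred]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    if confidence < CONFIDENCE_THRESHOLD or entropy > ENTROPY_THRESHOLD:
        # Hypothetical route name for the human-agent queue.
        return {"intent": "HUMAN_FALLBACK", "confidence": confidence}
    return {"intent": id2label[pred], "confidence": confidence}
```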

---

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "learn-abc/banking-multilingual-intent-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Prediction function
def predict_intent(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    with torch.no_grad():
        outputs = model(**inputs)
        prediction = torch.argmax(outputs.logits, dim=-1).item()
        confidence = torch.softmax(outputs.logits, dim=-1)[0][prediction].item()
    
    predicted_intent = model.config.id2label[prediction]
    
    return {
        "intent": predicted_intent,
        "confidence": confidence
    }

# Example usage - English
result = predict_intent("what is my balance")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: CHECK_BALANCE, Confidence: 0.99

# Example usage - Bangla
result = predict_intent("আমার ব্যালেন্স কত")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: CHECK_BALANCE, Confidence: 0.98

# Example usage - Banglish (Romanized)
result = predict_intent("amar balance koto ache")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: CHECK_BALANCE, Confidence: 0.97

# Example usage - Code-mixed
result = predict_intent("আমার last 10 transaction দেখাও")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: MINI_STATEMENT, Confidence: 0.98
```
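
As noted under Production Recommendations, numeric expressions should be normalized before inference. A minimal preprocessing sketch; the specific regexes and the "5k" to "5000" expansion are illustrative assumptions, not rules shipped with the model:

```python
import re

def normalize_numbers(text):
    """Normalize common numeric spellings before tokenization.

    Illustrative rules only -- extend per deployment needs.
    """
    # Collapse thousands separators: "1,000" -> "1000"
    text = re.sub(r"(?<=\d),(?=\d{3}\b)", "", text)
    # Expand "k" shorthand: "5k" -> "5000" (assumed convention)
    text = re.sub(
        r"\b(\d+)k\b",
        lambda m: str(int(m.group(1)) * 1000),
        text,
        flags=re.IGNORECASE,
    )
    return text
```

The normalized text can then be passed to `predict_intent` as defined above.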

---

## Limitations

* Does not handle multi-turn conversational context
* Extremely ambiguous short inputs may require thresholding
* Synthetic data may introduce stylistic bias
* Not evaluated for robustness to speech-to-text transcription errors

---

## Version

- Version: 2.0
- Status: Production-Ready
- Architecture: MuRIL Base
- Language Coverage: EN + BN + Banglish

---

## License
This project is licensed under the Apache 2.0 License.

## Contact Me
For any inquiries or support, please reach out to:

* **Author:** [Abhishek Singh](https://github.com/SinghIsWriting/)
* **LinkedIn:** [My LinkedIn Profile](https://www.linkedin.com/in/abhishek-singh-bba2662a9)
* **Portfolio:** [Abhishek Singh Portfolio](https://me.devhome.me/)

---