# Banking77 Intent Classifier: DistilBERT + LoRA Fine-Tuning

![Python](https://img.shields.io/badge/Python-3.10-blue)
![PyTorch](https://img.shields.io/badge/PyTorch-2.0-orange)
![HuggingFace](https://img.shields.io/badge/HuggingFace-Transformers-yellow)
![LoRA](https://img.shields.io/badge/PEFT-LoRA-green)

## Problem Statement

Customer support systems receive thousands of messages daily. 
Manually routing each message to the correct department is 
slow, expensive, and error-prone. This project builds an 
automated intent classifier that categorizes customer banking 
queries into 77 distinct intents, enabling instant, accurate 
routing without human intervention.

**Real-world challenge:** How do you fine-tune a transformer 
model for production use when you have no dedicated GPU, no 
budget for paid APIs, and limited compute resources?

**Solution:** Parameter-efficient fine-tuning with LoRA, 
training only 1.17% of model parameters while the pretrained 
weights stay frozen.

---

## Dataset

**Banking77**: an industry-standard customer support benchmark

| Property | Value |
|---|---|
| Training examples | 10,003 |
| Test examples | 3,080 |
| Intent classes | 77 |
| Average text length | 59.5 characters |
| Class imbalance ratio | 5.34x |

Sample intents: `lost_or_stolen_card`, `declined_card_payment`, 
`change_pin`, `topping_up_by_card`, `fiat_currency_support`

---

## Architecture & Technical Decisions

### Why DistilBERT over BERT-base?

| Model | Parameters | Relative performance | Memory |
|---|---|---|---|
| BERT-base | 110M | 100% | High |
| DistilBERT | 66M | 97% | 40% less |

DistilBERT retains 97% of BERT's performance through knowledge 
distillation while using 40% fewer parameters. This matters for 
training on a free Colab T4 GPU without running out of memory.

### Why LoRA?

Full fine-tuning of 66M parameters is expensive and risks 
catastrophic forgetting, where the model overwrites pretrained 
knowledge with task-specific patterns.

LoRA freezes all pretrained weights and adds two small 
adapter matrices alongside each targeted attention projection:

| Matrix | Shape | Parameters |
|---|---|---|
| Original W | 768 × 768 | 589,824 |
| LoRA A | 768 × 8 | 6,144 |
| LoRA B | 8 × 768 | 6,144 |

Per adapted matrix, A and B together hold 12,288 trainable 
parameters instead of 589,824, about 97.9% fewer.

**Result:**

| Count | Value |
|---|---|
| Total parameters | 67,012,685 |
| Trainable parameters | 797,261 (1.17%) |
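
The trainable-parameter figure can be reproduced with simple arithmetic. The sketch below is a back-of-the-envelope check, not output from the notebook. It assumes the standard `distilbert-base-uncased` dimensions (6 transformer blocks, hidden size 768) and assumes that the pre-classifier and classifier layers of the classification head are trained alongside the adapters. That second assumption is not stated in this README, but it is consistent with the reported count.

```python
# Back-of-the-envelope count of trainable parameters (assumed architecture values).
HIDDEN = 768        # DistilBERT hidden size
LAYERS = 6          # transformer blocks in distilbert-base-uncased
RANK = 8            # LoRA rank r
TARGETS = 2         # q_lin and v_lin in each block
NUM_LABELS = 77     # Banking77 intents


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is d_in x r, B is r x d_out."""
    return d_in * r + r * d_out


def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return d_in * d_out + d_out


adapter_total = LAYERS * TARGETS * lora_params(HIDDEN, HIDDEN, RANK)
# Assumption: the head (768 -> 768 pre-classifier, 768 -> 77 classifier) is trainable.
head_total = linear_params(HIDDEN, HIDDEN) + linear_params(HIDDEN, NUM_LABELS)
trainable = adapter_total + head_total

print(f"adapters:  {adapter_total:,}")
print(f"head:      {head_total:,}")
print(f"trainable: {trainable:,}")
```

Under these assumptions the adapters themselves account for under a fifth of the trainable parameters; most of the rest sits in the classification head.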

### LoRA Configuration

```python
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    r=8,                                # rank of the adapter matrices
    lora_alpha=16,                      # scaling factor (2x rank)
    target_modules=["q_lin", "v_lin"],  # query and value attention projections
    lora_dropout=0.1,                   # regularization
    task_type=TaskType.SEQ_CLS          # sequence classification
)
```
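
To make the roles of `r` and `lora_alpha` concrete, here is a minimal pure-Python sketch of a LoRA forward pass on a toy 4-dimensional layer. The sizes, values, and helper names (`matmul`, `lora_forward`) are illustrative assumptions, not the PEFT implementation. It follows the shape convention used above (A maps down to rank `r`, B maps back up) and the usual practice of initialising B to zero.

```python
# Toy LoRA forward pass on a 4-dimensional layer (sizes are illustrative only).
R, ALPHA = 2, 4            # same alpha / r = 2 ratio as the config above
SCALE = ALPHA / R


def matmul(x, m):
    """Multiply a row vector x (length n) by an n x k matrix m."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def lora_forward(x, w, a, b):
    """Frozen path x @ W plus the scaled low-rank update x @ A @ B."""
    base = matmul(x, w)
    update = matmul(matmul(x, a), b)
    return [h + SCALE * u for h, u in zip(base, update)]


w = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # frozen weights
a = [[0.1, -0.2], [0.3, 0.0], [0.0, 0.5], [-0.4, 0.1]]              # 4 x R, trainable
b_zero = [[0.0] * 4 for _ in range(R)]                              # R x 4, zero-initialised
x = [1.0, 2.0, 3.0, 4.0]

# With B at zero the adapter contributes nothing, so the output is the frozen output.
print(lora_forward(x, w, a, b_zero))
```

Because the update is multiplied by `lora_alpha / r`, doubling the rank without changing `lora_alpha` halves the scale of the adapter's contribution.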

### Handling Class Imbalance

Data exploration revealed a 5.34x imbalance between the most and 
least frequent classes (187 vs 35 examples). Training without 
correction can lead the model to under-predict rare intents.

**Fix: Inverse frequency weighted loss**

```python
# label_counts: per-class training example counts (e.g. a NumPy array or tensor)
class_weights = 1.0 / label_counts
# Rare class → high weight → misclassifying it costs more
# Common class → low weight → common classes cannot dominate the loss
```
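
The weighting idea can be sketched end to end in plain Python. This is an illustration under assumptions, not the notebook's code: the helper names are hypothetical, and rescaling the weights so they average to 1 is an added convention that keeps the overall loss scale comparable. In PyTorch, weights like these would typically be passed through the `weight` argument of `torch.nn.CrossEntropyLoss`.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by 1 / count, rescaled so the weights average to 1."""
    counts = Counter(labels)
    raw = {c: 1.0 / n for c, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}


def weighted_cross_entropy(probs, target, weights):
    """Loss for one example: class weight times negative log-likelihood."""
    return -weights[target] * math.log(probs[target])


# Toy label list mirroring the reported extremes: 187 vs 35 examples.
labels = ["common"] * 187 + ["rare"] * 35
weights = inverse_frequency_weights(labels)
print(weights["rare"] / weights["common"])  # ratio of weights equals 187 / 35

# The same mistake (p = 0.2 on the true class) costs more on the rare class.
probs = {"common": 0.2, "rare": 0.2}
print(weighted_cross_entropy(probs, "rare", weights) >
      weighted_cross_entropy(probs, "common", weights))
```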

---

## Training Configuration

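```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",      # assumed path; not given in the original
    num_train_epochs=5,
    per_device_train_batch_size=32,
    learning_rate=2e-5,          # small, conservative step size
    warmup_steps=100,            # gradual lr increase at start
    weight_decay=0.01,           # regularization
    fp16=True,                   # mixed precision to reduce GPU memory use
    eval_strategy="epoch",
    save_strategy="epoch",       # must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True
)
```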

**Why learning rate 2e-5?**
Large learning rates can push the trainable weights far from a 
good solution early in training. 2e-5 is a conservative value that 
nudges the model toward the task gradually; under LoRA the 
pretrained DistilBERT weights themselves stay frozen.
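
The interaction between `warmup_steps` and the learning rate can be sketched as a schedule. This is a rough illustration that assumes the Trainer's default linear scheduler, a single device with no gradient accumulation, and all 10,003 training examples used for training; if a validation split was held out, the step counts would be smaller. `lr_at` is a hypothetical helper name.

```python
import math

TRAIN_EXAMPLES = 10_003
BATCH_SIZE = 32
EPOCHS = 5
WARMUP_STEPS = 100
PEAK_LR = 2e-5

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = total_steps - step
    return PEAK_LR * max(0.0, remaining / (total_steps - WARMUP_STEPS))


print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
for step in (0, 50, 100, 800, total_steps):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

Under these assumptions warmup covers roughly the first third of epoch 1, after which the rate decays steadily until the end of epoch 5.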

---

## Results

### Training Curve

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 3.9726 | 3.5859 | 38.76% |
| 2 | 2.5550 | 2.2843 | 61.14% |
| 3 | 1.9706 | 1.7091 | 68.63% |
| 4 | 1.6654 | 1.4714 | 71.03% |
| 5 | 1.5524 | 1.4026 | 71.73% |

**Final Test Accuracy: 72.69%**

Random-guess baseline: 1.3% (1 in 77). The model is roughly 56x more accurate than random guessing.

### Per-Class Performance

**Top 5 Best Performing Intents:**

| Intent | F1 Score |
|---|---|
| verify_top_up | 1.000 |
| age_limit | 0.976 |
| passcode_forgotten | 0.941 |
| edit_personal_details | 0.940 |
| get_physical_card | 0.925 |

**5 Worst Performing Intents:**

| Intent | F1 Score |
|---|---|
| topping_up_by_card | 0.170 |
| why_verify_identity | 0.333 |
| request_refund | 0.353 |
| supported_cards_and_currencies | 0.353 |
| top_up_by_bank_transfer_charge | 0.370 |
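
For reference, per-class F1 scores like those above can be computed from raw predictions as follows. This is a minimal sketch with a hypothetical helper (`per_class_f1`) and toy predictions that are not project results; in practice `sklearn.metrics.f1_score` with `average=None` or `classification_report` does the same job.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} using one-vs-rest precision and recall."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted p, but the true label was different
            fn[t] += 1   # a true t that was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy predictions (not from the project) using two of the labels above.
y_true = ["age_limit", "age_limit", "request_refund", "request_refund"]
y_pred = ["age_limit", "age_limit", "age_limit", "request_refund"]
scores = per_class_f1(y_true, y_pred)
for label, f1 in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{label}: {f1:.3f}")
```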

### Failure Mode Analysis

Poorly performing intents share a common pattern: semantic 
overlap. A message such as "I want to add money to my card" 
could plausibly match any of:

- `topping_up_by_card`
- `top_up_by_bank_transfer_charge`  
- `top_up_by_cash_or_cheque`
- `transfer_into_account`

The model spreads confidence across the similar intents 
rather than making one strong prediction. This is largely a dataset 
limitation: Banking77's 77 classes contain genuinely 
ambiguous boundaries that can be hard even for human annotators.
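
The spreading effect described above can be illustrated with a small softmax example. The logits below are hypothetical, chosen only to show three near-tied top-up intents next to one unrelated intent; `softmax` and `top_k` are illustrative helpers, not functions from the notebook.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(labels, logits, k=3):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    return sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)[:k]


labels = ["topping_up_by_card", "top_up_by_bank_transfer_charge",
          "transfer_into_account", "change_pin"]

# Hypothetical logits: three near-tied top-up intents and one unrelated intent.
ambiguous = [2.0, 1.9, 1.8, -1.0]
for label, p in top_k(labels, ambiguous):
    print(f"{label}: {p:.1%}")
```

Even though the top prediction is correct here, its probability stays well below 50% because the mass is shared with its near neighbours. With 77 classes the same effect is more pronounced.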

**This explains why confidence scores are moderate:**

- "I want to change my PIN" → `change_pin` (47.46%): clear intent
- "I lost my card" → `card_not_working` (14.40%): ambiguous

---

## Inference Demo

```python
test_queries = [
    "I lost my card and need a replacement",
    "Why was my payment declined?",
    "How do I add money to my account?",
    "I want to change my PIN number",
    "What currencies do you support?"
]
```

**Results:**

| Text | Intent | Confidence |
|---|---|---|
| Why was my payment declined? | `declined_card_payment` | 19.94% |
| I want to change my PIN number | `change_pin` | 47.46% |
| What currencies do you support? | `fiat_currency_support` | 35.15% |

---

## What I Would Improve With More Resources

1. **Larger rank r**: r=16 or r=32 would give the model more 
   capacity to learn complex intent boundaries

2. **More data for rare classes**: data augmentation or 
   synthetic generation for intents with only 35 examples

3. **Intent merging**: semantically overlapping intents 
   like `topping_up_by_card` and `top_up_by_bank_transfer_charge` 
   could be merged into parent categories

4. **Larger base model**: RoBERTa-base or DeBERTa would 
   likely push accuracy above 80%

5. **Contrastive learning**: train the model to explicitly 
   push similar intents apart in embedding space
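
As a rough illustration of the intent-merging idea (item 3), the sketch below scores the same predictions before and after mapping fine-grained intents to parent categories. The `PARENT` mapping, the category names, and the toy predictions are all hypothetical; no such merge was run in this project.

```python
# Hypothetical mapping from fine-grained intents to parent categories.
PARENT = {
    "topping_up_by_card": "top_up",
    "top_up_by_bank_transfer_charge": "top_up",
    "change_pin": "card_settings",
}


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def merge(labels):
    """Replace each label with its parent; unmapped labels stay unchanged."""
    return [PARENT.get(label, label) for label in labels]


# Toy predictions (not project results): one confusion inside the top-up group.
y_true = ["topping_up_by_card", "top_up_by_bank_transfer_charge", "change_pin"]
y_pred = ["top_up_by_bank_transfer_charge", "top_up_by_bank_transfer_charge", "change_pin"]

print(f"fine-grained accuracy: {accuracy(y_true, y_pred):.2f}")
print(f"merged accuracy:       {accuracy(merge(y_true), merge(y_pred)):.2f}")
```

Confusions inside a parent group stop counting as errors after merging, so this kind of comparison shows how much of the error comes from near-duplicate intents rather than from routing to the wrong department.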

---

## Stack

| Component | Tool |
|---|---|
| Base Model | distilbert-base-uncased |
| Fine-tuning | HuggingFace PEFT + LoRA |
| Training | PyTorch + HuggingFace Trainer |
| Dataset | HuggingFace Datasets |
| Evaluation | sklearn + evaluate |
| Compute | Google Colab T4 GPU (free) |

---

## Project Structure

`banking77-intent-classifier/`

- `notebook.ipynb`: full pipeline (data → train → eval → inference)
- `README.md`: this file

---

## Author

**Syed Muhammad Aneeb Ur Rehman**  
AI/ML Engineer | Full-Stack Developer  
[LinkedIn](https://linkedin.com/in/syedaneeb15) | 
[GitHub](https://github.com/aneebnaqvi15)