Banking77 Intent Classifier: DistilBERT + LoRA Fine-Tuning
Problem Statement
Customer support systems receive thousands of messages daily. Manually routing each message to the correct department is slow, expensive, and error-prone. This project builds an automated intent classifier that sorts customer banking queries into 77 distinct intents, so messages can be routed automatically instead of by hand.
Real-world challenge: How do you fine-tune a transformer model for production use with no dedicated GPU, no budget for paid APIs, and limited compute?
Solution: Parameter-efficient fine-tuning with LoRA, which trains only 1.17% of the model's parameters and leaves the pretrained weights frozen.
Dataset
Banking77: an industry-standard customer support benchmark
| Property | Value |
|---|---|
| Training examples | 10,003 |
| Test examples | 3,080 |
| Intent classes | 77 |
| Average text length | 59.5 characters |
| Class imbalance ratio | 5.34x |
Sample intents: lost_or_stolen_card, declined_card_payment,
change_pin, top_up_by_card, fiat_currency_support
Architecture & Technical Decisions
Why DistilBERT over BERT-base?
| Model | Parameters | Relative performance | Memory |
|---|---|---|---|
| BERT-base | 110M | 100% | High |
| DistilBERT | 66M | 97% | 40% less |
DistilBERT retains about 97% of BERT's performance through knowledge distillation while using 40% fewer parameters. That headroom matters when training on a free Colab T4 GPU, where memory is limited.
Why LoRA?
Full fine-tuning of 66M parameters is expensive and risks catastrophic forgetting, where the model overwrites pretrained knowledge with task-specific patterns.
LoRA freezes all pretrained weights and adds two small adapter matrices alongside each targeted attention projection:
Original matrix W: 768 × 768 = 589,824 parameters
LoRA matrix A: 768 × 8 = 6,144 parameters
LoRA matrix B: 8 × 768 = 6,144 parameters
Reduction: 12,288 trainable parameters instead of 589,824 per adapted matrix (about 97.9% fewer)
Result:
Total parameters: 67,012,685
Trainable parameters: 797,261 (1.17%)
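The parameter counts can be reproduced by hand. This is a minimal sketch, not code from the notebook. It assumes the published distilbert-base-uncased shapes (6 layers, hidden size 768, feed-forward size 3072, vocabulary 30,522, 512 positions), bias-free LoRA adapters on q_lin and v_lin in every layer, and a classification head (pre_classifier plus classifier) that stays trainable. The fact that the sums match the reported totals suggests these assumptions hold, but the notebook does not confirm them.

```python
# Shapes assumed from the published distilbert-base-uncased config.
HIDDEN = 768
FFN = 3072
VOCAB = 30522
MAX_POS = 512
LAYERS = 6
NUM_LABELS = 77
RANK = 8


def linear(n_in, n_out, bias=True):
    """Parameter count of a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


layer_norm = 2 * HIDDEN  # scale + shift
embeddings = VOCAB * HIDDEN + MAX_POS * HIDDEN + layer_norm
attention = 4 * linear(HIDDEN, HIDDEN)  # q, k, v, output projections
ffn = linear(HIDDEN, FFN) + linear(FFN, HIDDEN)
encoder_layer = attention + ffn + 2 * layer_norm

# Classification head: pre_classifier (768 -> 768) and classifier (768 -> 77).
head = linear(HIDDEN, HIDDEN) + linear(HIDDEN, NUM_LABELS)
base_total = embeddings + LAYERS * encoder_layer + head

# Two rank-8 factors per adapted matrix, two adapted matrices per layer.
lora_per_matrix = linear(HIDDEN, RANK, bias=False) + linear(RANK, HIDDEN, bias=False)
lora_total = LAYERS * 2 * lora_per_matrix

# Assumption: the head is trained in full alongside the adapters.
trainable = lora_total + head

print(f"base model parameters: {base_total:,}")
print(f"LoRA adapter parameters: {lora_total:,}")
print(f"trainable parameters: {trainable:,}")
```

Under these assumptions most of the trainable parameters sit in the classification head rather than in the adapters themselves.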
LoRA Configuration
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    r=8,                                # rank: sweet spot for this task
    lora_alpha=16,                      # scaling factor (2x rank)
    target_modules=["q_lin", "v_lin"],  # query and value attention matrices
    lora_dropout=0.1,                   # regularization
    task_type=TaskType.SEQ_CLS,         # sequence classification
)
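To make the roles of r and lora_alpha concrete, here is the adapted forward pass written with the matrix shapes used in this README. The symbols x (input vector) and h (output vector) are introduced here for illustration. Note that the LoRA paper and PEFT name the two factors the other way round (the product is written BA there).

```latex
h = W x + \frac{\alpha}{r}\, A B\, x,
\qquad
A \in \mathbb{R}^{768 \times 8},\quad
B \in \mathbb{R}^{8 \times 768},\quad
\frac{\alpha}{r} = \frac{16}{8} = 2
```

W stays frozen, only A and B receive gradients, and the low-rank update is scaled by a factor of 2 before being added to the frozen projection.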
Handling Class Imbalance
Data exploration revealed a 5.34x imbalance between the most and least frequent classes (187 vs 35 examples). Without correction, training can push the model to under-predict rare intents.
Fix: Inverse frequency weighted loss
class_weights = 1.0 / label_counts
# Rare class -> high weight -> misclassifying it costs more
# Common class -> low weight -> model cannot ignore rare classes
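The effect of the weighting can be sketched in plain Python. This is an illustration, not the notebook's code: the helper names, the two placeholder class names, and the normalization to a mean weight of 1 are assumptions added here. The class sizes mirror the 187 vs 35 extremes reported above.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight}, weight proportional to 1 / count, mean weight 1."""
    counts = Counter(labels)
    raw = {cls: 1.0 / n for cls, n in counts.items()}
    mean_raw = sum(raw.values()) / len(raw)
    return {cls: w / mean_raw for cls, w in raw.items()}


def weighted_nll(probs, true_class, weights):
    """Weighted negative log-likelihood for a single example."""
    return -weights[true_class] * math.log(probs[true_class])


# Placeholder classes sized like the most and least frequent intents.
toy_labels = ["common"] * 187 + ["rare"] * 35
weights = inverse_frequency_weights(toy_labels)

# Same predicted probability for the true class, different penalty.
probs = {"common": 0.5, "rare": 0.5}
loss_common = weighted_nll(probs, "common", weights)
loss_rare = weighted_nll(probs, "rare", weights)

print(weights)
print(loss_common, loss_rare)
```

In PyTorch the same idea is usually expressed by passing a tensor of per-class weights to the `weight` argument of `torch.nn.CrossEntropyLoss`.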
Training Configuration
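from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # placeholder path (assumption, not from the notebook)
    num_train_epochs=5,
    per_device_train_batch_size=32,
    learning_rate=2e-5,              # small, to keep updates conservative
    warmup_steps=100,                # gradual lr increase at start
    weight_decay=0.01,               # regularization
    fp16=True,                       # half precision: lower memory use on GPU
    eval_strategy="epoch",
    save_strategy="epoch",           # must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
)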
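Why a learning rate of 2e-5? Large learning rates can make fine-tuning unstable. 2e-5 is a conservative value that nudges the model toward the task gradually. Because LoRA keeps the pretrained DistilBERT weights frozen, the rate only controls how quickly the adapters and classification head are updated.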
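The interaction between warmup_steps and the learning rate can be sketched as follows. This assumes the Trainer's default linear schedule, a single device, no gradient accumulation, and that all 10,003 training examples are used in every epoch. None of these are confirmed by the notebook, and the function name `lr_at` is made up for this example.

```python
import math

TRAIN_EXAMPLES = 10_003
BATCH_SIZE = 32
EPOCHS = 5
WARMUP_STEPS = 100
PEAK_LR = 2e-5

# The last batch of each epoch is partial, so round up.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def lr_at(step):
    """Linear warmup to PEAK_LR, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = total_steps - step
    return PEAK_LR * max(0.0, remaining / (total_steps - WARMUP_STEPS))


for step in (0, 50, 100, total_steps // 2, total_steps):
    print(step, lr_at(step))
```

Under these assumptions the peak rate of 2e-5 is reached after 100 of 1,565 optimizer steps, so warmup covers only the first third of the first epoch.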
Results
Training Curve
| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 3.9726 | 3.5859 | 38.76% |
| 2 | 2.5550 | 2.2843 | 61.14% |
| 3 | 1.9706 | 1.7091 | 68.63% |
| 4 | 1.6654 | 1.4714 | 71.03% |
| 5 | 1.5524 | 1.4026 | 71.73% |
Final Test Accuracy: 72.69%
Baseline (random): 1.3%. The model achieves a 56x improvement over random guessing.
Per-Class Performance
Top 5 Best Performing Intents:
| Intent | F1 Score |
|---|---|
| verify_top_up | 1.000 |
| age_limit | 0.976 |
| passcode_forgotten | 0.941 |
| edit_personal_details | 0.940 |
| get_physical_card | 0.925 |
Top 5 Worst Performing Intents:
| Intent | F1 Score |
|---|---|
| topping_up_by_card | 0.170 |
| why_verify_identity | 0.333 |
| request_refund | 0.353 |
| supported_cards_and_currencies | 0.353 |
| top_up_by_bank_transfer_charge | 0.370 |
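For reference, per-class F1 scores like the ones above can be computed as sketched below. The function name and the five toy predictions are invented for illustration and are not model output. The stack listed in this README uses sklearn for evaluation; this plain-Python version just makes the formula explicit.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Toy data: one request_refund example is misread as change_pin.
y_true = ["change_pin", "change_pin", "request_refund", "request_refund", "age_limit"]
y_pred = ["change_pin", "change_pin", "request_refund", "change_pin", "age_limit"]

scores = per_class_f1(y_true, y_pred)
for label, f1 in scores.items():
    print(f"{label}: {f1:.3f}")
```

A single confusion lowers the F1 of both classes involved: the true class loses recall and the predicted class loses precision, which is why overlapping intents tend to drag each other down.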
Failure Mode Analysis
Poorly performing intents share a common pattern: semantic overlap. A customer saying "I want to add money to my card" could legitimately belong to:
topping_up_by_card
top_up_by_bank_transfer_charge
top_up_by_cash
transfer_into_account
The model distributes confidence across all similar intents rather than making one strong prediction. This is partly a dataset limitation: Banking77's 77 classes contain genuinely ambiguous boundaries that can be hard even for human annotators.
This explains why confidence scores are moderate:
"I want to change my PIN" -> change_pin (47.46%): clear intent
"I lost my card" -> card_not_working (14.40%): ambiguous
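The spreading effect follows directly from the softmax that turns logits into confidence scores. The logits below are hypothetical values chosen for illustration, not outputs of the trained model, and only four classes are shown instead of 77.

```python
import math


def softmax(logits):
    """Convert {label: logit} into {label: probability}."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical logits: one intent clearly ahead of the rest.
clear = {
    "change_pin": 4.0,
    "card_not_working": 1.0,
    "lost_or_stolen_card": 1.0,
    "request_refund": 1.0,
}

# Hypothetical logits: four overlapping intents scored almost equally.
ambiguous = {
    "topping_up_by_card": 2.0,
    "top_up_by_bank_transfer_charge": 1.8,
    "top_up_by_cash": 1.7,
    "transfer_into_account": 1.9,
}

clear_probs = softmax(clear)
ambiguous_probs = softmax(ambiguous)

print(max(clear_probs.items(), key=lambda kv: kv[1]))
print(max(ambiguous_probs.items(), key=lambda kv: kv[1]))
```

When several logits are close together, the top probability cannot be high even if the top label is correct, so a low confidence score signals overlap between candidates rather than a wrong answer.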
Inference Demo
test_queries = [
"I lost my card and need a replacement",
"Why was my payment declined?",
"How do I add money to my account?",
"I want to change my PIN number",
"What currencies do you support?"
]
Results:
Text: Why was my payment declined?
Intent: declined_card_payment
Confidence: 19.94%

Text: I want to change my PIN number
Intent: change_pin
Confidence: 47.46%

Text: What currencies do you support?
Intent: fiat_currency_support
Confidence: 35.15%
What I Would Improve With More Resources
Larger rank r: r=16 or r=32 would give the model more capacity to learn complex intent boundaries
More data for rare classes: data augmentation or synthetic generation for intents with only 35 examples
Intent merging: semantically overlapping intents like topping_up_by_card and top_up_by_bank_transfer_charge could be merged into parent categories
Larger base model: RoBERTa-base or DeBERTa would likely push accuracy above 80%
Contrastive learning: train the model to explicitly push similar intents apart in embedding space
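As one illustration of the intent-merging idea, fine-grained labels could be remapped to parent categories before training or evaluation. The parent name "top_up" and the helper `to_parent` are hypothetical; this project has not defined or tested a mapping.

```python
# Hypothetical mapping from fine-grained intents to a parent category.
PARENT_OF = {
    "topping_up_by_card": "top_up",
    "top_up_by_bank_transfer_charge": "top_up",
}


def to_parent(intent):
    """Return the parent category, or the intent itself if it is not merged."""
    return PARENT_OF.get(intent, intent)


fine_labels = ["topping_up_by_card", "change_pin", "top_up_by_bank_transfer_charge"]
merged_labels = [to_parent(label) for label in fine_labels]
print(merged_labels)
```

Merging trades granularity for accuracy: routing gets more reliable at the parent level, but a second step is then needed if the downstream team requires the fine-grained intent.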
Stack
| Component | Tool |
|---|---|
| Base Model | distilbert-base-uncased |
| Fine-tuning | HuggingFace PEFT + LoRA |
| Training | PyTorch + HuggingFace Trainer |
| Dataset | HuggingFace Datasets |
| Evaluation | sklearn + evaluate |
| Compute | Google Colab T4 GPU (free) |
Project Structure
banking77-intent-classifier/
├── notebook.ipynb   # Full pipeline: data -> train -> eval -> inference
└── README.md        # This file
Author
Syed Muhammad Aneeb Ur Rehman
AI/ML Engineer | Full-Stack Developer
LinkedIn | GitHub