Banking77 Intent Classifier — DistilBERT + LoRA Fine-Tuning

Problem Statement

Customer support systems receive thousands of messages daily. Manually routing each message to the correct department is slow, expensive, and error-prone. This project builds an automated intent classifier that categorizes customer banking queries into 77 distinct intents — enabling instant, accurate routing without human intervention.

Real-world challenge: How do you fine-tune a transformer model for production use with no dedicated GPU, no budget for paid APIs, and limited compute?

Solution: Parameter-efficient fine-tuning (PEFT) with LoRA, which trains only about 1.2% of the model's parameters while the pretrained weights stay frozen.

Dataset

Banking77 — Industry standard customer support benchmark

Property	Value
Training examples	10,003
Test examples	3,080
Intent classes	77
Average text length	59.5 characters
Class imbalance ratio	5.34x

Sample intents: lost_or_stolen_card, declined_card_payment, change_pin, topping_up_by_card, fiat_currency_support

Architecture & Technical Decisions

Why DistilBERT over BERT-base?

Model	Parameters	Performance	Memory
BERT-base	110M	100%	High
DistilBERT	66M	97%	40% less

DistilBERT retains 97% of BERT's performance through knowledge distillation while using 40% fewer parameters. This matters for training on a free Colab T4 GPU without running out of memory.

Why LoRA?

Full fine-tuning of 66M parameters is expensive and risks catastrophic forgetting — where the model overwrites pretrained knowledge with task-specific patterns.

LoRA freezes all pretrained weights and adds two small adapter matrices alongside each targeted attention projection (here, query and value):

Original matrix W: 768 × 768 = 589,824 parameters
LoRA matrix A: 768 × 8 = 6,144 parameters
LoRA matrix B: 8 × 768 = 6,144 parameters
Reduction: 97.9% fewer trainable parameters per adapted matrix (12,288 vs 589,824)

Result:

Total parameters: 67,012,685
Trainable parameters: 797,261 (about 1.19%)
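
The arithmetic behind these counts can be checked with a short sketch. It assumes DistilBERT's standard shapes (hidden size 768, 6 transformer layers) and that the two classification-head layers stay trainable next to the adapters. Those assumptions are not spelled out above, but they reproduce the reported trainable count.

```python
# Parameter arithmetic for LoRA on DistilBERT.
# Assumptions (not stated in the README): hidden size 768, 6 transformer layers,
# adapters on 2 projections per layer, and a trainable classification head.
HIDDEN = 768
RANK = 8
NUM_LAYERS = 6
TARGETS_PER_LAYER = 2   # q_lin and v_lin
NUM_CLASSES = 77

full_matrix = HIDDEN * HIDDEN                  # 589,824
lora_per_matrix = HIDDEN * RANK + RANK * HIDDEN  # A plus B: 12,288
reduction = 1 - lora_per_matrix / full_matrix  # fraction saved per adapted matrix

adapter_total = NUM_LAYERS * TARGETS_PER_LAYER * lora_per_matrix
pre_classifier = HIDDEN * HIDDEN + HIDDEN      # weights + bias
classifier = HIDDEN * NUM_CLASSES + NUM_CLASSES
trainable = adapter_total + pre_classifier + classifier

print(f"per-matrix reduction: {reduction:.1%}")
print(f"adapter parameters:   {adapter_total:,}")
print(f"trainable parameters: {trainable:,}")
```

Under these assumptions, most of the trainable parameters sit in the classification head rather than in the adapters themselves.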

LoRA Configuration

from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    r=8,                               # rank; chosen as the sweet spot for this task
    lora_alpha=16,                     # scaling factor (2x rank)
    target_modules=["q_lin", "v_lin"], # DistilBERT query and value projections
    lora_dropout=0.1,                  # dropout on the adapter path (regularization)
    task_type=TaskType.SEQ_CLS,        # sequence classification
)
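
To make the roles of `r` and `lora_alpha` concrete, here is a minimal NumPy sketch of a LoRA-adapted linear layer. It is an illustration under assumptions, not the PEFT implementation: the weights are random stand-ins, dropout is left out, and the function name `lora_linear` is hypothetical. The shapes follow the A (768 × 8) and B (8 × 768) layout described above.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 768, 8, 16

W = rng.normal(size=(d, d))               # frozen pretrained weight (random stand-in)
A = rng.normal(scale=0.01, size=(d, r))   # trainable down-projection
B = np.zeros((r, d))                      # trainable up-projection, zero at the start

def lora_linear(x, W, A, B, alpha, r):
    """Frozen projection plus the scaled low-rank update (dropout omitted)."""
    scale = alpha / r                     # 16 / 8 = 2 for the configuration above
    return x @ W + scale * (x @ A @ B)

x = rng.normal(size=(4, d))               # a batch of 4 hidden vectors
base = x @ W

# With B at zero the adapter contributes nothing, so training starts
# from exactly the pretrained behaviour.
out_start = lora_linear(x, W, A, B, alpha, r)

# After B moves away from zero, the output shifts while W stays untouched.
B_trained = rng.normal(scale=0.01, size=(r, d))
out_trained = lora_linear(x, W, A, B_trained, alpha, r)

print(np.allclose(out_start, base), np.allclose(out_trained, base))
```

Only `A` and `B` would receive gradients in this setup, which is where the 12,288 trainable parameters per adapted matrix come from.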

Handling Class Imbalance

Data exploration revealed a 5.34x imbalance between the most and least frequent classes (187 vs 35 examples). Without correction, the model can learn to under-predict rare intents.

Fix: Inverse frequency weighted loss

# label_counts: per-class training example counts (array or tensor of length 77)
class_weights = 1.0 / label_counts
# Rare class → high weight → misclassifying it costs more
# Common class → low weight → it cannot dominate the loss
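
As a rough illustration of the effect, the following pure-Python sketch builds inverse-frequency weights for two stand-in classes with the 187 and 35 counts mentioned above and applies them in a weighted cross-entropy. The class names and predicted probabilities are hypothetical. The loss is normalised by the summed weights, the same weighted-mean convention PyTorch's `CrossEntropyLoss` uses when given a `weight` tensor.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Map each class to 1 / (number of examples of that class)."""
    counts = Counter(labels)
    return {label: 1.0 / n for label, n in counts.items()}

def weighted_cross_entropy(true_class_probs, targets, weights):
    """Weighted mean of -log p(true class), normalised by the summed weights."""
    weighted = [weights[t] * -math.log(p) for p, t in zip(true_class_probs, targets)]
    return sum(weighted) / sum(weights[t] for t in targets)

# Stand-in label list with the most and least frequent class sizes.
labels = ["common_intent"] * 187 + ["rare_intent"] * 35
weights = inverse_frequency_weights(labels)
ratio = weights["rare_intent"] / weights["common_intent"]   # 187 / 35

targets = ["common_intent", "rare_intent"]
# Hypothetical probabilities assigned to the true class of each example.
loss_rare_wrong = weighted_cross_entropy([0.9, 0.1], targets, weights)
loss_common_wrong = weighted_cross_entropy([0.1, 0.9], targets, weights)

print(f"weight ratio rare/common: {ratio:.2f}")
print(f"loss when the rare example is missed:   {loss_rare_wrong:.3f}")
print(f"loss when the common example is missed: {loss_common_wrong:.3f}")
```

Missing the rare example costs more than missing the common one, which is the behaviour the weighting is meant to produce.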

Training Configuration

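from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",      # placeholder path (assumption, not from the original run)
    num_train_epochs=5,
    per_device_train_batch_size=32,
    learning_rate=2e-5,          # small, conservative step size
    warmup_steps=100,            # gradual lr increase at start
    weight_decay=0.01,           # regularization
    fp16=True,                   # mixed precision: lower memory use on the T4
    eval_strategy="epoch",
    save_strategy="epoch",       # must match eval_strategy when load_best_model_at_end=True
    load_best_model_at_end=True,
)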

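Why learning rate 2e-5? Large learning rates can destabilize fine-tuning of a pretrained transformer. 2e-5 is a conservative value that gently nudges the trainable adapter and classifier weights toward the task. The frozen DistilBERT weights themselves are not modified under LoRA, so what the model learned during pretraining is preserved.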
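
The interaction between `learning_rate` and `warmup_steps` can be sketched as a schedule function. This assumes the Trainer's default linear schedule (warmup followed by linear decay to zero), that all 10,003 training examples are used on a single device, and no gradient accumulation. The function name `linear_schedule_lr` is hypothetical.

```python
import math

def linear_schedule_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TRAIN_EXAMPLES = 10_003   # assumes the full training split is used
BATCH_SIZE = 32
EPOCHS = 5

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

for step in (0, 50, 100, total_steps // 2, total_steps):
    lr = linear_schedule_lr(step, 2e-5, 100, total_steps)
    print(f"step {step:>4}: lr = {lr:.2e}")
```

Under these assumptions the peak rate of 2e-5 is reached after 100 steps, about a third of the first epoch, and the rate decays for the rest of training.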

Results

Training Curve

Epoch	Training Loss	Validation Loss	Accuracy
1	3.9726	3.5859	38.76%
2	2.5550	2.2843	61.14%
3	1.9706	1.7091	68.63%
4	1.6654	1.4714	71.03%
5	1.5524	1.4026	71.73%

Final Test Accuracy: 72.69%

Baseline (random guessing): 1.3% (1 in 77). The model achieves a 56x improvement over this baseline.

Per-Class Performance

Top 5 Best Performing Intents:

Intent	F1 Score
verify_top_up	1.000
age_limit	0.976
passcode_forgotten	0.941
edit_personal_details	0.940
get_physical_card	0.925

Top 5 Worst Performing Intents:

Intent	F1 Score
topping_up_by_card	0.170
why_verify_identity	0.333
request_refund	0.353
supported_cards_and_currencies	0.353
top_up_by_bank_transfer_charge	0.370
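
For reference, per-class F1 scores like those above can be computed as in the sketch below. It is a pure-Python illustration with toy labels, not the evaluation code from the notebook. scikit-learn's `f1_score` with `average=None` computes the same per-class quantity.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

# Toy example with three of the intent names; the predictions are made up.
y_true = ["change_pin", "change_pin", "request_refund", "request_refund", "age_limit"]
y_pred = ["change_pin", "request_refund", "request_refund", "request_refund", "age_limit"]

scores = per_class_f1(y_true, y_pred)
for label, f1 in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{label}: {f1:.3f}")
```

Sorting by score, as in the last loop, is one way to surface the best and worst intents.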

Failure Mode Analysis

Poorly performing intents share a common pattern: semantic overlap. A customer saying "I want to add money to my card" could legitimately belong to:

topping_up_by_card
top_up_by_bank_transfer_charge
top_up_by_cash_or_cheque
transfer_into_account

The model distributes confidence across all similar intents rather than making one strong prediction. This is partly a property of the dataset: Banking77's 77 classes include fine-grained, overlapping intents whose boundaries can be hard to draw even for human annotators.

This explains why confidence scores are moderate:

"I want to change my PIN" → change_pin (47.46%): clear intent
"I lost my card" → card_not_working (14.40%): ambiguous
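
The way overlapping intents dilute confidence can be illustrated with a softmax over hypothetical logits. The logit values below are invented for illustration and only four of the 77 classes are shown, so the probabilities will not match the figures above.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, labels, k=3):
    """Return the k most probable (label, probability) pairs."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

def margin(probs):
    """Gap between the two highest probabilities; small means ambiguous."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second

labels = ["topping_up_by_card", "top_up_by_bank_transfer_charge",
          "transfer_into_account", "change_pin"]

overlapping_logits = [2.0, 1.9, 1.8, -1.0]   # hypothetical: three similar intents compete
clear_logits = [0.0, 0.0, 0.0, 4.0]          # hypothetical: one intent stands out

overlapping = softmax(overlapping_logits)
clear = softmax(clear_logits)

print(top_k(overlapping, labels), f"margin={margin(overlapping):.3f}")
print(top_k(clear, labels), f"margin={margin(clear):.3f}")
```

With 77 classes the probability mass spreads even further, so a correct top prediction can still carry a modest confidence score.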

Inference Demo

test_queries = [
    "I lost my card and need a replacement",
    "Why was my payment declined?",
    "How do I add money to my account?",
    "I want to change my PIN number",
    "What currencies do you support?"
]

Results:

Text: Why was my payment declined?
Intent: declined_card_payment
Confidence: 19.94%

Text: I want to change my PIN number
Intent: change_pin
Confidence: 47.46%

Text: What currencies do you support?
Intent: fiat_currency_support
Confidence: 35.15%

What I Would Improve With More Resources

Larger rank r: r=16 or r=32 would give the model more capacity to learn complex intent boundaries
More data for rare classes: data augmentation or synthetic generation for intents with only 35 examples
Intent merging: semantically overlapping intents such as topping_up_by_card and top_up_by_bank_transfer_charge could be merged into parent categories
Larger base model: RoBERTa-base or DeBERTa would likely push accuracy above 80%
Contrastive learning: train the model to explicitly push similar intents apart in embedding space
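
The intent-merging idea could be prototyped without retraining by mapping fine-grained predictions to parent categories and re-scoring. The sketch below is only an illustration: the `top_up` parent name, the grouping, and the toy predictions are all hypothetical and would need to be chosen with domain knowledge.

```python
# Hypothetical grouping of overlapping intents into one parent category.
PARENT_INTENT = {
    "topping_up_by_card": "top_up",
    "top_up_by_bank_transfer_charge": "top_up",
    "transfer_into_account": "top_up",
}

def to_parent(label):
    """Map a fine-grained intent to its parent; unmapped intents stay as they are."""
    return PARENT_INTENT.get(label, label)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels and predictions (made up for illustration).
y_true = ["topping_up_by_card", "top_up_by_bank_transfer_charge", "change_pin"]
y_pred = ["top_up_by_bank_transfer_charge", "top_up_by_bank_transfer_charge", "change_pin"]

fine_accuracy = accuracy(y_true, y_pred)
merged_accuracy = accuracy([to_parent(t) for t in y_true],
                           [to_parent(p) for p in y_pred])

print(f"fine-grained accuracy: {fine_accuracy:.2f}")
print(f"merged accuracy:       {merged_accuracy:.2f}")
```

Comparing the two numbers on the real test set would show how much of the error comes from confusion inside a parent category, which matters if routing only needs the parent.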

Stack

Component	Tool
Base Model	distilbert-base-uncased
Fine-tuning	HuggingFace PEFT + LoRA
Training	PyTorch + HuggingFace Trainer
Dataset	HuggingFace Datasets
Evaluation	sklearn + evaluate
Compute	Google Colab T4 GPU (free)

Project Structure

banking77-intent-classifier/
├── notebook.ipynb   # Full pipeline: data → train → eval → inference
└── README.md        # This file

Author

Syed Muhammad Aneeb Ur Rehman
AI/ML Engineer | Full-Stack Developer
LinkedIn | GitHub