---
language:
- en
- hi
- pa
tags:
- text-classification
- bert
- finance
- multilingual
- transaction-classification
license: mit
---

# BERT Transaction Classifier — SecureWealth Twin (M2)

Fine-tuned `bert-base-multilingual-cased` for automatic transaction categorisation across English, Hindi (Devanagari), and Punjabi (Gurmukhi).

Part of the **SecureWealth Twin** AI system — a bank-grade fraud detection and financial intelligence platform.

---

## Model Details

| | |
|---|---|
| Base model | `bert-base-multilingual-cased` |
| Task | 7-class text classification |
| Languages | English · Hindi · Punjabi |
| Max sequence length | 64 |
| Training epochs | 5 |
| Learning rate | 2e-5 |
| Batch size | 32 |
| Train/val/test split | 70 / 15 / 15 (stratified) |
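
The stratified 70 / 15 / 15 split in the table can be illustrated with a minimal sketch. The helper name `stratified_split`, the seed, and the toy labels are assumptions made for illustration; the actual training script is not part of this card and may use a library routine instead.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that keep per-class proportions."""
    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Hypothetical toy labels: 7 classes with 20 examples each (not the real dataset).
labels = [i % 7 for i in range(140)]
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 98 21 21
```

Splitting each class separately keeps rare categories represented in the validation and test sets, which matters for a dataset of this size.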

---

## Categories

| Label | ID |
|-------|----|
| Food | 0 |
| Transport | 1 |
| EMIs | 2 |
| Entertainment | 3 |
| Utilities | 4 |
| Investments | 5 |
| Other | 6 |
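
For downstream code it can help to keep both directions of this mapping. A small sketch, where the names `ID2LABEL` and `LABEL2ID` are illustrative and not taken from the repository:

```python
# Order matches the ID column in the table above.
CATEGORIES = ["Food", "Transport", "EMIs", "Entertainment",
              "Utilities", "Investments", "Other"]

ID2LABEL = dict(enumerate(CATEGORIES))                       # 0 -> "Food", ...
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}   # "Food" -> 0, ...

print(ID2LABEL[2])             # EMIs
print(LABEL2ID["Utilities"])   # 4
```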

---

## Architecture

```
bert-base-multilingual-cased
  → [CLS] token (768-d)
  → Dropout(0.3)
  → Linear(768 → 7)
```

---

## Usage

### Load and run inference

```python
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel
from huggingface_hub import hf_hub_download

# Model class (must match training definition)
class BERTTxnClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert       = BertModel.from_pretrained("bert-base-multilingual-cased")
        self.drop       = nn.Dropout(0.3)
        self.classifier = nn.Linear(768, 7)

    def forward(self, input_ids, attention_mask):
        cls = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state[:, 0, :]
        return self.classifier(self.drop(cls))

CATEGORIES = ["Food", "Transport", "EMIs", "Entertainment", "Utilities", "Investments", "Other"]

# Download model + tokenizer
model_path = hf_hub_download(repo_id="NanG01/bert-txn-classifier", filename="bert_classifier.pt")

device    = torch.device("cuda" if torch.cuda.is_available() else "cpu")
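# Tokenizer files are listed under tokenizer/ in the Files table, hence the subfolder argument
tokenizer = BertTokenizer.from_pretrained("NanG01/bert-txn-classifier", subfolder="tokenizer")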
model     = BERTTxnClassifier()
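# weights_only=True restricts unpickling to tensors, which is all a state dict needs
model.load_state_dict(torch.load(model_path, map_location=device, weights_only=True))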
model.to(device).eval()

# Inference
def predict(text: str) -> dict:
    enc = tokenizer(text, max_length=64, padding="max_length",
                    truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(
            model(enc["input_ids"].to(device), enc["attention_mask"].to(device)), dim=-1
        ).squeeze(0)
    pred = probs.argmax().item()
    return {"category": CATEGORIES[pred], "confidence": round(probs[pred].item(), 4)}
```
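
To make the post-processing in `predict` easier to follow, here is a dependency-free sketch of the same step: softmax over the 7 logits, then argmax. The function name `logits_to_prediction` and the example logits are made up for illustration and do not come from the model.

```python
import math

CATEGORIES = ["Food", "Transport", "EMIs", "Entertainment",
              "Utilities", "Investments", "Other"]

def logits_to_prediction(logits):
    """Turn raw class scores into a category and a softmax confidence."""
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest probability is the predicted class ID.
    pred = max(range(len(probs)), key=probs.__getitem__)
    return {"category": CATEGORIES[pred], "confidence": round(probs[pred], 4)}

# Made-up logits in which class 0 clearly dominates.
result = logits_to_prediction([4.0, 0.5, 0.1, 0.2, 0.3, 0.0, 0.4])
print(result["category"])  # Food
```

The confidence is the softmax probability of the winning class, so it reflects how far the top logit is from the rest and is not a calibrated probability of correctness.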

### Examples

```python
predict("SWIGGY ORDER PAYMENT")
# → {"category": "Food", "confidence": 0.9821}

predict("HDFC BANK PERSONAL LOAN EMI")
# → {"category": "EMIs", "confidence": 0.9743}

predict("OLA RIDE PAYMENT")
# → {"category": "Transport", "confidence": 0.9512}

predict("ZERODHA MUTUAL FUND")
# → {"category": "Investments", "confidence": 0.9301}

predict("बिजली बिल भुगतान")  # Hindi: "electricity bill payment"
# → {"category": "Utilities", "confidence": 0.9104}

predict("ਖਾਣੇ ਦਾ ਭੁਗਤਾਨ")  # Punjabi: "payment for food"
# → {"category": "Food", "confidence": 0.8932}
```

---

## Files

| File | Description |
|------|-------------|
| `bert_classifier.pt` | Full model state dict |
| `tokenizer/tokenizer.json` | Tokenizer vocabulary (WordPiece) |
| `tokenizer/tokenizer_config.json` | Tokenizer config |

---

## Training Data

~1,300 transaction descriptions across 3 languages:
- ~500 English transactions
- ~400 Hindi transactions
- ~400 Punjabi transactions

Dataset: `SecureWealthTwin_DL_Datasets_v2.xlsx` (private)

---

## Part of SecureWealth Twin

This model is M2 in a 6-model AI system:

| # | Model | Task |
|---|-------|------|
| M1 | BehaviorDNA | Behavioural anomaly detection |
| **M2** | **BERT Txn Classifier** | **Transaction categorisation** |
| M3 | NLP Coercion Detector | Coercion language detection |
| M4 | Coercion Risk Scorer | Composite risk scoring |
| M5 | Monte Carlo Simulator | Wealth projection |
| M6 | Predictive Early Warning | Financial distress prediction |

**GitHub:** [BlackBox-Wealth/AI_Models_2](https://github.com/BlackBox-Wealth/AI_Models_2)