---
language:
- en
- hi
- pa
tags:
- text-classification
- bert
- finance
- multilingual
- transaction-classification
license: mit
---
# BERT Transaction Classifier — SecureWealth Twin (M2)
Fine-tuned `bert-base-multilingual-cased` for automatic transaction categorisation across English, Hindi (Devanagari), and Punjabi (Gurmukhi).
Part of the **SecureWealth Twin** AI system — a bank-grade fraud detection and financial intelligence platform.
---
## Model Details
| | |
|---|---|
| Base model | `bert-base-multilingual-cased` |
| Task | 7-class text classification |
| Languages | English · Hindi · Punjabi |
| Max sequence length | 64 |
| Training epochs | 5 |
| Learning rate | 2e-5 |
| Batch size | 32 |
| Train/val/test split | 70 / 15 / 15 (stratified) |
---
## Categories
| Label | ID |
|-------|----|
| Food | 0 |
| Transport | 1 |
| EMIs | 2 |
| Entertainment | 3 |
| Utilities | 4 |
| Investments | 5 |
| Other | 6 |
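For programmatic use, the table above translates directly into lookup dicts (a convenience sketch; the variable names are illustrative and not part of the checkpoint itself):

```python
# Index i in this list is the class ID from the table above
CATEGORIES = ["Food", "Transport", "EMIs", "Entertainment", "Utilities", "Investments", "Other"]

label2id = {name: i for i, name in enumerate(CATEGORIES)}
id2label = {i: name for i, name in enumerate(CATEGORIES)}
```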
---
## Architecture
```
bert-base-multilingual-cased
→ [CLS] token (768-d)
→ Dropout(0.3)
→ Linear(768 → 7)
```
---
## Usage
### Load and run inference
```python
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel
from huggingface_hub import hf_hub_download

# Model class (must match the training definition)
class BERTTxnClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-multilingual-cased")
        self.drop = nn.Dropout(0.3)
        self.classifier = nn.Linear(768, 7)

    def forward(self, input_ids, attention_mask):
        # Classify from the 768-d [CLS] embedding
        cls = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state[:, 0, :]
        return self.classifier(self.drop(cls))

CATEGORIES = ["Food", "Transport", "EMIs", "Entertainment", "Utilities", "Investments", "Other"]

# Download the weights; tokenizer files live under tokenizer/ (see Files below)
model_path = hf_hub_download(repo_id="NanG01/bert-txn-classifier", filename="bert_classifier.pt")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = BertTokenizer.from_pretrained("NanG01/bert-txn-classifier", subfolder="tokenizer")
model = BERTTxnClassifier()
model.load_state_dict(torch.load(model_path, map_location=device))
model.to(device).eval()

# Inference
def predict(text: str) -> dict:
    enc = tokenizer(text, max_length=64, padding="max_length",
                    truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(
            model(enc["input_ids"].to(device), enc["attention_mask"].to(device)), dim=-1
        ).squeeze(0)
    pred = probs.argmax().item()
    return {"category": CATEGORIES[pred], "confidence": round(probs[pred].item(), 4)}
```
### Examples
```python
predict("SWIGGY ORDER PAYMENT")
# → {"category": "Food", "confidence": 0.9821}
predict("HDFC BANK PERSONAL LOAN EMI")
# → {"category": "EMIs", "confidence": 0.9743}
predict("OLA RIDE PAYMENT")
# → {"category": "Transport", "confidence": 0.9512}
predict("ZERODHA MUTUAL FUND")
# → {"category": "Investments", "confidence": 0.9301}
predict("बिजली बिल भुगतान")  # Hindi: "electricity bill payment"
# → {"category": "Utilities", "confidence": 0.9104}
predict("ਖਾਣੇ ਦਾ ਭੁਗਤਾਨ")  # Punjabi: "payment for food"
# → {"category": "Food", "confidence": 0.8932}
```
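The `confidence` values above are simply the softmax probability of the predicted class over the model's 7 logits. A minimal pure-Python illustration of that step (the logits below are made up for demonstration, not real model outputs):

```python
import math

def softmax_confidence(logits):
    # Numerically stable softmax, then pick the top class
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    pred = probs.index(max(probs))
    return pred, round(probs[pred], 4)

# Illustrative 7-way logits in category order (Food ... Other)
pred, conf = softmax_confidence([5.0, 1.0, 0.5, 0.2, 0.1, 0.3, 0.0])
```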
---
## Files
| File | Description |
|------|-------------|
| `bert_classifier.pt` | Full model state dict |
| `tokenizer/tokenizer.json` | Serialized tokenizer (WordPiece vocab + settings) |
| `tokenizer/tokenizer_config.json` | Tokenizer config |
---
## Training Data
~1,300 transaction descriptions across 3 languages:
- ~500 English transactions
- ~400 Hindi transactions
- ~400 Punjabi transactions
Dataset: `SecureWealthTwin_DL_Datasets_v2.xlsx` (private)
---
## Part of SecureWealth Twin
This model is M2 in a 6-model AI system:
| # | Model | Task |
|---|-------|------|
| M1 | BehaviorDNA | Behavioural anomaly detection |
| **M2** | **BERT Txn Classifier** | **Transaction categorisation** |
| M3 | NLP Coercion Detector | Coercion language detection |
| M4 | Coercion Risk Scorer | Composite risk scoring |
| M5 | Monte Carlo Simulator | Wealth projection |
| M6 | Predictive Early Warning | Financial distress prediction |
**GitHub:** [BlackBox-Wealth/AI_Models_2](https://github.com/BlackBox-Wealth/AI_Models_2)