---
language:
- en
- hi
- pa
tags:
- text-classification
- bert
- finance
- multilingual
- transaction-classification
license: mit
---

# BERT Transaction Classifier — SecureWealth Twin (M2)

Fine-tuned `bert-base-multilingual-cased` for automatic transaction categorisation across English, Hindi (Devanagari), and Punjabi (Gurmukhi).

Part of the **SecureWealth Twin** AI system, a bank-grade fraud detection and financial intelligence platform.

---


## Model Details

| Setting | Value |
|---|---|
| Base model | `bert-base-multilingual-cased` |
| Task | 7-class text classification |
| Languages | English · Hindi · Punjabi |
| Max sequence length | 64 tokens |
| Training epochs | 5 |
| Learning rate | 2e-5 |
| Batch size | 32 |
| Train/val/test split | 70 / 15 / 15 (stratified) |
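
Combined with the ~1,300-example dataset described under Training Data, these hyperparameters imply roughly 29 optimizer steps per epoch. A quick back-of-envelope check (the dataset size is approximate, so these counts are too):

```python
import math

# Approximate figures from this card
dataset_size = 1300          # ~1,300 labelled transactions (Training Data section)
train_frac = 0.70            # 70 / 15 / 15 split
batch_size = 32
epochs = 5

train_examples = round(dataset_size * train_frac)         # ≈ 910
steps_per_epoch = math.ceil(train_examples / batch_size)  # ≈ 29
total_steps = steps_per_epoch * epochs                    # ≈ 145

print(train_examples, steps_per_epoch, total_steps)       # 910 29 145
```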

---

## Categories

| Label | ID |
|-------|----|
| Food | 0 |
| Transport | 1 |
| EMIs | 2 |
| Entertainment | 3 |
| Utilities | 4 |
| Investments | 5 |
| Other | 6 |
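
The table maps directly onto `label2id`/`id2label` dictionaries for decoding model outputs (the dictionary names here are illustrative; the inference code below simply indexes a `CATEGORIES` list):

```python
# Label ↔ ID mapping taken from the table above
label2id = {
    "Food": 0, "Transport": 1, "EMIs": 2, "Entertainment": 3,
    "Utilities": 4, "Investments": 5, "Other": 6,
}
id2label = {i: label for label, i in label2id.items()}

print(id2label[2])  # EMIs
```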

---

## Architecture

```
bert-base-multilingual-cased
→ [CLS] token (768-d)
→ Dropout(0.3)
→ Linear(768 → 7)
```
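
The classification head on top of BERT is tiny. A standalone sketch of just the head (random weights, dummy batch) confirms the shapes and parameter count:

```python
import torch
import torch.nn as nn

# Head only: Dropout(0.3) → Linear(768 → 7), fed by the 768-d [CLS] vector
head = nn.Sequential(nn.Dropout(0.3), nn.Linear(768, 7))
head.eval()  # disable dropout for a deterministic forward pass

cls_batch = torch.randn(4, 768)   # stand-in for a batch of [CLS] embeddings
logits = head(cls_batch)
print(logits.shape)               # torch.Size([4, 7])

# Head parameters: 768*7 weights + 7 biases = 5383
n_params = sum(p.numel() for p in head.parameters())
print(n_params)                   # 5383
```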

---

## Usage

### Load and run inference

```python
import torch
import torch.nn as nn
from transformers import BertTokenizerFast, BertModel
from huggingface_hub import hf_hub_download

# Model class (must match the training definition)
class BERTTxnClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-multilingual-cased")
        self.drop = nn.Dropout(0.3)
        self.classifier = nn.Linear(768, 7)

    def forward(self, input_ids, attention_mask):
        # Classify from the final-layer [CLS] representation
        cls = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state[:, 0, :]
        return self.classifier(self.drop(cls))

CATEGORIES = ["Food", "Transport", "EMIs", "Entertainment", "Utilities", "Investments", "Other"]

# Download the fine-tuned weights
model_path = hf_hub_download(repo_id="NanG01/bert-txn-classifier", filename="bert_classifier.pt")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# The tokenizer files live in the repo's tokenizer/ subfolder (see Files below)
tokenizer = BertTokenizerFast.from_pretrained("NanG01/bert-txn-classifier", subfolder="tokenizer")
model = BERTTxnClassifier()
model.load_state_dict(torch.load(model_path, map_location=device))
model.to(device).eval()

# Inference
def predict(text: str) -> dict:
    enc = tokenizer(text, max_length=64, padding="max_length",
                    truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(
            model(enc["input_ids"].to(device), enc["attention_mask"].to(device)), dim=-1
        ).squeeze(0)
    pred = probs.argmax().item()
    return {"category": CATEGORIES[pred], "confidence": round(probs[pred].item(), 4)}
```

### Examples

```python
predict("SWIGGY ORDER PAYMENT")
# → {"category": "Food", "confidence": 0.9821}

predict("HDFC BANK PERSONAL LOAN EMI")
# → {"category": "EMIs", "confidence": 0.9743}

predict("OLA RIDE PAYMENT")
# → {"category": "Transport", "confidence": 0.9512}

predict("ZERODHA MUTUAL FUND")
# → {"category": "Investments", "confidence": 0.9301}

predict("बिजली बिल भुगतान")  # Hindi: "electricity bill payment"
# → {"category": "Utilities", "confidence": 0.9104}

predict("ਖਾਣੇ ਦਾ ਭੁਗਤਾਨ")  # Punjabi: "payment for food"
# → {"category": "Food", "confidence": 0.8932}
```
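
The `confidence` value is simply the softmax probability of the arg-max class over the 7 logits. A self-contained illustration with made-up logits (not real model output):

```python
import torch

CATEGORIES = ["Food", "Transport", "EMIs", "Entertainment", "Utilities", "Investments", "Other"]

logits = torch.tensor([4.0, 0.5, 0.2, 0.1, 0.3, 0.2, 0.4])  # fabricated logits for illustration
probs = torch.softmax(logits, dim=-1)
pred = probs.argmax().item()

print(CATEGORIES[pred])              # Food (index 0 has the largest logit)
print(round(probs[pred].item(), 4))  # its softmax probability, reported as "confidence"
```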

---

## Files

| File | Description |
|------|-------------|
| `bert_classifier.pt` | Fine-tuned model state dict |
| `tokenizer/tokenizer.json` | Serialized fast tokenizer (WordPiece vocab + settings) |
| `tokenizer/tokenizer_config.json` | Tokenizer config |

---

## Training Data

~1,300 transaction descriptions across three languages:

- ~500 English transactions
- ~400 Hindi transactions
- ~400 Punjabi transactions

Dataset: `SecureWealthTwin_DL_Datasets_v2.xlsx` (private)
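
The dataset itself is private, but the 70/15/15 stratified split can be sketched with scikit-learn on synthetic stand-in rows (the texts and the two-step split below are illustrative, not the project's actual preprocessing):

```python
from sklearn.model_selection import train_test_split

# Synthetic stand-in rows: (description, label_id)
rows = [(f"txn {i}", i % 2) for i in range(20)]
texts = [t for t, _ in rows]
labels = [y for _, y in rows]

# 70% train, then split the remaining 30% evenly into val and test,
# stratifying on the label at each step
X_train, X_tmp, y_train, y_tmp = train_test_split(
    texts, labels, test_size=0.30, stratify=labels, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 14 3 3
```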

---

## Part of SecureWealth Twin

This model is M2 in a six-model AI system:

| # | Model | Task |
|---|-------|------|
| M1 | BehaviorDNA | Behavioural anomaly detection |
| **M2** | **BERT Txn Classifier** | **Transaction categorisation** |
| M3 | NLP Coercion Detector | Coercion language detection |
| M4 | Coercion Risk Scorer | Composite risk scoring |
| M5 | Monte Carlo Simulator | Wealth projection |
| M6 | Predictive Early Warning | Financial distress prediction |

**GitHub:** [BlackBox-Wealth/AI_Models_2](https://github.com/BlackBox-Wealth/AI_Models_2)
|
|