---
language:
- en
- hi
- pa
tags:
- text-classification
- bert
- finance
- multilingual
- transaction-classification
license: mit
---

# BERT Transaction Classifier — SecureWealth Twin (M2)

Fine-tuned `bert-base-multilingual-cased` for automatic transaction categorisation across English, Hindi (Devanagari), and Punjabi (Gurmukhi). Part of the **SecureWealth Twin** AI system — a bank-grade fraud detection and financial intelligence platform.

---

## Model Details

| | |
|---|---|
| Base model | `bert-base-multilingual-cased` |
| Task | 7-class text classification |
| Languages | English · Hindi · Punjabi |
| Max sequence length | 64 |
| Training epochs | 5 |
| Learning rate | 2e-5 |
| Batch size | 32 |
| Train/val/test split | 70 / 15 / 15 (stratified) |

---

## Categories

| Label | ID |
|-------|----|
| Food | 0 |
| Transport | 1 |
| EMIs | 2 |
| Entertainment | 3 |
| Utilities | 4 |
| Investments | 5 |
| Other | 6 |

---

## Architecture

```
bert-base-multilingual-cased
  → [CLS] token (768-d)
  → Dropout(0.3)
  → Linear(768 → 7)
```

---

## Usage

### Load and run inference

```python
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel
from huggingface_hub import hf_hub_download

# Model class (must match training definition)
class BERTTxnClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-multilingual-cased")
        self.drop = nn.Dropout(0.3)
        self.classifier = nn.Linear(768, 7)

    def forward(self, input_ids, attention_mask):
        # 768-d [CLS] embedding from the final hidden layer
        cls = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state[:, 0, :]
        return self.classifier(self.drop(cls))

CATEGORIES = ["Food", "Transport", "EMIs", "Entertainment", "Utilities", "Investments", "Other"]

# Download model + tokenizer
model_path = hf_hub_download(repo_id="NanG01/bert-txn-classifier", filename="bert_classifier.pt")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = BertTokenizer.from_pretrained("NanG01/bert-txn-classifier")

model = BERTTxnClassifier()
model.load_state_dict(torch.load(model_path, map_location=device))
model.to(device).eval()

# Inference
def predict(text: str) -> dict:
    enc = tokenizer(text, max_length=64, padding="max_length", truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(
            model(enc["input_ids"].to(device), enc["attention_mask"].to(device)),
            dim=-1,
        ).squeeze(0)
    pred = probs.argmax().item()
    return {"category": CATEGORIES[pred], "confidence": round(probs[pred].item(), 4)}
```

### Examples

```python
predict("SWIGGY ORDER PAYMENT")         # → {"category": "Food", "confidence": 0.9821}
predict("HDFC BANK PERSONAL LOAN EMI")  # → {"category": "EMIs", "confidence": 0.9743}
predict("OLA RIDE PAYMENT")             # → {"category": "Transport", "confidence": 0.9512}
predict("ZERODHA MUTUAL FUND")          # → {"category": "Investments", "confidence": 0.9301}
predict("बिजली बिल भुगतान")               # Hindi: "electricity bill payment" → {"category": "Utilities", "confidence": 0.9104}
predict("ਖਾਣੇ ਦਾ ਭੁਗਤਾਨ")                # Punjabi: "food payment" → {"category": "Food", "confidence": 0.8932}
```

---

## Files

| File | Description |
|------|-------------|
| `bert_classifier.pt` | Full model state dict |
| `tokenizer/tokenizer.json` | Serialized tokenizer (WordPiece vocabulary) |
| `tokenizer/tokenizer_config.json` | Tokenizer config |

---

## Training Data

~1,300 transaction descriptions across 3 languages:

- ~500 English transactions
- ~400 Hindi transactions
- ~400 Punjabi transactions

Dataset: `SecureWealthTwin_DL_Datasets_v2.xlsx` (private)

---

## Part of SecureWealth Twin

This model is M2 in a 6-model AI system:

| # | Model | Task |
|---|-------|------|
| M1 | BehaviorDNA | Behavioural anomaly detection |
| **M2** | **BERT Txn Classifier** | **Transaction categorisation** |
| M3 | NLP Coercion Detector | Coercion language detection |
| M4 | Coercion Risk Scorer | Composite risk scoring |
| M5 | Monte Carlo Simulator | Wealth projection |
| M6 | Predictive Early Warning | Financial distress prediction |

**GitHub:** [BlackBox-Wealth/AI_Models_2](https://github.com/BlackBox-Wealth/AI_Models_2)
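
---

## Sanity-Checking the Head

The classification head described under *Architecture* (`Dropout(0.3)` → `Linear(768 → 7)`) can be exercised in isolation without downloading the BERT backbone — useful for verifying shapes before wiring up the full pipeline. This is an illustrative sketch; the random input and batch size of 4 are arbitrary stand-ins, not part of the released model:

```python
import torch
import torch.nn as nn

# Stand-alone copy of the classification head: Dropout(0.3) -> Linear(768 -> 7)
head = nn.Sequential(nn.Dropout(0.3), nn.Linear(768, 7))
head.eval()  # eval mode disables dropout, so the forward pass is deterministic

# A batch of 4 random vectors standing in for the 768-d [CLS] embeddings
cls_embedding = torch.randn(4, 768)
with torch.no_grad():
    logits = head(cls_embedding)           # shape: (4, 7), one logit per category
    probs = torch.softmax(logits, dim=-1)  # each row sums to 1

print(logits.shape)  # torch.Size([4, 7])
```

Swapping in the real `[CLS]` embeddings from `bert-base-multilingual-cased` (and the trained weights) recovers the full model's output.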