---
library_name: transformers
tags:
- finance
license: apache-2.0
datasets:
- learn-abc/banking-intent-dataset
language:
- en
- bn
base_model:
- google/muril-base-cased
metrics:
- accuracy
pipeline_tag: text-classification
---

# Multilingual Banking Intent Classifier (EN + BN + Banglish)

## Overview

This model is a fine-tuned **MuRIL-based multilingual intent classifier** designed for production-grade banking chatbots.

- **Model Name:** Banking Multilingual Intent Classifier
- **Base Model:** google/muril-base-cased
- **Task:** Multilingual Intent Classification
- **Intents:** 14
- **Languages:** English, Bangla (Bengali script), Banglish (Romanized Bengali), Code-Mixed

The model performs 14-way intent classification for banking conversational systems.

---

## Base Model

`google/muril-base-cased`

MuRIL was selected for:

* Strong multilingual support
* Excellent performance on Indic languages
* Stable tokenization for Bangla and English
* Robust handling of code-mixed inputs

---

## Supported Intents (14)

```
ACCOUNT_INFO
ATM_SUPPORT
CARD_ISSUE
CARD_MANAGEMENT
CARD_REPLACEMENT
CHECK_BALANCE
EDIT_PERSONAL_DETAILS
FAILED_TRANSFER
FALLBACK
FEES
GREETING
LOST_OR_STOLEN_CARD
MINI_STATEMENT
TRANSFER
```

---

## Dataset Summary

### Total Samples: 100,971

### Languages (Balanced)

| Language           | Count  |
| ------------------ | ------ |
| English (en)       | 33,657 |
| Bangla (bn)        | 33,657 |
| Banglish (bn-latn) | 33,657 |

An additional 500 code-mixed examples are included.

---

## Final Training Dataset

| Split | Samples |
| ----- | ------- |
| Train | 91,051  |
| Test  | 20,295  |

### Class Distribution (Final Train)

- All intents fall within a safe 4–10% range.
- FALLBACK is capped at ~9.4%, preventing dominance.
- This distribution avoids class collapse and overconfidence bias.
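As a quick sanity check, the language counts reported above can be totaled and expressed as shares. This minimal sketch uses only the numbers from the tables; it is illustrative, not part of the training pipeline:

```python
# Language counts taken from the Dataset Summary table
language_counts = {
    "en": 33_657,       # English
    "bn": 33_657,       # Bangla (Bengali script)
    "bn-latn": 33_657,  # Banglish (Romanized Bengali)
}

total = sum(language_counts.values())
shares = {lang: count / total for lang, count in language_counts.items()}

print(total)  # 100971, matching the reported total sample count
for lang, share in shares.items():
    print(f"{lang}: {share:.1%}")  # each language is exactly one third
```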
---

## Evaluation Metrics

### Overall Performance

* Accuracy: **99.12%**
* F1 Micro: **99.12%**
* F1 Macro: **99.08%**
* Validation Loss: 0.046

---

## Per-Intent Accuracy

| Intent                | Accuracy |
| --------------------- | -------- |
| ACCOUNT_INFO          | 99.14%   |
| ATM_SUPPORT           | 99.70%   |
| CARD_ISSUE            | 99.25%   |
| CARD_MANAGEMENT       | 99.43%   |
| CARD_REPLACEMENT      | 99.08%   |
| CHECK_BALANCE         | 99.05%   |
| EDIT_PERSONAL_DETAILS | 100.00%  |
| FAILED_TRANSFER       | 98.75%   |
| FALLBACK              | 97.86%   |
| FEES                  | 99.76%   |
| GREETING              | 97.41%   |
| LOST_OR_STOLEN_CARD   | 99.59%   |
| MINI_STATEMENT        | 98.80%   |
| TRANSFER              | 99.78%   |

---

## Strengths

* Strong multilingual support
* Balanced dataset distribution
* Robust fallback handling
* Stable across operational banking intents
* High macro F1, indicating no minority-intent collapse
* Performs well on code-mixed queries

---

## Intended Use

* Banking chatbot intent routing
* Customer support automation
* Financial conversational AI
* Multilingual banking assistants

---

## Out of Scope

* Fraud detection
* Sentiment analysis
* Financial advisory decisions
* Regulatory or legal compliance automation

---

## Production Recommendations

* Apply confidence thresholding
* Route low-confidence predictions to a human fallback
* Monitor softmax entropy
* Normalize numeric expressions before inference
* Log confusion pairs in production

---

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "learn-abc/banking-multilingual-intent-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Prediction function
def predict_intent(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    prediction = torch.argmax(probs, dim=-1).item()
    return {
        "intent": model.config.id2label[prediction],
        "confidence": probs[0][prediction].item(),
    }

# Example usage - English
result = predict_intent("what is my balance")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: CHECK_BALANCE, Confidence: 0.99

# Example usage - Bangla ("What is my balance?")
result = predict_intent("আমার ব্যালেন্স কত")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: CHECK_BALANCE, Confidence: 0.98

# Example usage - Banglish / Romanized ("How much balance do I have?")
result = predict_intent("amar balance koto ache")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: CHECK_BALANCE, Confidence: 0.97

# Example usage - Code-mixed ("Show my last 10 transactions")
result = predict_intent("আমার last 10 transaction দেখাও")
print(f"Intent: {result['intent']}, Confidence: {result['confidence']:.2f}")
# Output: Intent: MINI_STATEMENT, Confidence: 0.98
```

---

## Limitations

* Does not handle multi-turn conversational context
* Extremely ambiguous short inputs may require thresholding
* Synthetic data may introduce stylistic bias
* No speech-to-text robustness included

---

## Version

- Version: 2.0
- Status: Production-Ready
- Architecture: MuRIL Base
- Language Coverage: EN + BN + Banglish

---

## License

This project is licensed under the Apache 2.0 License.

## Contact Me

For any inquiries or support, please reach out to:

* **Author:** [Abhishek Singh](https://github.com/SinghIsWriting/)
* **LinkedIn:** [My LinkedIn Profile](https://www.linkedin.com/in/abhishek-singh-bba2662a9)
* **Portfolio:** [Abhishek Singh Portfolio](https://me.devhome.me/)

---
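The confidence-thresholding and entropy-monitoring recommendations above can be sketched as a routing layer on top of the model's logits. This is a minimal, framework-free illustration: the threshold values and the `route_prediction` helper are assumptions to be tuned on a validation set, not part of the released model.

```python
import math

# Illustrative thresholds (assumptions) -- tune on a validation set
CONFIDENCE_THRESHOLD = 0.80
ENTROPY_THRESHOLD = 1.0  # in nats; the maximum for 14 classes is ln(14) ~= 2.64

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_prediction(logits, id2label):
    """Return the predicted intent, flagging low-confidence or high-entropy
    queries for human review instead of automatic routing."""
    probs = softmax(logits)
    top = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[top]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    needs_human = confidence < CONFIDENCE_THRESHOLD or entropy > ENTROPY_THRESHOLD
    return {
        "intent": id2label[top],
        "confidence": confidence,
        "entropy": entropy,
        "route_to_human": needs_human,
    }

labels = {0: "CHECK_BALANCE", 1: "TRANSFER", 2: "FALLBACK"}

# Confident prediction: routed automatically
print(route_prediction([8.0, 0.1, 0.1], labels))

# Ambiguous prediction: flagged for human review
print(route_prediction([1.0, 1.1, 1.05], labels))
```

In production the logits would come from `outputs.logits[0].tolist()` in the example above, with `model.config.id2label` as the label map.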