---
license: mit
language:
- en
metrics:
- accuracy
- f1
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
- AI
- Artificial-Intelligence
- AI-Disclosure
- Finance
- BERT
- Financial-NLP
- Sentence-Classification
- Transformers
- Banking
- Machine-Learning
---

# BankAI-BERT

BankAI-BERT is a domain-specific BERT model fine-tuned to detect AI-related disclosures in banking texts.

## Intended Use

BankAI-BERT is designed to assist researchers, analysts, and regulators in identifying AI narratives in financial disclosures at the sentence level.

## Performance

- Accuracy: 99.37%
- F1-score: 0.993
- ROC AUC: 1.000
- Brier Score: 0.0000

## Training Data

BankAI-BERT was fine-tuned on a manually annotated dataset of sentences drawn from U.S. bank annual reports spanning 2015 to 2023. The final training set comprised a balanced sample of 1,586 sentences: 793 labeled AI-related and 793 non-AI. The model was initialized from the `bert-base-uncased` checkpoint.

## Training

| Setting | Value |
|--------------------------|-------|
| Base model | `bert-base-uncased` |
| Epochs | 3 |
| Batch size | 8 (train & eval) |
| Max seq length | 128 |
| Optimizer / LR scheduler | Hugging Face `Trainer` defaults (`AdamW`, lr 5e-5) |
| Hardware | Google Colab GPU (T4) |

## Evaluation & Robustness

* Benchmarked against Logistic Regression, Naive Bayes, Random Forest, and XGBoost (TF-IDF features); BankAI-BERT scored highest on F1.
* Calibration checked via Brier Score (0 = perfect).
* SHAP analysis shows the model focuses on meaningful cues (e.g., "machine learning", "AI-powered") rather than noise, supporting interpretability and trust.
* Robust to:
  * Year-by-year slices (2015–2023, all F1 ≥ 0.99).
  * Adversarial / edge-case sentences (100% correct in manual test).
  * Sentence-length bias (Pearson r ≈ 0.19, a weak correlation → no substantial bias).

## Files Included

- `config.json`, `tokenizer.json`, `vocab.txt`, `model.safetensors`: Model files
- `tokenizer_config.json`, `special_tokens_map.json`: Tokenizer configuration

## GitHub Repository

For the full pipeline, data, and visualizations, see the [**GitHub repository**](https://github.com/bilalezafar/BankAI-BERT).

## Citation

Please cite my paper if you use this model:

- **Zafar, M. B. (2025). AI in Banking Disclosures: A BERT Classifier and Corpus-Level Thematic Mapping.**

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("bilalzafar/BankAI-BERT")
model = AutoModelForSequenceClassification.from_pretrained("bilalzafar/BankAI-BERT")

# Inference example
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
result = classifier("We are integrating AI into our credit risk management systems.")
print(result)
```

**Note:** label `1` = AI-related, label `0` = Non-AI.
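
The pipeline returns whatever class names are stored in the model config. Below is a minimal sketch of batch classification over several sentences with the outputs mapped to readable names; it assumes the checkpoint exposes the default `LABEL_0` / `LABEL_1` names (following the 1 = AI, 0 = Non-AI convention above). Adjust the mapping if the config defines different label strings.

```python
from transformers import pipeline

# Assumption: the checkpoint uses the default LABEL_0 / LABEL_1 names;
# change this mapping if the model config defines other label strings.
label_map = {"LABEL_0": "Non-AI", "LABEL_1": "AI"}

# Loading by repo id also fetches the matching tokenizer
classifier = pipeline("text-classification", model="bilalzafar/BankAI-BERT")

sentences = [
    "We are integrating AI into our credit risk management systems.",
    "The branch network was expanded to three new states this year.",
]

# Passing a list runs batch inference; each result has a label and a score
for sentence, pred in zip(sentences, classifier(sentences)):
    readable = label_map.get(pred["label"], pred["label"])
    print(f"{readable} ({pred['score']:.3f}): {sentence}")
```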