---
license: mit
language:
- en
metrics:
- accuracy
- f1
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
- AI
- Artificial-Intelligence
- AI-Disclosure
- Finance
- BERT
- Financial-NLP
- Sentence-Classification
- Transformers
- Banking
- Machine-Learning
---

# BankAI-BERT

BankAI-BERT is a domain-specific BERT model fine-tuned to detect AI-related disclosures in banking texts.

## Intended Use

BankAI-BERT is designed to assist researchers, analysts, and regulators in identifying AI narratives in financial disclosures at the sentence level.

## Performance

- Accuracy: 99.37%
- F1-score: 0.993
- ROC AUC: 1.000
- Brier Score: 0.0000

## Training Data

BankAI-BERT was fine-tuned on a manually annotated dataset of sentences drawn from U.S. bank annual reports spanning 2015 to 2023. The final training set comprised a balanced sample of 1,586 sentences: 793 labeled AI-related and 793 non-AI. The model was initialized from the `bert-base-uncased` checkpoint.

## Training

| Setting | Value |
|--------------------------|-------|
| Base model | `bert-base-uncased` |
| Epochs | 3 |
| Batch size | 8 (train & eval) |
| Max seq length | 128 |
| Optimizer / LR scheduler | Hugging Face `Trainer` defaults (`AdamW`, lr 5e-5) |
| Hardware | Google Colab GPU (T4) |

## Evaluation & Robustness

* Benchmarked against Logistic Regression, Naive Bayes, Random Forest, and XGBoost (TF-IDF features); BankAI-BERT scored highest on F1.
* Calibration checked via Brier Score (0 = perfect).
* SHAP analysis shows the model focuses on meaningful cues (e.g., "machine learning", "AI-powered") rather than noise, supporting interpretability and trust.
* Robust to:
  * Year-by-year slices (2015–2023, all F1 ≥ 0.99).
  * Adversarial / edge-case sentences (100% correct in manual test).
  * Sentence-length bias (Pearson r ≈ 0.19, a weak correlation → no substantial bias).

## Files Included

- `config.json`, `tokenizer.json`, `vocab.txt`, `model.safetensors`: Model files
- `tokenizer_config.json`, `special_tokens_map.json`: Tokenizer configuration

## GitHub Repository

For the full pipeline, data, and visualizations, see the [**GitHub repository**](https://github.com/bilalezafar/BankAI-BERT).

## Citation

Please cite my paper if you use this model:

- **Zafar, M. B. (2025). AI in Banking Disclosures: A BERT Classifier and Corpus-Level Thematic Mapping.**

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("bilalzafar/BankAI-BERT")
model = AutoModelForSequenceClassification.from_pretrained("bilalzafar/BankAI-BERT")

# Inference example
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
result = classifier("We are integrating AI into our credit risk management systems.")
print(result)
```

**Note:** label `1` = AI-related, label `0` = Non-AI.
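
The pipeline returns whatever class names are stored in the model config. Below is a minimal sketch of batch classification over several sentences with the outputs mapped to readable names; it assumes the checkpoint exposes the default `LABEL_0` / `LABEL_1` names (following the 1 = AI, 0 = Non-AI convention above). Adjust the mapping if the config defines different label strings.

```python
from transformers import pipeline

# Assumption: the checkpoint uses the default LABEL_0 / LABEL_1 names;
# change this mapping if the model config defines other label strings.
label_map = {"LABEL_0": "Non-AI", "LABEL_1": "AI"}

# Loading by repo id also fetches the matching tokenizer
classifier = pipeline("text-classification", model="bilalzafar/BankAI-BERT")

sentences = [
    "We are integrating AI into our credit risk management systems.",
    "The branch network was expanded to three new states this year.",
]

# Passing a list runs batch inference; each result has a label and a score
for sentence, pred in zip(sentences, classifier(sentences)):
    readable = label_map.get(pred["label"], pred["label"])
    print(f"{readable} ({pred['score']:.3f}): {sentence}")
```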