+---
+language:
+- en
+tags:
+- text-classification
+- complaint-classification
+- distilbert
+- cfpb
+- banking
+- finance
+license: apache-2.0
+base_model: distilbert-base-uncased
+datasets:
+- davidheineman/consumer-finance-complaints-large
+metrics:
+- accuracy
+- f1
+---
+# distalBERT-BANK-COMPLAINS
+A fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model for classifying consumer banking and financial complaints into product categories, based on the [CFPB Consumer Complaints dataset](https://huggingface.co/datasets/davidheineman/consumer-finance-complaints-large).
+## Model Description
+This model takes a raw consumer complaint narrative as input and classifies it into one of several financial product categories (e.g., `CREDIT_CARD`, `HOME_LOAN`, `DEBT_COLLECTION`). It is fine-tuned on a class-balanced subset of the CFPB complaints dataset (5,000 samples per class) with a weighted cross-entropy loss, to counter the class imbalance of the full dataset.
+- **Base model:** `distilbert-base-uncased`
+- **Task:** Multi-class text classification
+- **Language:** English
+- **Max sequence length:** 512 tokens
+## Intended Use
+This model is intended for **research purposes only**. It is not designed or validated for production deployment in financial, legal, or compliance contexts. Potential research applications include:
+- Benchmarking NLP models on financial complaint classification
+- Studying consumer complaint patterns across product categories
+- Exploring transfer learning from general-purpose language models to domain-specific tasks
+**Not intended for:** automated decision-making, regulatory compliance, or any production system affecting consumers.
+## Training Details
+| Parameter | Value |
+|---|---|
+| Epochs | 4 |
+| Batch size | 32 |
+| Learning rate | 2e-5 |
+| Weight decay | 0.01 |
+| Warmup ratio | 0.1 |
+| Samples per class | 5000 |
+| Train / Val / Test split | 75% / 10% / 15% |
+| Optimizer | AdamW |
+| Framework | Hugging Face Transformers 4.44.2 |
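
The sampling and split rows of the table can be illustrated with a short standard-library sketch. The function name `balanced_stratified_split`, the seed, and the per-class shuffle are assumptions made for illustration; the card does not describe the actual sampling code.

```python
import random
from collections import defaultdict


def balanced_stratified_split(rows, per_class=5000,
                              fractions=(0.75, 0.10, 0.15), seed=42):
    """Cap every class at `per_class` rows, then split each class
    into train / val / test using the same fractions.

    `rows` is a list of (text, label) pairs. Hypothetical helper,
    not the training script used for this model.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        items = items[:per_class]            # balanced sampling: cap per class
        n_train = round(len(items) * fractions[0])
        n_val = round(len(items) * fractions[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]      # remainder goes to the test set
    return train, val, test
```

With 5,000 samples per class and a 75% / 10% / 15% split, each class would contribute 3,750 training, 500 validation, and 750 test examples.
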
+Class imbalance was handled via:
+- Stratified balanced sampling (5000 samples per class)
+- Weighted cross-entropy loss during training
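
As a rough illustration of the second point, the sketch below computes a weighted cross-entropy in plain Python. The card does not state how the class weights were derived, so the inverse-frequency formula here is an assumption, and both function names are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: weight_c = N / (K * count_c),
    where N is the number of samples and K the number of classes."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def weighted_cross_entropy(logits, targets, weights):
    """Weighted cross-entropy over a batch.

    `logits` is a list of per-class score lists, `targets` a list of
    integer class ids, `weights` a dict mapping class id to weight.
    The sum is normalized by the total weight of the targets.
    """
    total, weight_sum = 0.0, 0.0
    for row, target in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        nll = log_z - row[target]            # negative log-likelihood
        total += weights[target] * nll
        weight_sum += weights[target]
    return total / weight_sum
```

Under this particular scheme, a training set with exactly the same number of samples in every class would give every class a weight of 1.0, so the weighting only has an effect when the class counts differ.
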
+## Usage
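+```python
+from transformers import pipeline
+
+# Load the fine-tuned classifier from the Hugging Face Hub
+clf = pipeline(
+    "text-classification",
+    model="CoolHatt/distalBERT-BANK-COMPLAINS",
+)
+
+# Truncate long narratives to the model's 512-token limit
+result = clf(
+    "I was charged twice on my credit card and the bank refused to refund me.",
+    truncation=True,
+    max_length=512,
+)
+print(result)
+# Illustrative output (the exact score will vary):
+# [{'label': 'CREDIT_CARD', 'score': 0.97}]
+```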
+## Labels
+The model predicts the following product categories:
+| Label | Description |
+|---|---|
+| `CREDIT_CARD` | Credit card or prepaid card complaints |
+| `HOME_LOAN` | Mortgage and home loan complaints |
+| `DEBT_COLLECTION` | Debt collection complaints |
+| `CREDIT_REPORTING` | Credit reporting and repair complaints |
+| `PERSONAL_LOAN` | Personal / student / vehicle loan complaints |
+| `BANK_ACCOUNT` | Checking / savings account complaints |
+| `MONEY_TRANSFER` | Money transfer and virtual currency complaints |
+> Note: Refer to `label_meta.json` in the repository for the full `label2id` / `id2label` mapping used during training.
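
A minimal sketch of reading that mapping follows. It assumes `label_meta.json` holds two top-level objects named `label2id` and `id2label`; the actual file layout is not shown in this card, and `parse_label_meta` is a hypothetical helper.

```python
import json


def parse_label_meta(raw_json):
    """Parse the text of a label metadata file into two dicts.

    Assumed layout: {"label2id": {...}, "id2label": {...}}.
    JSON object keys are always strings, so ids are cast to int.
    """
    meta = json.loads(raw_json)
    label2id = {label: int(idx) for label, idx in meta["label2id"].items()}
    id2label = {int(idx): label for idx, label in meta["id2label"].items()}

    # The two mappings should be exact inverses of each other.
    for label, idx in label2id.items():
        if id2label.get(idx) != label:
            raise ValueError(f"Inconsistent mapping for {label!r}")
    return label2id, id2label


# Typical use, assuming the file has been downloaded locally:
# from pathlib import Path
# label2id, id2label = parse_label_meta(Path("label_meta.json").read_text())
```
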
+## Limitations
+- Trained on English-language complaints only
+- Performance may degrade on very short complaint texts (under 30 characters)
+- PII in the training complaints was redacted with regex patterns, so the model expects similarly anonymized text for best results
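
The last two limitations suggest a small preprocessing step before inference, sketched below. The regex patterns and placeholder tokens are hypothetical: the card does not list the patterns or replacement strings used during training, so these should be adapted to match the real preprocessing.

```python
import re

# Hypothetical patterns and placeholders, for illustration only.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}[\s.-]\d{3}[\s.-]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{12,19}\b"), "[ACCOUNT]"),
]

MIN_CHARS = 30  # threshold taken from the limitation on short texts


def redact(text):
    """Replace every match of each pattern with its placeholder."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def is_too_short(text, min_chars=MIN_CHARS):
    """Flag inputs below the length where performance may degrade."""
    return len(text.strip()) < min_chars
```

Inputs flagged by `is_too_short` could be routed to manual review or reported with lower confidence rather than classified directly.
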
+## License
+This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
+## Citation
+If you use this model, please cite the base model:
+```bibtex
+@article{sanh2019distilbert,
+  title={DistilBERT, a distilled version of BERT},
+  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
+  journal={arXiv preprint arXiv:1910.01108},
+  year={2019}
+}
+```
+---