distalBERT-BANK-COMPLAINS

A fine-tuned DistilBERT model for classifying consumer banking and financial complaints into product categories, based on the CFPB Consumer Complaints dataset.

Model Description

This model takes a raw consumer complaint narrative as input and classifies it into one of several financial product categories (e.g., CREDIT_CARD, HOME_LOAN, DEBT_COLLECTION). It is fine-tuned on a class-balanced subset of the CFPB complaints dataset with a class-weighted loss, which counters the class imbalance found in the raw data.

  • Base model: distilbert-base-uncased
  • Task: Multi-class text classification
  • Language: English
  • Max token length: 512

Intended Use

This model is intended for research purposes only. It is not designed or validated for production deployment in financial, legal, or compliance contexts. Potential research applications include:

  • Benchmarking NLP models on financial complaint classification
  • Studying consumer complaint patterns across product categories
  • Exploring transfer learning from general-purpose language models to domain-specific tasks

Not intended for: automated decision-making, regulatory compliance, or any production system affecting consumers.

Training Details

Parameter                  Value
Epochs                     4
Batch size                 32
Learning rate              2e-5
Weight decay               0.01
Warmup ratio               0.1
Samples per class          5000
Train / Val / Test split   75% / 10% / 15%
Optimizer                  AdamW
Framework                  HuggingFace Transformers 4.44.2
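
To make the warmup ratio concrete, the sketch below works out the step counts implied by the table and evaluates a linear warmup followed by linear decay. Two things here are assumptions rather than facts from this card: the class count of 7 (taken from the label table) and the linear schedule itself (the Trainer default, which the card does not confirm). The function name linear_warmup_decay_lr is hypothetical.

```python
import math

def linear_warmup_decay_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at a 0-indexed optimizer step.

    Assumes linear warmup from 0 to base_lr, then linear decay back to 0.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Assumption: 7 classes, as listed in the label table.
num_classes = 7
train_examples = int(num_classes * 5000 * 0.75)    # 26250 training examples
steps_per_epoch = math.ceil(train_examples / 32)   # 821 steps per epoch
total_steps = steps_per_epoch * 4                  # 3284 steps over 4 epochs
warmup_steps = math.ceil(total_steps * 0.1)        # 329 warmup steps

peak_lr = linear_warmup_decay_lr(warmup_steps, total_steps)  # ~2e-5 at end of warmup
```

Under these assumptions the learning rate climbs for roughly the first tenth of training and then falls to zero at the final step.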

Class imbalance was handled via:

  • Stratified balanced sampling (5000 samples per class)
  • Weighted cross-entropy loss during training
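
The two techniques above can be sketched in plain Python. This is a minimal illustration and not the training code used for this model: the helper names (balanced_sample, inverse_frequency_weights, weighted_cross_entropy) and the inverse-frequency weighting formula are assumptions, since the card does not state how the class weights were computed.

```python
import math
import random
from collections import Counter, defaultdict

def balanced_sample(records, per_class=5000, seed=42):
    """Keep at most `per_class` randomly chosen (text, label) records per label."""
    by_label = defaultdict(list)
    for text, label in records:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    sampled = []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        sampled.extend(items[:per_class])
    return sampled

def inverse_frequency_weights(labels):
    """Assumed scheme: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}

def weighted_cross_entropy(logits, target, weights):
    """Per-example weighted cross-entropy; `logits` maps label -> raw score."""
    m = max(logits.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return weights[target] * (log_z - logits[target])

# Toy data: one class has fewer records than the per-class cap.
records = [("a", "CREDIT_CARD")] * 8 + [("b", "HOME_LOAN")] * 2
sample = balanced_sample(records, per_class=4)
weights = inverse_frequency_weights([label for _, label in sample])
loss = weighted_cross_entropy({"CREDIT_CARD": 0.0, "HOME_LOAN": 0.0}, "HOME_LOAN", weights)
```

If every class reaches the per-class cap, these weights all equal 1.0, so the weighting only matters when some class has fewer records than the cap. Framework losses may also normalize a batch by the sum of the weights, which this per-example sketch does not do.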

Usage

from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="CoolHatt/distalBERT-BANK-COMPLAINS",
)

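# truncation=True keeps long narratives within the 512-token limit
result = clf(
    "I was charged twice on my credit card and the bank refused to refund me.",
    truncation=True,
)
print(result)
# Example output (the score is illustrative):
# [{'label': 'CREDIT_CARD', 'score': 0.97}]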

Labels

The model predicts the following product categories:

Label              Description
CREDIT_CARD        Credit card or prepaid card complaints
HOME_LOAN          Mortgage and home loan complaints
DEBT_COLLECTION    Debt collection complaints
CREDIT_REPORTING   Credit reporting and repair complaints
PERSONAL_LOAN      Personal / student / vehicle loan complaints
BANK_ACCOUNT       Checking / savings account complaints
MONEY_TRANSFER     Money transfer and virtual currency complaints

Note: Refer to label_meta.json in the repository for the full label2id / id2label mapping used during training.
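
As a rough guide to consuming such a mapping, the sketch below parses a JSON document with label2id and id2label keys and checks that the two agree. The file layout and the ids shown are hypothetical, so inspect the real label_meta.json before relying on them. The helper name load_label_maps is also hypothetical.

```python
import json

# Hypothetical contents: the real label_meta.json may use other keys and ids.
label_meta_text = """
{
  "label2id": {"CREDIT_CARD": 0, "HOME_LOAN": 1, "DEBT_COLLECTION": 2},
  "id2label": {"0": "CREDIT_CARD", "1": "HOME_LOAN", "2": "DEBT_COLLECTION"}
}
"""

def load_label_maps(text):
    """Parse the mapping and verify that label2id and id2label are inverses."""
    meta = json.loads(text)
    label2id = {label: int(idx) for label, idx in meta["label2id"].items()}
    # JSON object keys are always strings, so convert ids back to int.
    id2label = {int(idx): label for idx, label in meta["id2label"].items()}
    for label, idx in label2id.items():
        if id2label.get(idx) != label:
            raise ValueError(f"Inconsistent mapping for {label!r} (id {idx})")
    return label2id, id2label

label2id, id2label = load_label_maps(label_meta_text)
```

For a local file, the same function could be called on the text read from disk.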

Limitations

  • Trained on English-language complaints only
  • Performance may degrade on very short complaint texts (under 30 characters)
  • PII in the training complaints was redacted using regex patterns, so the model expects similarly anonymized text for best results
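
To prepare inputs in line with the last two points, a preprocessing step along the following lines may help. The regex patterns and placeholder tokens are illustrative assumptions: the card does not publish the patterns used during training, so matching them exactly is not possible from this sketch. The names redact_pii and is_too_short are hypothetical.

```python
import re

# Illustrative patterns only; the patterns used for training are not published here.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"), "[CARD]"),
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
]
MIN_CHARS = 30  # threshold taken from the limitation on short texts

def redact_pii(text):
    """Replace each matched pattern with its placeholder token, in list order."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text

def is_too_short(text, min_chars=MIN_CHARS):
    """Flag inputs on which performance may degrade."""
    return len(text.strip()) < min_chars

raw = "Call me at 555-123-4567 or email jane.doe@example.com about card 4111 1111 1111 1111."
clean = redact_pii(raw)
# clean == "Call me at [PHONE] or email [EMAIL] about card [CARD]."
```

Regex redaction is best-effort and will miss PII that does not fit these shapes, such as names and street addresses.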

License

This model is licensed under the Apache 2.0 License.

Citation

If you use this model, please cite the base model:

@article{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  journal={arXiv preprint arXiv:1910.01108},
  year={2019}
}

Model size: 67M params (Safetensors, tensor type F32)