CoolHatt committed b3e773d (verified; parent 51e80fb): Update README.md

Files changed (1): README.md (+115 -3)


---
language:
- en
tags:
- text-classification
- complaint-classification
- distilbert
- cfpb
- banking
- finance
license: apache-2.0
base_model: distilbert-base-uncased
datasets:
- davidheineman/consumer-finance-complaints-large
metrics:
- accuracy
- f1
---

# distalBERT-BANK-COMPLAINS

A fine-tuned [DistilBERT](https://huggingface.co/distilbert-base-uncased) model that classifies consumer banking and financial complaints into product categories. It is trained on the [CFPB Consumer Complaints dataset](https://huggingface.co/datasets/davidheineman/consumer-finance-complaints-large).

## Model Description

This model takes a raw consumer complaint narrative as input and classifies it into one of the financial product categories listed under Labels (e.g., `CREDIT_CARD`, `HOME_LOAN`, `DEBT_COLLECTION`). It is fine-tuned on a class-balanced subset of the CFPB complaints dataset (5000 samples per class) and trained with a class-weighted loss, to counter the class imbalance found in real-world complaint data.

- **Base model:** `distilbert-base-uncased`
- **Task:** Multi-class text classification
- **Language:** English
- **Max token length:** 512

## Intended Use

This model is intended for **research purposes only**. It is not designed or validated for production deployment in financial, legal, or compliance contexts. Potential research applications include:

- Benchmarking NLP models on financial complaint classification
- Studying consumer complaint patterns across product categories
- Exploring transfer learning from general-purpose language models to domain-specific tasks

**Not intended for:** automated decision-making, regulatory compliance, or any production system affecting consumers.

## Training Details

| Parameter | Value |
|---|---|
| Epochs | 4 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Samples per class | 5000 |
| Train / Val / Test split | 75% / 10% / 15% |
| Optimizer | AdamW |
| Framework | Hugging Face Transformers 4.44.2 |
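
The table implies a concrete step budget. The sketch below works it out in plain Python under assumptions the card does not state: all seven labels listed under Labels reach the 5000-sample cap, the 75/10/15 split is applied to the pooled balanced set, training ran on a single device without gradient accumulation, and the learning rate followed a linear warmup and linear decay (the Transformers `Trainer` default). The function names are hypothetical, so treat the numbers as illustrative.

```python
import math

# Hyperparameters from the Training Details table.
EPOCHS = 4
BATCH_SIZE = 32
PEAK_LR = 2e-5
WARMUP_RATIO = 0.1
SAMPLES_PER_CLASS = 5000
SPLIT_PCT = (75, 10, 15)  # train / val / test, in percent

# Assumption: all 7 labels in the Labels table reach the 5000-sample cap.
NUM_CLASSES = 7


def split_sizes(total: int) -> tuple[int, int, int]:
    """Train/val/test counts; the test set absorbs any rounding remainder."""
    n_train = total * SPLIT_PCT[0] // 100
    n_val = total * SPLIT_PCT[1] // 100
    return n_train, n_val, total - n_train - n_val


def step_budget(n_train: int) -> tuple[int, int]:
    """Total optimizer steps and warmup steps (no gradient accumulation assumed)."""
    steps_per_epoch = math.ceil(n_train / BATCH_SIZE)  # last partial batch is kept
    total_steps = steps_per_epoch * EPOCHS
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return total_steps, warmup_steps


def lr_at(step: int, total_steps: int, warmup_steps: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, remaining)


n_train, n_val, n_test = split_sizes(NUM_CLASSES * SAMPLES_PER_CLASS)
total_steps, warmup_steps = step_budget(n_train)
print(n_train, n_val, n_test)  # 26250 3500 5250
print(total_steps, warmup_steps)  # 3284 329
```

Under these assumptions, warmup covers the first 329 of roughly 3.3k optimizer steps; a different class count or batching setup would shift these figures.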

Class imbalance was handled via:

- Stratified balanced sampling (5000 samples per class)
- Weighted cross-entropy loss during training
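
A minimal plain-Python sketch of the two techniques above. The card does not say how the class weights were computed; this assumes the common inverse-frequency heuristic `n_samples / (n_classes * count)`, and `balanced_sample`, `class_weights`, and `weighted_cross_entropy` are hypothetical names. With that formula, classes that all reach the 5000-sample cap get a weight of 1.0, so the weighting only changes the loss for classes with fewer samples.

```python
import math
import random
from collections import Counter, defaultdict


def balanced_sample(examples, per_class=5000, seed=0):
    """Keep at most `per_class` randomly chosen (text, label) pairs per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    sampled = []
    for items in by_label.values():
        rng.shuffle(items)
        sampled.extend(items[:per_class])
    rng.shuffle(sampled)
    return sampled


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count[label])."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


def weighted_cross_entropy(logits, targets, weight_by_id):
    """Mean cross-entropy where each example is scaled by its class weight.

    Normalizes by the summed weights of the targets, which is how PyTorch's
    CrossEntropyLoss(weight=..., reduction="mean") averages.
    """
    total, norm = 0.0, 0.0
    for row, target in zip(logits, targets):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))  # stable log-sum-exp
        w = weight_by_id[target]
        total += w * (log_z - row[target])
        norm += w
    return total / norm


# Toy data: MONEY_TRANSFER has fewer examples than the cap.
data = [(f"cc {i}", "CREDIT_CARD") for i in range(8)]
data += [(f"mt {i}", "MONEY_TRANSFER") for i in range(2)]
sample = balanced_sample(data, per_class=4)
weights = class_weights([label for _, label in sample])
print(weights["CREDIT_CARD"], weights["MONEY_TRANSFER"])  # 0.75 1.5
```

The fixed seed keeps the sampled subset reproducible; in an actual training run the weights would be passed to the loss ordered by label id.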

## Usage

```python
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="CoolHatt/distalBERT-BANK-COMPLAINS",
)

# truncation=True clips narratives longer than the model's 512-token limit
result = clf(
    "I was charged twice on my credit card and the bank refused to refund me.",
    truncation=True,
)
print(result)
# Example output (the exact score will vary):
# [{'label': 'CREDIT_CARD', 'score': 0.97}]
```

## Labels

The model predicts the following product categories:

| Label | Description |
|---|---|
| `CREDIT_CARD` | Credit card or prepaid card complaints |
| `HOME_LOAN` | Mortgage and home loan complaints |
| `DEBT_COLLECTION` | Debt collection complaints |
| `CREDIT_REPORTING` | Credit reporting and repair complaints |
| `PERSONAL_LOAN` | Personal, student, or vehicle loan complaints |
| `BANK_ACCOUNT` | Checking or savings account complaints |
| `MONEY_TRANSFER` | Money transfer and virtual currency complaints |

> Note: Refer to `label_meta.json` in the repository for the full `label2id` / `id2label` mapping used during training.
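
For a single-label classifier like this one, the `text-classification` pipeline's default post-processing is a softmax over the logits followed by an `id2label` lookup of the winning index. The sketch below reproduces that step by hand. It assumes `label_meta.json` stores an `id2label` object with string keys, as Transformers config files do; the real layout of that file and the id order used in the example are hypothetical, so check the repository file before relying on either.

```python
import json
import math
import os
import tempfile


def load_id2label(path):
    """Read id2label from a JSON file, converting string keys to ints.

    Assumes a layout like {"label2id": {...}, "id2label": {"0": "...", ...}};
    the real label_meta.json may differ.
    """
    with open(path, encoding="utf-8") as f:
        meta = json.load(f)
    return {int(k): v for k, v in meta["id2label"].items()}


def decode(logits, id2label):
    """Softmax over one row of logits; return the top label and its probability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # shift by max for numerical stability
    best = max(range(len(exps)), key=exps.__getitem__)
    return {"label": id2label[best], "score": exps[best] / sum(exps)}


# Hypothetical id order, taken from the Labels table for illustration only.
labels = ["CREDIT_CARD", "HOME_LOAN", "DEBT_COLLECTION", "CREDIT_REPORTING",
          "PERSONAL_LOAN", "BANK_ACCOUNT", "MONEY_TRANSFER"]
meta = {"label2id": {name: i for i, name in enumerate(labels)},
        "id2label": {str(i): name for i, name in enumerate(labels)}}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(meta, f)
id2label = load_id2label(f.name)
os.remove(f.name)

print(decode([3.0, 0.1, 0.2, 0.0, -1.0, 0.5, 0.3], id2label)["label"])  # CREDIT_CARD
```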

## Limitations

- Trained on English-language complaints only
- Performance may degrade on very short complaint texts (under 30 characters)
- PII in the training complaints was redacted with regex patterns, so the model expects similarly anonymized text for best results
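
Because the model saw redacted text during training, it may help to scrub inputs in a similar way before inference. The card does not publish its regex patterns, so the ones below (email addresses, 12 to 19 digit card or account numbers, US-style phone numbers), the `XXXX` placeholder, and the `preprocess` helper are all assumptions. The helper also flags inputs shorter than the 30-character threshold mentioned above.

```python
import re

MIN_CHARS = 30  # threshold below which the card reports degraded performance

# Hypothetical patterns; the regexes used during training are not published.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # email addresses
    re.compile(r"\b\d(?:[ -]?\d){11,18}\b"),  # 12-19 digit card/account numbers
    re.compile(r"\(?\b\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),  # US-style phone numbers
]


def preprocess(text, placeholder="XXXX"):
    """Redact PII-like spans, collapse whitespace, and flag too-short inputs."""
    for pattern in PII_PATTERNS:  # card numbers are scrubbed before phone numbers
        text = pattern.sub(placeholder, text)
    text = " ".join(text.split())
    return text, len(text) >= MIN_CHARS


cleaned, long_enough = preprocess(
    "Email me at jane.doe@example.com or call (555) 123-4567 about card 4111 1111 1111 1111."
)
print(cleaned)  # Email me at XXXX or call XXXX about card XXXX.
print(long_enough)  # True
```

Running the card-number pattern before the phone pattern keeps a long digit run from being only partly masked.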

## License

This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Citation

If you use this model, please cite the base model:

```bibtex
@article{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  journal={arXiv preprint arXiv:1910.01108},
  year={2019}
}
```
---