---
library_name: transformers
tags:
- finance
---

- **Developed by:** Team CodeBlooded
- **Funded by:** EpiUse & University of Pretoria
- **Model type:** DistilBertForSequenceClassification
- **Language(s) (NLP):** English

# fin-classifier

## Overview

**Repository:** CodeBlooded-capstone/fin-classifier

A DistilBERT-based text classification model that assigns each financial transaction description to one of N predefined categories.

---

## Model Details

* **Model type:** `DistilBertForSequenceClassification`
* **Version:** v1.0 (initial release)
* **Hugging Face repo:** [https://huggingface.co/CodeBlooded-capstone/fin-classifier](https://huggingface.co/CodeBlooded-capstone/fin-classifier)
* **Authors:** CodeBlooded

---

## Intended Use

### Primary use case

* **Task:** Automated categorization of banking and credit card transaction descriptions from South African banks
* **Users:** Personal finance apps, budgeting tools, fintech platforms

### Out-of-scope use cases

* Legal or compliance decisions
* Any use requiring 100% classification accuracy or safety guarantees

---

## Training Data

* **Source:** Kaggle `personal_transactions.csv` dataset
* **Mapping:** Original vendor-level categories mapped into an internal schema of \~M high-level categories (`data/categories.json`).
* **Feedback augmentation:** User-corrected labels from `feedback_corrected.json` are appended to the training set for continuous improvement.
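The mapping and feedback-augmentation steps above could be sketched roughly as follows. This is not the project's code: the CSV column names `Description` and `Category`, the assumption that `data/categories.json` is a flat object mapping each original category to a high-level category, and the `{"text": ..., "label": ...}` record shape for `feedback_corrected.json` are all hypothetical.

```python
import csv
import json
from typing import Dict, List, Tuple

# (transaction description, high-level label)
Example = Tuple[str, str]


def map_categories(rows: List[Dict[str, str]], mapping: Dict[str, str]) -> List[Example]:
    """Map vendor-level categories onto the internal high-level schema.

    Rows whose original category has no entry in `mapping` are dropped.
    The "Description" and "Category" column names are assumptions.
    """
    examples = []
    for row in rows:
        high_level = mapping.get(row["Category"])
        if high_level is not None:
            examples.append((row["Description"], high_level))
    return examples


def append_feedback(examples: List[Example], feedback: List[Dict[str, str]]) -> List[Example]:
    """Append user-corrected labels (assumed {"text", "label"} records) to the training set."""
    return examples + [(item["text"], item["label"]) for item in feedback]


def load_training_set(csv_path: str, categories_path: str, feedback_path: str) -> List[Example]:
    """Hypothetical loader that reads the three inputs from disk and combines them."""
    with open(categories_path, encoding="utf-8") as f:
        mapping = json.load(f)
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    with open(feedback_path, encoding="utf-8") as f:
        feedback = json.load(f)
    return append_feedback(map_categories(rows, mapping), feedback)


if __name__ == "__main__":
    rows = [
        {"Description": "STARBUCKS STORE 1234", "Category": "Coffee Shops"},
        {"Description": "SHELL GARAGE", "Category": "Gas & Fuel"},
        {"Description": "UNKNOWN VENDOR", "Category": "Misc"},
    ]
    mapping = {"Coffee Shops": "Food & Dining", "Gas & Fuel": "Transport"}
    feedback = [{"text": "UBER TRIP", "label": "Transport"}]
    print(append_feedback(map_categories(rows, mapping), feedback))
    # [('STARBUCKS STORE 1234', 'Food & Dining'), ('SHELL GARAGE', 'Transport'), ('UBER TRIP', 'Transport')]
```

Dropping unmapped categories is one possible policy; routing them to a catch-all class would be an equally reasonable choice.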

---

## Evaluation

* **Split:** 90% train / 10% test split (seed=42) from the training corpus
* **Metric:** Macro F1-score
* **Results:**

  * Macro F1 on test set: **0.XX** (not yet measured)
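The split and metric described above can be reproduced in plain Python. This is a minimal sketch, not the project's evaluation script: the shuffle-then-slice split with `random.Random(42)` will not produce the same partition as whatever library the authors used (for example, `datasets.Dataset.train_test_split(test_size=0.1, seed=42)`), and labels are assumed to be strings. The macro average is taken over the union of gold and predicted labels, which should agree with scikit-learn's default `f1_score(y_true, y_pred, average="macro")`.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(items: Sequence[T], test_fraction: float = 0.1, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` deterministically and hold out the last `test_fraction`."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 = 2TP / (2TP + FP + FN) for every label seen in gold labels or predictions."""
    scores = {}
    for label in set(y_true) | set(y_pred):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so rare categories count as much as common ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    y_true = ["Food", "Food", "Transport", "Bills"]
    y_pred = ["Food", "Transport", "Transport", "Bills"]
    print(round(macro_f1(y_true, y_pred), 3))  # 0.778
```

Because each class contributes equally to the average, macro F1 exposes weak performance on under-represented categories that accuracy or micro F1 would hide.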

---

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("CodeBlooded-capstone/fin-classifier")
model = AutoModelForSequenceClassification.from_pretrained("CodeBlooded-capstone/fin-classifier")

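# By default, the text-classification pipeline returns only the top-scoring label per input
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
)

example = "STARBUCKS STORE 1234"
print(classifier(example))
# Illustrative output (a list with one dict per input; label names come from the model's id2label config):
# [{'label': 'Food & Dining', 'score': 0.95}]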
```

---

## Limitations & Bias

* Performance varies by category: categories with fewer examples may see lower F1.
* The model reflects biases present in the original Kaggle dataset (e.g., over/under-representation of certain merchants).
* Should not be used as a sole source for financial decision-making.

---

## Maintenance & Continuous Learning

* New user feedback corrections are stored in `model/feedback_corrected.json` and incorporated during retraining.
* Checkpoints are saved to `results/` and versioned on Hugging Face.
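A correction could be recorded in `model/feedback_corrected.json` along the lines of the sketch below, so that the next retraining run picks it up. The JSON layout (a list of `{"text": ..., "label": ...}` objects), the helper name `record_correction`, and the choice to keep only the latest label per description are assumptions, not taken from the repository.

```python
import json
import os
import tempfile
from typing import Dict, List


def record_correction(path: str, text: str, label: str) -> List[Dict[str, str]]:
    """Append one user correction to a JSON list on disk and return the updated list.

    Hypothetical helper: a newer correction for the same description replaces the
    older one, so the retraining set holds a single label per description.
    """
    corrections: List[Dict[str, str]] = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            corrections = json.load(f)
    corrections = [c for c in corrections if c["text"] != text]
    corrections.append({"text": text, "label": label})

    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # Write to a temporary file first, then swap it in so readers never see a half-written file.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(corrections, f, ensure_ascii=False, indent=2)
    os.replace(tmp_path, path)
    return corrections


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model", "feedback_corrected.json")
        record_correction(path, "STARBUCKS STORE 1234", "Shopping")
        print(record_correction(path, "STARBUCKS STORE 1234", "Food & Dining"))
        # [{'text': 'STARBUCKS STORE 1234', 'label': 'Food & Dining'}]
```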

---

## License

Apache 2.0

---

## Citation

```bibtex
@misc{fin-classifier2025,
  author = {CodeBlooded},
  title = {fin-classifier: A DistilBERT-based Transaction Categorization Model},
  year = {2025},
  howpublished = {\url{https://huggingface.co/CodeBlooded-capstone/fin-classifier}}
}
```

---

*This model card was generated on 2025-07-12.*