---
library_name: transformers
tags:
- finance
---
- **Developed by:** Team CodeBlooded
- **Funded by:** EpiUse & University of Pretoria
- **Model type:** DistilBertForSequenceClassification
- **Language(s) (NLP):** English
# fin-classifier
## Overview
**Repository:** CodeBlooded-capstone/fin-classifier
A DistilBERT-based text classification model for categorizing financial transaction descriptions into one of N predefined categories.
---
## Model Details
* **Model type:** `DistilBertForSequenceClassification`
* **Version:** v1.0 (initial release)
* **Hugging Face repo:** [https://huggingface.co/CodeBlooded-capstone/fin-classifier](https://huggingface.co/CodeBlooded-capstone/fin-classifier)
* **Authors:** CodeBlooded
---
## Intended Use
### Primary use case
* **Task:** Automated categorization of banking and credit card transaction descriptions for South African banks
* **Users:** Personal finance apps, budgeting tools, fintech platforms
### Out-of-scope use cases
* Legal or compliance decisions
* Any use requiring 100% classification accuracy or safety guarantees
---
## Training Data
* **Source:** Kaggle `personal_transactions.csv` dataset
* **Mapping:** Original vendor-level categories mapped into an internal schema of \~M high-level categories (`data/categories.json`).
* **Feedback augmentation:** User-corrected labels from `model/feedback_corrected.json` are appended to the training set for continuous improvement.
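
The mapping step might look roughly like the sketch below. It is only an illustration: the layout of `data/categories.json` is not documented in this card, so a flat object from original category name to high-level category is assumed, and the category names, the `map_category` helper, and the `"Other"` fallback are all hypothetical.

```python
import json

# Hypothetical mapping content. The real schema lives in data/categories.json,
# and its exact layout is an assumption made for this sketch.
EXAMPLE_MAPPING_JSON = """
{
  "Coffee Shops": "Food & Dining",
  "Restaurants": "Food & Dining",
  "Gas & Fuel": "Transport"
}
"""


def map_category(original: str, mapping: dict, default: str = "Other") -> str:
    """Map an original dataset category to a high-level category.

    Categories missing from the mapping fall back to `default`.
    """
    return mapping.get(original.strip(), default)


mapping = json.loads(EXAMPLE_MAPPING_JSON)

# Illustrative rows in the shape of (description, original category).
rows = [
    {"description": "STARBUCKS STORE 1234", "category": "Coffee Shops"},
    {"description": "SHELL", "category": "Gas & Fuel"},
    {"description": "UNKNOWN VENDOR", "category": "Misc"},
]

# Build (text, label) training examples using the high-level schema.
mapped = [
    {"text": row["description"], "label": map_category(row["category"], mapping)}
    for row in rows
]
```

An explicit fallback category keeps unmapped vendor categories visible in the training set instead of silently dropping those rows.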
---
## Evaluation
* **Split:** 90% train / 10% test split (seed=42) from the training corpus
* **Metric:** Macro F1-score
* **Results:**
  * Macro F1 on test set: **0.XX** (not yet measured)
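
For reference, the split and metric described above can be sketched in plain Python. This is a minimal illustration, not the project's evaluation script: the helper names `train_test_split_90_10` and `macro_f1` are hypothetical, and the card does not say which library or shuffling method produced the actual split. Macro F1 here is the unweighted mean of per-class F1, which is the same quantity scikit-learn's `f1_score(..., average="macro")` reports.

```python
import random
from collections import Counter


def train_test_split_90_10(examples, seed=42):
    """Shuffle with a fixed seed and split 90% train / 10% test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.9)
    return shuffled[:cut], shuffled[cut:]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every class contributes equally to the mean, a rare category with poor F1 lowers the macro score as much as a frequent one would.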
---
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the tokenizer and fine-tuned model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("CodeBlooded-capstone/fin-classifier")
model = AutoModelForSequenceClassification.from_pretrained("CodeBlooded-capstone/fin-classifier")

# The pipeline returns only the highest-scoring label by default
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
)

example = "STARBUCKS STORE 1234"
# Returns a list with one dict per input; the values shown are illustrative:
# [{'label': 'Food & Dining', 'score': 0.95}]
print(classifier(example))
```
---
## Limitations & Bias
* Performance varies by category: categories with fewer examples may see lower F1.
* The model reflects biases present in the original Kaggle dataset (e.g., over/under-representation of certain merchants).
* Should not be used as a sole source for financial decision-making.
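
One way an application could act on these limitations is to route low-confidence predictions to manual review. The sketch below is an assumption for illustration only: the `route_prediction` helper and the `0.8` threshold are hypothetical and are not part of this model or its repository. It takes a single prediction dict with `label` and `score` keys, the shape the text-classification pipeline produces per input.

```python
def route_prediction(prediction: dict, threshold: float = 0.8) -> dict:
    """Flag predictions whose score falls below `threshold` for manual review.

    `threshold` is an illustrative value; a real one would be tuned per
    category on held-out data.
    """
    return {
        "label": prediction["label"],
        "score": prediction["score"],
        "needs_review": prediction["score"] < threshold,
    }


# Illustrative predictions, not real model output.
confident = route_prediction({"label": "Food & Dining", "score": 0.95})
uncertain = route_prediction({"label": "Food & Dining", "score": 0.41})
```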
---
## Maintenance & Continuous Learning
* New user feedback corrections are stored in `model/feedback_corrected.json` and incorporated during retraining.
* Checkpoints are saved to `results/` and versioned on Hugging Face.
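
The feedback step could be sketched as follows. The format of `model/feedback_corrected.json` is not specified in this card, so a JSON list of objects with `text` and `label` fields is assumed, and `merge_feedback` is a hypothetical helper. The card only says corrections are appended; dropping the older row for a corrected description is an extra design choice made here so the two labels do not conflict.

```python
import json


def merge_feedback(training_rows, feedback_rows):
    """Return the training rows plus user corrections.

    Any existing row whose text has a correction is dropped first
    (an assumption of this sketch), then the corrections are appended.
    """
    corrected_texts = {row["text"] for row in feedback_rows}
    kept = [row for row in training_rows if row["text"] not in corrected_texts]
    return kept + list(feedback_rows)


training = [
    {"text": "SHELL", "label": "Shopping"},
    {"text": "STARBUCKS STORE 1234", "label": "Food & Dining"},
]

# In practice the corrections would be read from disk, for example with
# json.load(open("model/feedback_corrected.json")); inline data is used here.
feedback = json.loads('[{"text": "SHELL", "label": "Transport"}]')

merged = merge_feedback(training, feedback)
```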
---
## License
Apache 2.0
---
## Citation
```bibtex
@misc{fin-classifier2025,
  author       = {CodeBlooded},
  title        = {fin-classifier: A DistilBERT-based Transaction Categorization Model},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/CodeBlooded-capstone/fin-classifier}}
}
```
---
*This model card was generated on 2025-07-12.*