---
library_name: transformers
tags:
- finance
---
# fin-classifier

- **Developed by:** Team CodeBlooded
- **Funded by:** EpiUse & University of Pretoria
- **Model type:** DistilBertForSequenceClassification
- **Language(s) (NLP):** English
## Overview
**Repository:** CodeBlooded-capstone/fin-classifier
A DistilBERT-based text classification model for categorizing financial transaction descriptions into one of N predefined categories.
---
## Model Details
* **Model type:** `DistilBertForSequenceClassification`
* **Version:** v1.0 (initial release)
* **Hugging Face repo:** [https://huggingface.co/CodeBlooded-capstone/fin-classifier](https://huggingface.co/CodeBlooded-capstone/fin-classifier)
* **Authors:** CodeBlooded
---
## Intended Use
### Primary use case
* **Task:** Automated categorization of banking and credit card transaction descriptions for South African banks
* **Users:** Personal finance apps, budgeting tools, fintech platforms
### Out-of-scope use cases
* Legal or compliance decisions
* Any use requiring 100% classification accuracy or safety guarantees
---
## Training Data
* **Source:** Kaggle `personal_transactions.csv` dataset
* **Mapping:** Original vendor-level categories mapped into an internal schema of \~M high-level categories (`data/categories.json`).
* **Feedback augmentation:** User-corrected labels from `feedback_corrected.json` are appended to the training set for continuous improvement.
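The feedback-augmentation step above can be sketched as follows. The record shape (`{"description", "category"}`) and the helper name are illustrative assumptions, not the project's actual code; only the feedback file path comes from this card:

```python
import json
from pathlib import Path


def load_training_examples(base_rows, feedback_path="model/feedback_corrected.json"):
    """Append user-corrected labels to the base training rows.

    `base_rows` is a list of {"description": ..., "category": ...} dicts
    parsed from personal_transactions.csv; the feedback file is assumed
    to hold corrected records of the same shape.
    """
    examples = list(base_rows)
    path = Path(feedback_path)
    if path.exists():
        # Corrections are simply appended, so recent feedback is
        # seen alongside the original Kaggle data at retraining time.
        examples.extend(json.loads(path.read_text()))
    return examples
```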
---
## Evaluation
* **Split:** 90% train / 10% test split (seed=42) from the training corpus
* **Metric:** Macro F1-score
* **Results:**
* Macro F1 on test set: **0.XX** (not yet measured)
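For reference, macro F1 averages per-class F1 scores with equal weight, so rare categories count as much as common ones. A minimal pure-Python version (equivalent to scikit-learn's `f1_score(..., average="macro")`) looks like:

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but true label was t
            fn[t] += 1  # missed the true label t
    f1s = []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if (prec + rec) else 0.0)
    return sum(f1s) / len(f1s)
```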
---
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("CodeBlooded-capstone/fin-classifier")
model = AutoModelForSequenceClassification.from_pretrained("CodeBlooded-capstone/fin-classifier")

# By default the pipeline returns only the top-scoring category
# (the deprecated return_all_scores=False behavior).
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
)

example = "STARBUCKS STORE 1234"
print(classifier(example))  # e.g. [{'label': 'Food & Dining', 'score': 0.95}]
```
---
## Limitations & Bias
* Performance varies by category: categories with fewer examples may see lower F1.
* The model reflects biases present in the original Kaggle dataset (e.g., over/under-representation of certain merchants).
* Should not be used as a sole source for financial decision-making.
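Given the limitations above, downstream integrations may want to gate predictions on model confidence. A minimal sketch, assuming the pipeline's `{"label", "score"}` output shape; the wrapper name and the 0.6 threshold are illustrative, not tuned values from this project:

```python
def categorize_or_flag(prediction, threshold=0.6):
    """Accept the model's label only above a confidence threshold;
    otherwise route the transaction to manual review.

    `prediction` is a single pipeline result: {"label": ..., "score": ...}.
    """
    if prediction["score"] >= threshold:
        return prediction["label"]
    return "Uncategorized (needs review)"
```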
---
## Maintenance & Continuous Learning
* New user feedback corrections are stored in `model/feedback_corrected.json` and incorporated during retraining.
* Checkpoints are saved to `results/` and versioned on Hugging Face.
---
## License
Apache 2.0
---
## Citation
```
@misc{fin-classifier2025,
  author       = {CodeBlooded},
  title        = {fin-classifier: A DistilBERT-based Transaction Categorization Model},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/CodeBlooded-capstone/fin-classifier}}
}
```
---
*This model card was generated on 2025-07-12.*