---
library_name: transformers
tags:
- finance
---

- **Developed by:** Team CodeBlooded
- **Funded by:** EpiUse & University of Pretoria
- **Model type:** DistilBertForSequenceClassification
- **Language(s) (NLP):** English

# fin-classifier

## Overview

**Repository:** CodeBlooded-capstone/fin-classifier

A DistilBERT-based text classification model for categorizing financial transaction descriptions into one of N predefined categories.

---

## Model Details

* **Model type:** `DistilBertForSequenceClassification`
* **Version:** v1.0 (initial release)
* **Hugging Face repo:** [https://huggingface.co/CodeBlooded-capstone/fin-classifier](https://huggingface.co/CodeBlooded-capstone/fin-classifier)
* **Authors:** CodeBlooded

---

## Intended Use

### Primary use case

* **Task:** Automated categorization of banking and credit card transaction descriptions for South African banks
* **Users:** Personal finance apps, budgeting tools, fintech platforms

### Out-of-scope use cases

* Legal or compliance decisions
* Any use requiring 100% classification accuracy or safety guarantees

---

## Training Data

* **Source:** Kaggle `personal_transactions.csv` dataset
* **Mapping:** Original vendor-level categories mapped into an internal schema of ~M high-level categories (`data/categories.json`).
* **Feedback augmentation:** User-corrected labels from `feedback_corrected.json` are appended to the training set for continuous improvement.
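The feedback-augmentation step above can be sketched as follows. This is a minimal illustration only: the schema assumed for `feedback_corrected.json` (a JSON array of `{"text", "label"}` records) and all sample values are assumptions, not the repository's actual format.

```python
import json

# Hypothetical base training rows -- illustrative values, not the Kaggle data.
train_rows = [
    {"text": "STARBUCKS STORE 1234", "label": "Food & Dining"},
    {"text": "SHELL OIL 5678", "label": "Auto & Transport"},
]

# Assumed shape of feedback_corrected.json: a JSON array of corrected records.
feedback_json = '[{"text": "UBER TRIP HELP.UBER.COM", "label": "Auto & Transport"}]'
feedback_rows = json.loads(feedback_json)

# Append user-corrected labels so the next retraining run learns from them.
train_rows.extend(feedback_rows)
print(len(train_rows))  # 3
```

In the real pipeline the corrected records would be read from the `feedback_corrected.json` file itself (e.g. via `json.load`) before being concatenated with the base corpus.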
---

## Evaluation

* **Split:** 90% train / 10% test (seed=42) from the training corpus
* **Metric:** Macro F1-score
* **Results:**
  * Macro F1 on test set: **0.XX** (not yet measured)

---

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("CodeBlooded-capstone/fin-classifier")
model = AutoModelForSequenceClassification.from_pretrained("CodeBlooded-capstone/fin-classifier")

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    top_k=1,  # return_all_scores is deprecated; top_k=1 keeps only the best label
)

example = "STARBUCKS STORE 1234"
print(classifier(example))
# [{'label': 'Food & Dining', 'score': 0.95}]
```

---

## Limitations & Bias

* Performance varies by category: categories with fewer training examples may see lower F1 scores.
* The model reflects biases present in the original Kaggle dataset (e.g., over- or under-representation of certain merchants).
* It should not be used as the sole basis for financial decision-making.

---

## Maintenance & Continuous Learning

* New user feedback corrections are stored in `model/feedback_corrected.json` and incorporated during retraining.
* Checkpoints are saved to `results/` and versioned on Hugging Face.

---

## License

Apache 2.0

---

## Citation

```
@misc{fin-classifier2025,
  author       = {CodeBlooded},
  title        = {fin-classifier: A DistilBERT-based Transaction Categorization Model},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/CodeBlooded-capstone/fin-classifier}}
}
```

---

*This model card was generated on 2025-07-12.*