---
library_name: transformers
license: apache-2.0
tags:
- finance
---

- **Developed by:** Team CodeBlooded
- **Funded by:** EpiUse & University of Pretoria
- **Model type:** DistilBertForSequenceClassification
- **Language(s) (NLP):** English

# fin-classifier

## Overview

**Repository:** CodeBlooded-capstone/fin-classifier

A DistilBERT-based text classification model that assigns each financial transaction description to one of N predefined categories.

---

## Model Details

* **Model type:** `DistilBertForSequenceClassification`
* **Version:** v1.0 (initial release)
* **Hugging Face repo:** [https://huggingface.co/CodeBlooded-capstone/fin-classifier](https://huggingface.co/CodeBlooded-capstone/fin-classifier)
* **Authors:** CodeBlooded

---

## Intended Use

### Primary use case

* **Task:** Automated categorization of banking and credit card transaction descriptions from South African banks
* **Users:** Personal finance apps, budgeting tools, fintech platforms

### Out-of-scope use cases

* Legal or compliance decisions
* Any use that requires 100% classification accuracy or safety guarantees

---

## Training Data

* **Source:** Kaggle `personal_transactions.csv` dataset
* **Mapping:** Original vendor-level categories mapped into an internal schema of \~M high-level categories (`data/categories.json`).
* **Feedback augmentation:** User-corrected labels from `feedback_corrected.json` are appended to the training set for continuous improvement.
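
The two data-preparation steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: the layouts assumed for `data/categories.json` (a flat object from vendor-level category to high-level category) and `feedback_corrected.json` (a list of `description`/`label` records), the `build_training_set` helper, the sample rows, and the `Transport` label are all hypothetical.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical layout of data/categories.json: {"<vendor-level category>": "<high-level category>"}.
CATEGORY_MAP_JSON = '{"Coffee Shops": "Food & Dining", "Restaurants": "Food & Dining", "Gas & Fuel": "Transport"}'

# Hypothetical layout of feedback_corrected.json: a list of {"description", "label"} records.
FEEDBACK_JSON = '[{"description": "UBER TRIP 5678", "label": "Transport"}]'


def build_training_set(
    rows: List[Tuple[str, str]],
    category_map: Dict[str, str],
    feedback: List[Dict[str, str]],
) -> List[Tuple[str, str]]:
    """Map vendor-level labels to high-level ones, then append user-corrected examples."""
    examples: List[Tuple[str, str]] = []
    for description, vendor_category in rows:
        high_level = category_map.get(vendor_category)
        if high_level is None:
            continue  # assumption: rows with an unmapped category are dropped
        examples.append((description, high_level))
    # Feedback corrections are appended as additional labelled examples.
    examples.extend((rec["description"], rec["label"]) for rec in feedback)
    return examples


rows = [
    ("STARBUCKS STORE 1234", "Coffee Shops"),
    ("SHELL 0099", "Gas & Fuel"),
    ("MISC 42", "Unknown"),
]
training = build_training_set(rows, json.loads(CATEGORY_MAP_JSON), json.loads(FEEDBACK_JSON))
print(training)
```

Here the `MISC 42` row is skipped because its category has no entry in the mapping; a real pipeline might instead route such rows to a catch-all category.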

---

## Evaluation

* **Split:** 90% / 10% train/test split of the training corpus (seed = 42)
* **Metric:** Macro F1-score
* **Results:**
  * Macro F1 on the test set: **0.XX** (not yet measured)
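
The evaluation protocol can be sketched without external dependencies. The seeded shuffle below is an assumption and will not reproduce the exact split the authors used; `train_test_split` and `macro_f1` are hypothetical helpers written for illustration. Macro F1 computes F1 = 2TP / (2TP + FP + FN) for each category and takes the unweighted mean, so rare categories count as much as common ones.

```python
import random
from typing import Sequence, Tuple


def train_test_split(items: Sequence, test_fraction: float = 0.1, seed: int = 42) -> Tuple[list, list]:
    """Shuffle a copy with a fixed seed, then hold out the last `test_fraction` as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, where F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores) if scores else 0.0


train, test = train_test_split(range(100))
print(len(train), len(test))  # 90 10

y_true = ["Food & Dining", "Food & Dining", "Transport", "Shopping"]
y_pred = ["Food & Dining", "Transport", "Transport", "Shopping"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.778
```

In practice, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity over the union of observed labels.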

---

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("CodeBlooded-capstone/fin-classifier")
model = AutoModelForSequenceClassification.from_pretrained("CodeBlooded-capstone/fin-classifier")

# By default the text-classification pipeline returns only the top-scoring label.
# `return_all_scores` is deprecated in recent transformers releases; pass top_k=None
# instead to get a score for every category.
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
)

example = "STARBUCKS STORE 1234"
print(classifier(example))  # illustrative output: [{'label': 'Food & Dining', 'score': 0.95}]
```

---

## Limitations & Bias

* Performance varies by category: categories with fewer training examples may have lower F1.
* The model reflects biases present in the original Kaggle dataset (e.g., over- or under-representation of certain merchants).
* It should not be used as the sole basis for financial decision-making.

---

## Maintenance & Continuous Learning

* New user feedback corrections are stored in `model/feedback_corrected.json` and incorporated during retraining.
* Checkpoints are saved to `results/` and versioned on Hugging Face.
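
One way to persist a user correction to `model/feedback_corrected.json` is sketched below. The record schema (`description`/`label`), the `record_correction` helper, and the write-then-rename step are assumptions for illustration, not the project's documented format; the demo writes to a temporary directory rather than the real file.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def record_correction(path: Path, description: str, corrected_label: str) -> List[Dict[str, str]]:
    """Append one user correction to a JSON list file, creating the file if it does not exist."""
    records: List[Dict[str, str]] = []
    if path.exists():
        records = json.loads(path.read_text(encoding="utf-8"))
    records.append({"description": description, "label": corrected_label})
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then swap it in, so a crash mid-write
    # leaves the previous store untouched.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(records, indent=2), encoding="utf-8")
    tmp.replace(path)
    return records


with tempfile.TemporaryDirectory() as tmpdir:
    store = Path(tmpdir) / "model" / "feedback_corrected.json"
    record_correction(store, "UBER TRIP 5678", "Transport")
    print(record_correction(store, "SHELL 0099", "Transport"))
```

Appending keeps the full correction history; if the same description is corrected more than once, deduplication (for example, keeping the latest label) would need to happen before retraining.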

---

## License

Apache 2.0

---

## Citation

```bibtex
@misc{fin-classifier2025,
  author       = {CodeBlooded},
  title        = {fin-classifier: A DistilBERT-based Transaction Categorization Model},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/CodeBlooded-capstone/fin-classifier}}
}
```

---

*This model card was generated on 2025-07-12.*