---
library_name: transformers
tags:
- finance
---

- **Developed by:** Team CodeBlooded
- **Funded by:** EpiUse & University of Pretoria
- **Model type:** DistilBertForSequenceClassification
- **Language(s) (NLP):** English

# fin-classifier

## Overview

**Repository:** CodeBlooded-capstone/fin-classifier

A DistilBERT-based text classification model for categorizing financial transaction descriptions into one of N predefined categories.

---

## Model Details

* **Model type:** `DistilBertForSequenceClassification`
* **Version:** v1.0 (initial release)
* **Hugging Face repo:** [https://huggingface.co/CodeBlooded-capstone/fin-classifier](https://huggingface.co/CodeBlooded-capstone/fin-classifier)
* **Authors:** CodeBlooded

---

## Intended Use

### Primary use case

* **Task:** Automated categorization of banking and credit card transaction descriptions for South African banks
* **Users:** Personal finance apps, budgeting tools, fintech platforms

### Out-of-scope use cases

* Legal or compliance decisions
* Any use requiring 100% classification accuracy or safety guarantees

---

## Training Data

* **Source:** Kaggle `personal_transactions.csv` dataset
* **Mapping:** Original vendor-level categories mapped into an internal schema of ~M high-level categories (`data/categories.json`).
* **Feedback augmentation:** User-corrected labels from `feedback_corrected.json` are appended to the training set for continuous improvement.
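The feedback-augmentation step above can be sketched as follows. This is a minimal illustration only: the schema assumed for `feedback_corrected.json` (a JSON array of `{"text", "label"}` records) and all sample values are assumptions, not the repository's actual format.

```python
import json

# Hypothetical base training rows -- illustrative values, not the Kaggle data.
train_rows = [
    {"text": "STARBUCKS STORE 1234", "label": "Food & Dining"},
    {"text": "SHELL OIL 5678", "label": "Auto & Transport"},
]

# Assumed shape of feedback_corrected.json: a JSON array of corrected records.
feedback_json = '[{"text": "UBER TRIP HELP.UBER.COM", "label": "Auto & Transport"}]'
feedback_rows = json.loads(feedback_json)

# Append user-corrected labels so the next retraining run learns from them.
train_rows.extend(feedback_rows)
print(len(train_rows))  # 3
```

In the real pipeline the corrected records would be read from the `feedback_corrected.json` file itself (e.g. via `json.load`) before being concatenated with the base corpus.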
---

## Evaluation

* **Split:** 90% train / 10% test (seed=42) from the training corpus
* **Metric:** Macro F1-score
* **Results:**
  * Macro F1 on test set: **0.XX** (not yet measured)

---

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("CodeBlooded-capstone/fin-classifier")
model = AutoModelForSequenceClassification.from_pretrained("CodeBlooded-capstone/fin-classifier")

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    top_k=1,  # return_all_scores is deprecated; top_k=1 keeps only the best label
)

example = "STARBUCKS STORE 1234"
print(classifier(example))
# [{'label': 'Food & Dining', 'score': 0.95}]
```

---

## Limitations & Bias

* Performance varies by category: categories with fewer training examples may see lower F1 scores.
* The model reflects biases present in the original Kaggle dataset (e.g., over- or under-representation of certain merchants).
* It should not be used as the sole basis for financial decision-making.

---

## Maintenance & Continuous Learning

* New user feedback corrections are stored in `model/feedback_corrected.json` and incorporated during retraining.
* Checkpoints are saved to `results/` and versioned on Hugging Face.

---

## License

Apache 2.0

---

## Citation

```
@misc{fin-classifier2025,
  author       = {CodeBlooded},
  title        = {fin-classifier: A DistilBERT-based Transaction Categorization Model},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/CodeBlooded-capstone/fin-classifier}}
}
```

---

*This model card was generated on 2025-07-12.*