PiggyBank Transaction Category Classifier
This model classifies financial transactions into spending categories for use in the PiggyBank budgeting app. It was trained on a Kaggle dataset of 7.4M transactions:
https://www.kaggle.com/datasets/ismetsemedov/transactions/
The dataset includes transactions from 12 countries:
| Country |
|---|
| Nigeria |
| Germany |
| Brazil |
| USA |
| Canada |
| Singapore |
| France |
| Australia |
| UK |
| Mexico |
| Japan |
| Russia |
Model Architecture
This is a scikit-learn pipeline consisting of:
TfidfVectorizer: transforms transaction text (merchant, merchant type, merchant category, city, country) into TF-IDF features
LogisticRegression: multi-class category classifier
The model predicts categories such as:
Travel
Restaurant
Entertainment
Shopping
Groceries
And more…
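The two-stage pipeline described above can be sketched as follows. This is a minimal illustration, not the released training code: the hyperparameters are assumptions (scikit-learn defaults apart from `max_iter`), and the training strings and labels are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Assumed settings: the actual hyperparameters are not documented here.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Toy training data, invented for illustration only.
texts = [
    "air canada airline toronto canada",
    "tim hortons fast food calgary canada",
    "netflix streaming berlin germany",
    "walmart grocery houston usa",
]
labels = ["Travel", "Restaurant", "Entertainment", "Groceries"]

pipeline.fit(texts, labels)

# The fitted pipeline exposes the category names it can predict.
print(pipeline.classes_)
```

Keeping the vectorizer and classifier in one `Pipeline` means raw strings can be passed straight to `predict`, so the same text preprocessing is applied at training and inference time.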
Intended Use
Given a transaction description string (e.g., merchant name, location, or contextual data), the model outputs a predicted spending category and its confidence score.
Trained Text Features
Only the following text fields from the Kaggle dataset were used:
merchant
merchant_type
merchant_category
city
country
Note: The Kaggle dataset is not included in this repository. Only the trained model is hosted here.
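How the five fields were combined into a single input string is not documented here. A plausible sketch, assuming they are joined with spaces and lowercased (the function name `build_text` and the sample record are illustrative, not taken from the dataset):

```python
TEXT_FIELDS = ["merchant", "merchant_type", "merchant_category", "city", "country"]


def build_text(record):
    """Join the text fields of one transaction into a single lowercase string.

    Missing or empty fields are skipped. The joining scheme is an assumption.
    """
    parts = [str(record.get(field) or "").strip() for field in TEXT_FIELDS]
    return " ".join(part for part in parts if part).lower()


# Invented record for illustration.
sample = {
    "merchant": "Tim Hortons",
    "merchant_type": "fast_food",
    "merchant_category": "Restaurant",
    "city": "Calgary",
    "country": "Canada",
    "amount": 4.75,  # ignored: not one of the text fields
}
print(build_text(sample))  # tim hortons fast_food restaurant calgary canada
```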
Example Inference
Input:
TIM HORTONS #1234 CALGARY AB
Output:
{
"category": "Restaurant",
"confidence": 0.78
}
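One way to produce output of this shape is to take the highest class probability from `predict_proba`. The helper name `classify` is hypothetical, and treating the confidence as the maximum class probability is an assumption. A tiny stand-in pipeline is trained inline so the snippet runs without `model.joblib`; with the released model you would load the pipeline using `joblib.load("model.joblib")` instead.

```python
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def classify(pipeline, text):
    """Return the top category and its probability for one transaction string."""
    probabilities = pipeline.predict_proba([text])[0]
    best = probabilities.argmax()
    return {
        "category": str(pipeline.classes_[best]),
        "confidence": round(float(probabilities[best]), 2),
    }


# Stand-in model trained on invented examples; replace it with the real
# pipeline loaded from model.joblib when that file is available.
demo = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
demo.fit(
    ["tim hortons coffee calgary", "air canada flight toronto", "walmart grocery houston"],
    ["Restaurant", "Travel", "Groceries"],
)

print(json.dumps(classify(demo, "TIM HORTONS #1234 CALGARY AB"), indent=2))
```

The numbers printed by the stand-in model will differ from the example above, since it has seen only three training strings.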
Repository Files
model.joblib: scikit-learn pipeline (TF-IDF + LogisticRegression)
config.json: model metadata
requirements.txt: Python dependencies
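A minimal loading sketch, assuming the files sit together in one local directory and the dependencies from `requirements.txt` are installed. The helper `load_artifacts` is hypothetical, and nothing is assumed about the keys inside `config.json`. Because `joblib.load` unpickles arbitrary Python objects, only load model files from a source you trust.

```python
import json
from pathlib import Path

import joblib


def load_artifacts(repo_dir):
    """Load the pipeline and its metadata from a local copy of the repository."""
    repo_dir = Path(repo_dir)
    pipeline = joblib.load(repo_dir / "model.joblib")
    with open(repo_dir / "config.json", encoding="utf-8") as handle:
        config = json.load(handle)
    return pipeline, config


if __name__ == "__main__":
    # Only attempt the load when both files are present in the current directory.
    if Path("model.joblib").exists() and Path("config.json").exists():
        pipeline, config = load_artifacts(".")
        print(type(pipeline).__name__, type(config).__name__)
```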
Limitations
The model is trained on synthetic or anonymized Kaggle data and may not perfectly reflect real banking transactions.
Accuracy may vary across countries and merchant formats.
The model should not be used for regulatory, auditing, or high-stakes financial decision-making without additional evaluation.
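Since accuracy may differ by country, one simple form of additional evaluation is to break accuracy down per country on a labeled hold-out set. The sketch below assumes parallel lists of true labels, predictions, and country names; the function name and the sample values are invented for illustration.

```python
from collections import defaultdict


def accuracy_by_country(y_true, y_pred, countries):
    """Return {country: fraction of correct predictions} for parallel sequences."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess, country in zip(y_true, y_pred, countries):
        total[country] += 1
        correct[country] += int(truth == guess)
    return {country: correct[country] / total[country] for country in total}


# Invented hold-out results for illustration.
y_true = ["Travel", "Groceries", "Restaurant", "Shopping"]
y_pred = ["Travel", "Shopping", "Restaurant", "Shopping"]
countries = ["Canada", "Canada", "Japan", "Japan"]

print(accuracy_by_country(y_true, y_pred, countries))
# {'Canada': 0.5, 'Japan': 1.0}
```

A large gap between countries in such a breakdown would suggest that the model needs more data or separate tuning for the weaker ones before being relied on there.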
License & Data Usage
Model License: Choose one (e.g., MIT, Apache 2.0)
Training Data: Kaggle dataset ismetsemedov/transactions
Please refer to the dataset's Kaggle page for terms of use.
This model is intended for educational and experimental purposes.