Transaction Classifier — FastText (v2)
A FastText supervised model that classifies bank transaction strings into 10 budget categories using subword embeddings.
This is version 2 (Phase 2) in a progressive model development series. It introduced direction detection (credit vs debit) and a rules engine, but the FastText ML component itself suffered from severe Income category bias.
Model Details
| Property | Value |
|---|---|
| Architecture | FastText supervised (subword n-grams) |
| Task | Multi-class text classification (10 categories) |
| Training samples | 3,597,859 |
| Epochs | 10 |
| Learning rate | 0.5 |
| Word n-grams | 2 |
| Embedding dim | 100 |
| Subword range | 3-6 characters |
| Loss | Softmax |
| Format | .bin (FastText binary) |
| Trained | 2026-03-28 |
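The hyperparameters in the table map onto the keyword arguments of `fasttext.train_supervised` roughly as shown below. This is a sketch, not the author's training script: the file name `train.txt` and the helper `train` are hypothetical, and the input is assumed to be in fastText's standard format of one `__label__<category> <text>` example per line.

```python
# Hyperparameters from the Model Details table, expressed as keyword
# arguments for fasttext.train_supervised.
HYPERPARAMS = {
    "epoch": 10,        # Epochs
    "lr": 0.5,          # Learning rate
    "wordNgrams": 2,    # Word n-grams
    "dim": 100,         # Embedding dim
    "minn": 3,          # Subword range, lower bound
    "maxn": 6,          # Subword range, upper bound
    "loss": "softmax",  # Loss
}


def train(train_path="train.txt"):
    """Train a supervised model. `train_path` is a hypothetical file name."""
    # Imported lazily so the configuration can be inspected without fasttext.
    import fasttext

    return fasttext.train_supervised(input=train_path, **HYPERPARAMS)
```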
Categories
| ID | Category |
|---|---|
| 0 | Food & Dining |
| 1 | Transportation |
| 2 | Shopping & Retail |
| 3 | Entertainment & Recreation |
| 4 | Healthcare & Medical |
| 5 | Utilities & Services |
| 6 | Financial Services |
| 7 | Income |
| 8 | Government & Legal |
| 9 | Charity & Donations |
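For downstream code, the table can be held as a small lookup. The sketch below assumes that model labels may arrive either as numeric IDs (`__label__7`) or as names; the card does not say which form the model emits, so the helper `category_name` is hypothetical and handles both.

```python
# ID-to-name mapping copied from the Categories table.
CATEGORIES = {
    0: "Food & Dining",
    1: "Transportation",
    2: "Shopping & Retail",
    3: "Entertainment & Recreation",
    4: "Healthcare & Medical",
    5: "Utilities & Services",
    6: "Financial Services",
    7: "Income",
    8: "Government & Legal",
    9: "Charity & Donations",
}


def category_name(raw_label):
    """Strip fastText's `__label__` prefix and resolve numeric IDs.

    Assumption: the label is either a numeric ID from the table or
    already a readable name. Unknown numeric IDs raise KeyError.
    """
    label = raw_label.replace("__label__", "", 1)
    if label.isdigit():
        return CATEGORIES[int(label)]
    return label
```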
Performance
Evaluated on 505 unique real-world RBC transactions (3,113 weighted, 2019-2026).
| Metric | Score |
|---|---|
| Real-world accuracy (weighted) | 55.7% |
| FastText-only accuracy | 14.8% |
| Direction detection accuracy | 100.0% |
| Rules accuracy | 91.3% |
| Validation accuracy | 99.0% |
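The card does not define how the weighted score is computed. One plausible reading, assumed here, is that each of the 505 unique strings is weighted by how often it occurs, giving 3,113 occurrences in total. Under that assumption, weighted accuracy could be computed as in this sketch; the toy rows are illustrative only and are not taken from the evaluation set.

```python
def weighted_accuracy(rows):
    """Compute accuracy where each row counts `weight` times.

    rows: iterable of (is_correct, weight) pairs, one per unique transaction.
    """
    rows = list(rows)
    total = sum(weight for _, weight in rows)
    if total <= 0:
        raise ValueError("total weight must be positive")
    return sum(weight for is_correct, weight in rows if is_correct) / total


# Toy data (hypothetical): frequent transactions dominate the weighted score.
toy_rows = [(True, 5), (False, 1), (True, 2), (False, 2)]
unweighted = sum(is_correct for is_correct, _ in toy_rows) / len(toy_rows)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted_accuracy(toy_rows):.3f}")
```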
Key finding: FastText achieves only 14.8% on unknown merchants due to subword n-gram overlap between the Income category in the training data and real merchant names. The model defaults to predicting Income for most inputs.
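A bias of this kind can be checked by tabulating which labels the model predicts over a set of held-out strings. The helper below is a hypothetical diagnostic, not part of the released code; `predict_one` stands in for any function that maps a transaction string to a single label.

```python
from collections import Counter


def label_distribution(predict_one, texts):
    """Return the share of predictions per label, most frequent first."""
    counts = Counter(predict_one(text) for text in texts)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}
```

With a loaded fastText model, `predict_one` could be `lambda t: model.predict(t)[0][0].replace("__label__", "")`. A distribution dominated by a single label on inputs that are mostly purchases would point to the default-to-Income behaviour described above.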
Usage
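import fasttext
model = fasttext.load_model("fasttext_model.bin")  # path to a local copy of the model file
result = model.predict("MCDONALD'S #12345 TORONTO ON")  # returns (labels, probabilities)
label = result[0][0].replace("__label__", "")  # top label without fastText's prefix
confidence = result[1][0]  # probability of the top label
print(f"Category: {label}, Confidence: {confidence:.3f}")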
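Because the top prediction is often wrong for this version, it can help to inspect several candidates at once. The sketch below wraps `model.predict` with its `k` argument and also shows one way to fetch the model from the Hub. The helper names `top_k` and `load_from_hub` are hypothetical, and `load_from_hub` additionally requires the `huggingface_hub` package.

```python
def top_k(model, text, k=3):
    """Return the k most likely (label, probability) pairs for one string."""
    # fastText's predict processes one line at a time, so drop newlines.
    labels, probs = model.predict(text.replace("\n", " "), k=k)
    return [
        (label.replace("__label__", ""), float(prob))
        for label, prob in zip(labels, probs)
    ]


def load_from_hub():
    """Download model.bin from the Hub and load it with fastText."""
    # Imported lazily so top_k can be used with an already loaded model.
    import fasttext
    from huggingface_hub import hf_hub_download

    path = hf_hub_download("maaz-zaidi/transaction-classifier-fasttext", "model.bin")
    return fasttext.load_model(path)
```

Usage would look like `top_k(load_from_hub(), "MCDONALD'S #12345 TORONTO ON")`.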
Dependencies
fasttext
Training Data
- Primary: mitulshah/transaction-categorization - full 3.6M records (gated dataset)
- Evaluation: 505 real-world RBC bank transactions (2019-2026)
Key Contributions
Despite the weak ML component, Phase 2 introduced two critical pipeline stages:
- Direction Detection: Rule-based credit/debit detection achieving 100% accuracy
- Rules Engine: YAML-based pattern matching for structural transaction patterns (91.3% accuracy on matched transactions)
These pipeline stages carried forward into all subsequent versions.
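The way these stages might fit together is sketched below. Everything in it is an assumption made for illustration: the card does not show the actual rules, how direction is detected, or the order of the stages. The two rules are hypothetical, direction is inferred from the sign of a hypothetical `amount` argument, and the rules are written as a Python list rather than loaded from YAML to keep the example dependency-free.

```python
import re

# Hypothetical rules; the real ones are defined in YAML and not shown here.
RULES = [
    {"pattern": r"\bPAYROLL\b", "direction": "credit", "category": "Income"},
    {"pattern": r"\bSERVICE FEE\b", "direction": "debit", "category": "Financial Services"},
]


def detect_direction(amount):
    """Assumed convention: positive amounts are credits, all others debits."""
    return "credit" if amount > 0 else "debit"


def classify(text, amount, ml_predict):
    """Try the rules first, then fall back to the ML model.

    Returns (category, stage) where stage is "rules" or "ml".
    """
    direction = detect_direction(amount)
    for rule in RULES:
        if rule["direction"] == direction and re.search(rule["pattern"], text, re.IGNORECASE):
            return rule["category"], "rules"
    return ml_predict(text), "ml"
```

Here `ml_predict` is any callable that maps a string to a category, for example a thin wrapper around the fastText model's `predict`.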
Part of a Series
See the Transaction Classifier collection for all 7 model versions.
Limitations
- Severe Income category bias (14.8% ML-only accuracy)
- Subword n-gram features from Income training examples overlap with real merchant names
- Superseded by SetFit (v3), which achieved 66.7% ML-only accuracy using pre-trained embeddings
Citation
@misc{zaidi2026txnclassifier,
title={Transaction Classifier: Multi-Stage Bank Transaction Categorization},
author={Maaz Zaidi},
year={2026},
url={https://huggingface.co/maaz-zaidi/transaction-classifier-fasttext}
}