Transaction Classifier — FastText (v2)

A FastText supervised model that classifies bank transaction strings into 10 budget categories using subword embeddings.

This is version 2 (Phase 2) in a progressive model development series. It introduced direction detection (credit vs debit) and a rules engine, but the FastText ML component itself suffered from severe Income category bias.

Model Details

Property	Value
Architecture	FastText supervised (subword n-grams)
Task	Multi-class text classification (10 categories)
Training samples	3,597,859
Epochs	10
Learning rate	0.5
Word n-grams	2
Embedding dim	100
Subword range	3-6 characters
Loss	Softmax
Format	`.bin` (FastText binary)
Trained	2026-03-28
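
The settings in this table map directly onto keyword arguments of `fasttext.train_supervised`. The following is a minimal sketch of how a comparable model could be trained, not the author's training script: the training file path is hypothetical, and any option not listed in the table is left at the library default.

```python
# Hyperparameters copied from the Model Details table.
TRAIN_PARAMS = {
    "lr": 0.5,         # learning rate
    "epoch": 10,       # passes over the training data
    "wordNgrams": 2,   # word bigrams
    "dim": 100,        # embedding dimension
    "minn": 3,         # shortest character n-gram
    "maxn": 6,         # longest character n-gram
    "loss": "softmax",
}


def train(train_path: str, output_path: str = "fasttext_model.bin"):
    """Train and save a supervised model.

    train_path is a hypothetical file with one example per line in
    FastText format: "__label__<category> <transaction text>".
    """
    import fasttext  # imported lazily so the config can be inspected without it

    model = fasttext.train_supervised(input=train_path, **TRAIN_PARAMS)
    model.save_model(output_path)
    return model
```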

ID	Category
0	Food & Dining
1	Transportation
2	Shopping & Retail
3	Entertainment & Recreation
4	Healthcare & Medical
5	Utilities & Services
6	Financial Services
7	Income
8	Government & Legal
9	Charity & Donations
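
For downstream code it can help to keep the table above as a lookup. The sketch below assumes the model's labels may be either numeric IDs (`__label__7`) or category names. The card does not say which form is used, so the helper accepts both, and `category_name` is a hypothetical helper name.

```python
# Category IDs and names as listed in the table above.
CATEGORIES = {
    0: "Food & Dining",
    1: "Transportation",
    2: "Shopping & Retail",
    3: "Entertainment & Recreation",
    4: "Healthcare & Medical",
    5: "Utilities & Services",
    6: "Financial Services",
    7: "Income",
    8: "Government & Legal",
    9: "Charity & Donations",
}


def category_name(label: str) -> str:
    """Map a FastText label to a category name.

    Assumption: the label is either a numeric ID or already a name.
    """
    raw = label.replace("__label__", "")
    if raw.isdigit():
        return CATEGORIES.get(int(raw), raw)
    return raw
```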

Performance

Evaluated on 505 unique real-world RBC transactions (3,113 weighted, 2019-2026).

Metric	Score
Real-world accuracy (weighted)	55.7%
FastText-only accuracy	14.8%
Direction detection accuracy	100.0%
Rules accuracy	91.3%
Validation accuracy	99.0%
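
The card does not spell out how the weighted figure is computed. One plausible reading, assumed here, is that each unique transaction string counts as many times as it occurs in the statements (505 unique strings, 3,113 occurrences). Under that assumption, a weighted accuracy could be sketched as follows; the sample rows are made up for illustration.

```python
from typing import Iterable, Tuple


def weighted_accuracy(rows: Iterable[Tuple[str, str, int]]) -> float:
    """rows holds (true_label, predicted_label, occurrence_count) triples."""
    total = 0
    correct = 0
    for truth, predicted, count in rows:
        total += count
        if truth == predicted:
            correct += count
    return correct / total if total else 0.0


# Illustrative rows only, not taken from the evaluation set.
rows = [
    ("Food & Dining", "Food & Dining", 8),   # frequent merchant, correct
    ("Shopping & Retail", "Income", 1),      # rare merchant, wrong
    ("Income", "Income", 1),                 # rare, correct
]
unweighted = sum(t == p for t, p, _ in rows) / len(rows)
weighted = weighted_accuracy(rows)
```

Frequent merchants dominate the weighted score, so it can differ noticeably from the per-unique-string score.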

Key finding: FastText alone reaches only 14.8% accuracy on unknown merchants, against 99.0% on validation data. Subword n-grams learned from Income examples in the training data overlap with real merchant names, so the model defaults to predicting Income for most inputs.
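
A bias of this kind shows up quickly if the predicted labels are tallied over a sample of inputs. The sketch below is one way to do that; `label_distribution` is a hypothetical helper, and `fake_predict` is a stand-in that imitates an Income-biased model so the example runs without the `.bin` file. The merchant strings other than the McDonald's one are invented.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def label_distribution(
    texts: Iterable[str], predict_label: Callable[[str], str]
) -> Dict[str, float]:
    """Return the share of inputs assigned to each predicted label."""
    counts = Counter(predict_label(text) for text in texts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.most_common()}


def fake_predict(text: str) -> str:
    # Stand-in for the real model: everything except one merchant is "Income".
    return "Food & Dining" if "MCDONALD" in text else "Income"


sample = [
    "MCDONALD'S #12345 TORONTO ON",
    "HYPOTHETICAL GROCER 001",
    "HYPOTHETICAL TRANSIT FARE",
    "HYPOTHETICAL PHARMACY",
]
dist = label_distribution(sample, fake_predict)
```

With the real model, `predict_label` could be `lambda t: model.predict(t)[0][0].replace("__label__", "")`. A single label taking most of the mass on varied inputs is the signature described above.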

Usage

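import fasttext

# Load the trained binary model from disk.
model = fasttext.load_model("fasttext_model.bin")

# predict() returns a (labels, probabilities) pair; take the top prediction.
labels, probabilities = model.predict("MCDONALD'S #12345 TORONTO ON")
label = labels[0].replace("__label__", "")  # strip FastText's label prefix
confidence = probabilities[0]

print(f"Category: {label}, Confidence: {confidence:.3f}")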
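
Two practical details are worth handling around `predict`: the Python binding rejects input containing newline characters, and low-confidence predictions are often better discarded than trusted, especially given the Income bias. The helpers below are a sketch under those assumptions. `clean_text`, `parse_prediction`, and the 0.5 threshold are illustrative choices, not part of the released model.

```python
from typing import List, Sequence, Tuple


def clean_text(text: str) -> str:
    """Collapse newlines and repeated whitespace into single spaces."""
    return " ".join(text.split())


def parse_prediction(
    result: Tuple[Sequence[str], Sequence[float]],
    min_confidence: float = 0.5,  # hypothetical cut-off
) -> List[Tuple[str, float]]:
    """Convert a (labels, probabilities) pair into (category, confidence)
    tuples, dropping entries below min_confidence."""
    labels, probabilities = result
    return [
        (label.replace("__label__", ""), float(prob))
        for label, prob in zip(labels, probabilities)
        if prob >= min_confidence
    ]
```

With a loaded model this could be used as `parse_prediction(model.predict(clean_text(raw_text), k=3))`, where `k=3` requests the three most likely labels.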

Dependencies

fasttext

Training Data

Primary: mitulshah/transaction-categorization - full 3.6M records (gated dataset)
Evaluation: 505 real-world RBC bank transactions (2019-2026)

Key Contributions

Despite the weak ML component, Phase 2 introduced two critical pipeline stages:

Direction Detection: Rule-based credit/debit detection achieving 100% accuracy
Rules Engine: YAML-based pattern matching for structural transaction patterns (91.3% accuracy on matched transactions)

These pipeline stages carried forward into all subsequent versions.
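
How the stages fit together can be sketched as follows. This is an illustration only: the card does not describe the stage order, the direction logic, or the rule contents. The sketch assumes direction is read from the sign of the amount, uses two invented regex rules in place of the YAML file, and falls back to the ML model only when no rule matches.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical rules; the real engine loads its patterns from YAML.
RULES = [
    (re.compile(r"\bPAYROLL\b"), "Income"),
    (re.compile(r"\bSERVICE FEE\b"), "Financial Services"),
]


def detect_direction(amount: float) -> str:
    """Assumption: positive amounts are credits, all others are debits."""
    return "credit" if amount > 0 else "debit"


def apply_rules(text: str) -> Optional[str]:
    """Return the category of the first matching rule, or None."""
    upper = text.upper()
    for pattern, category in RULES:
        if pattern.search(upper):
            return category
    return None


def classify(
    text: str, amount: float, ml_predict: Callable[[str], str]
) -> Dict[str, str]:
    """Run direction detection, then rules, then the ML fallback."""
    direction = detect_direction(amount)
    category = apply_rules(text)
    source = "rules"
    if category is None:
        category = ml_predict(text)
        source = "ml"
    return {"direction": direction, "category": category, "source": source}
```

Keeping the ML model behind the rules means structural patterns never reach the weaker component, which is consistent with the combined pipeline scoring higher than FastText alone.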

Part of a Series

See the Transaction Classifier collection for all 7 model versions.

Limitations

Severe Income category bias (14.8% ML-only accuracy)
Subword n-gram features from Income training examples overlap with real merchant names
Superseded by SetFit (v3), which achieved 66.7% ML-only accuracy using pre-trained embeddings

Citation

@misc{zaidi2026txnclassifier,
  title={Transaction Classifier: Multi-Stage Bank Transaction Categorization},
  author={Maaz Zaidi},
  year={2026},
  url={https://huggingface.co/maaz-zaidi/transaction-classifier-fasttext}
}
