Transaction Classifier — FastText (v2)

A FastText supervised model that classifies bank transaction strings into 10 budget categories using subword embeddings.

This is version 2 (Phase 2) in a progressive model development series. It introduced direction detection (credit vs debit) and a rules engine, but the FastText ML component itself suffered from severe Income category bias.

Model Details

Property            Value
Architecture        FastText supervised (subword n-grams)
Task                Multi-class text classification (10 categories)
Training samples    3,597,859
Epochs              10
Learning rate       0.5
Word n-grams        2
Embedding dim       100
Subword range       3-6 characters
Loss                Softmax
Format              .bin (FastText binary)
Trained             2026-03-28
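
For readers who want to reproduce a comparable configuration, the hyperparameters above map onto the keyword arguments of fasttext.train_supervised roughly as follows. This is a minimal sketch, not the author's training script: the training file path and the label format of that file are assumptions, and any settings not listed in the table are left at library defaults.

```python
# Hyperparameters copied from the Model Details table.
TRAIN_PARAMS = {
    "epoch": 10,        # Epochs
    "lr": 0.5,          # Learning rate
    "wordNgrams": 2,    # Word n-grams
    "dim": 100,         # Embedding dim
    "minn": 3,          # Subword range, lower bound
    "maxn": 6,          # Subword range, upper bound
    "loss": "softmax",  # Loss
}


def train(train_path, out_path="fasttext_model.bin"):
    """Train and save a supervised model.

    train_path is a hypothetical file with one example per line in
    fastText's '__label__<category> <text>' format.
    """
    import fasttext  # imported lazily so the config can be inspected without it

    model = fasttext.train_supervised(input=train_path, **TRAIN_PARAMS)
    model.save_model(out_path)
    return model
```

Calling train("train.txt") with a suitably formatted file would write a .bin model that the Usage snippet can load.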

Categories

ID Category
0 Food & Dining
1 Transportation
2 Shopping & Retail
3 Entertainment & Recreation
4 Healthcare & Medical
5 Utilities & Services
6 Financial Services
7 Income
8 Government & Legal
9 Charity & Donations

Performance

Evaluated on 505 unique real-world RBC transactions (3,113 weighted, 2019-2026).

Metric                            Score
Real-world accuracy (weighted)    55.7%
FastText-only accuracy            14.8%
Direction detection accuracy      100.0%
Rules accuracy                    91.3%
Validation accuracy               99.0%
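
The card does not spell out how the weighting works. One plausible reading, assumed here, is that each unique transaction string is weighted by how often it occurs, so frequent merchants count more. The sketch below shows that calculation on toy data; the rows are invented for illustration and are not taken from the evaluation set.

```python
def weighted_accuracy(rows):
    """rows: (is_correct, weight) pairs, one per unique transaction string.

    Assumes the weight is the number of times that string occurs.
    """
    rows = list(rows)
    total = sum(weight for _, weight in rows)
    correct = sum(weight for ok, weight in rows if ok)
    return correct / total if total else 0.0


# Toy data only: four unique strings with occurrence counts.
rows = [(True, 10), (False, 5), (True, 1), (False, 4)]

unweighted = sum(ok for ok, _ in rows) / len(rows)
print(f"unweighted: {unweighted:.3f}")               # 0.500
print(f"weighted:   {weighted_accuracy(rows):.3f}")  # 0.550
```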

Key finding: FastText alone achieves only 14.8% accuracy on unknown merchants. Subword n-grams learned from Income examples in the training data overlap with real merchant names, so the model defaults to predicting Income for most inputs.
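
To make the overlap mechanism concrete, the sketch below enumerates character n-grams the way fastText does conceptually: each word is wrapped in < and > boundary markers and split into n-grams of 3 to 6 characters. The two example words are hypothetical and chosen only to show shared n-grams; they are not claimed to be the strings responsible for the bias. The real library also hashes n-grams into buckets, which this sketch omits.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return the set of character n-grams for one word, with boundary markers."""
    token = f"<{word}>"
    return {
        token[i:i + n]
        for n in range(minn, maxn + 1)
        for i in range(len(token) - n + 1)
    }


def jaccard_overlap(a, b):
    """Fraction of n-grams shared between two words (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical pair: an Income-style keyword and a merchant-style name.
shared = char_ngrams("PAYROLL") & char_ngrams("PAYPAL")
print(sorted(shared))  # ['<PA', '<PAY', 'PAY']
print(f"{jaccard_overlap('PAYROLL', 'PAYPAL'):.3f}")
```

Because every shared n-gram contributes the same embedding vector to both inputs, a merchant name that shares many n-grams with Income examples is pulled toward the Income label.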

Usage

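import fasttext

# Load the trained FastText binary
model = fasttext.load_model("fasttext_model.bin")

# predict() returns a (labels, probabilities) pair; the input must be a single line
result = model.predict("MCDONALD'S #12345 TORONTO ON")
label = result[0][0].replace("__label__", "")  # strip FastText's label prefix
confidence = result[1][0]

print(f"Category: {label}, Confidence: {confidence:.3f}")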
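
Given the Income bias described above, it can help to look at more than the top label and to discard low-confidence predictions. The helper below is a sketch of one way to do that. The function name and the 0.5 threshold are assumptions for illustration and are not part of the published model. It works on the (labels, probabilities) pair that model.predict(text, k=3) returns.

```python
def parse_prediction(labels, probs, min_confidence=0.5):
    """Clean fastText output and apply a confidence cutoff.

    Returns (accepted_category_or_None, [(category, confidence), ...]).
    min_confidence is an illustrative value, not a tuned threshold.
    """
    pairs = [
        (label.replace("__label__", ""), float(prob))
        for label, prob in zip(labels, probs)
    ]
    if not pairs:
        return None, []
    top_label, top_conf = pairs[0]
    accepted = top_label if top_conf >= min_confidence else None
    return accepted, pairs


# Synthetic output in the shape fastText returns, for demonstration only.
accepted, ranked = parse_prediction(
    ("__label__Income", "__label__Shopping"), [0.41, 0.33]
)
print(accepted, ranked)
```

With a loaded model, the call would look like parse_prediction(*model.predict(text, k=3)).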

Dependencies

fasttext

Training Data

Key Contributions

Despite the weak ML component, Phase 2 introduced two critical pipeline stages:

  1. Direction Detection: Rule-based credit/debit detection achieving 100% accuracy
  2. Rules Engine: YAML-based pattern matching for structural transaction patterns (91.3% accuracy on matched transactions)

These pipeline stages carried forward into all subsequent versions.
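
The two stages above could be combined with the ML model along the following lines. This is an illustrative sketch only: the card does not describe how direction is detected or what the rules contain, so the sign-of-amount check, the rule patterns, and the ordering of stages are all assumptions. The rules are written as a Python list here to keep the example self-contained; the actual engine is described as YAML-based.

```python
import re

# Hypothetical rules standing in for the YAML rule files.
RULES = [
    {"pattern": r"\bPAYROLL\b", "direction": "credit", "category": "Income"},
    {"pattern": r"\bINTEREST CHARGE\b", "direction": "debit",
     "category": "Financial Services"},
]


def detect_direction(amount):
    """Assumed direction rule: positive amounts are credits, others debits."""
    return "credit" if amount > 0 else "debit"


def classify(description, amount, ml_fallback):
    """Run direction detection, then rules, then fall back to the ML model.

    ml_fallback is any callable mapping a description to a category,
    for example a wrapper around the FastText model.
    Returns (direction, category, stage_that_decided).
    """
    direction = detect_direction(amount)
    for rule in RULES:
        if rule["direction"] == direction and re.search(
            rule["pattern"], description, re.IGNORECASE
        ):
            return direction, rule["category"], "rules"
    return direction, ml_fallback(description), "ml"


print(classify("ACME PAYROLL DEP", 1200.0, lambda d: "Shopping & Retail"))
print(classify("MCDONALD'S #12345", -9.50, lambda d: "Food & Dining"))
```

Keeping the ML model as the last stage means its bias only affects transactions that no rule matches.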

Part of a Series

See the Transaction Classifier collection for all 7 model versions.

Limitations

  • Severe Income category bias (14.8% ML-only accuracy)
  • Subword n-gram features from Income training examples overlap with real merchant names
  • Superseded by SetFit (v3), which achieved 66.7% ML-only accuracy using pre-trained embeddings

Citation

@misc{zaidi2026txnclassifier,
  title={Transaction Classifier: Multi-Stage Bank Transaction Categorization},
  author={Maaz Zaidi},
  year={2026},
  url={https://huggingface.co/maaz-zaidi/transaction-classifier-fasttext}
}