DoDataThings committed on
Commit
454644d
·
verified ·
1 Parent(s): a6ba180

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,176 @@
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - text-classification
5
+ - transformers
6
+ - onnx
7
+ - safetensors
8
+ - transformers.js
9
+ - distilbert
10
+ - finance
11
+ - transactions
12
+ - english
13
+ language:
14
+ - en
15
+ datasets:
16
+ - DoDataThings/us-bank-transaction-categories-v2
17
+ pipeline_tag: text-classification
18
+ ---
19
+
20
+ # DistilBERT US Bank Transaction Classifier v2
21
+
22
+ **Successor to [v1](https://huggingface.co/DoDataThings/distilbert-us-transaction-classifier).** Sign-aware classification with expanded merchant coverage, PayPal wrapper handling, and a refined 16-category taxonomy.
23
+
24
+ ## What Changed from v1
25
+
26
+ | | v1 | v2 |
27
+ |---|---|---|
28
+ | **Input format** | Description only | `[debit]` / `[credit]` prefix + full description |
29
+ | **Categories** | 16 (includes Housing) | 16 (Housing split into Rent and Mortgage; Mortgage is no longer a model category) |
30
+ | **Training data** | 16,000 samples | 24,000 samples |
31
+ | **Merchant coverage** | ~300 merchants | ~500+ merchants |
32
+ | **PayPal awareness** | Limited | Full — PreApproved, Express Checkout, PP*, PAYPAL * |
33
+ | **POS prefix awareness** | SQ* only | SQ*, TST* (Toast), CLV* (Clover) |
34
+ | **Transfer patterns** | Basic | Brokerage sweeps, fintech platforms, wire, cashier's checks, ATM |
35
+
36
+ ### Why v2?
37
+
38
+ v1 confused Income and Transfer because it only saw the description text. A description such as "VENMO CASHOUT" reads the same whether money is entering or leaving the account. v2 prepends `[credit]` or `[debit]` based on the transaction sign (after normalization to cardholder perspective), giving the model a strong directional signal.
39
+
40
+ Mortgage was removed as a model category because mortgage account transactions are better classified by account type — every transaction on a mortgage account is a mortgage payment by definition.
41
+
42
+ ## What This Is (and Isn't)
43
+
44
+ A fine-tuned DistilBERT model for classifying US bank transaction descriptions into 16 spending categories. Designed as a **fallback layer** in a multi-tier classification pipeline — not a standalone classifier.
45
+
46
+ 1. **User rules** — pattern matching catches known merchants (highest accuracy)
47
+ 2. **This model** — classifies everything else, with sign awareness
48
+ 3. **Bank-provided categories** — fallback when model confidence is low
49
+ 4. **User overrides** — manual corrections for edge cases
50
+
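The four tiers above can be sketched as a single dispatch function. This is a minimal illustration, not the project's actual pipeline: the function name `categorize`, the substring-based rule format, the `MIN_CONFIDENCE` cutoff, and the choice to let a manual override win over every other tier are all assumptions. The model is passed in as a callable so the sketch runs without downloading weights.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical cutoff: the README reports confidence bands (0.50, 0.90)
# but does not prescribe a threshold for falling back to bank categories.
MIN_CONFIDENCE = 0.50


def categorize(
    text: str,
    model: Callable[[str], Tuple[str, float]],
    user_rules: Dict[str, str],
    bank_category: Optional[str] = None,
    override: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (category, source) for one sign-prefixed description."""
    # Tier 4: a manual correction takes precedence (assumed ordering).
    if override is not None:
        return override, "override"
    # Tier 1: user rules, here simple case-insensitive substring matches.
    upper = text.upper()
    for pattern, category in user_rules.items():
        if pattern.upper() in upper:
            return category, "rule"
    # Tier 2: the classifier handles everything the rules missed.
    label, score = model(text)
    # Tier 3: fall back to the bank's category when confidence is low.
    if score < MIN_CONFIDENCE and bank_category is not None:
        return bank_category, "bank"
    return label, "model"


def stub_model(text: str) -> Tuple[str, float]:
    """Stand-in for the real classifier; always returns a low-confidence guess."""
    return ("Shopping", 0.42)


rules = {"STARBUCKS": "Restaurants"}
print(categorize("[debit] STARBUCKS #1234", stub_model, rules))
# ('Restaurants', 'rule')
print(categorize("[debit] ACME WIDGETS", stub_model, rules, bank_category="Merchandise"))
# ('Merchandise', 'bank')
```

In real use, `model` would wrap the `transformers` pipeline and return the top label and score for the prefixed text.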
51
+ ## Training
52
+
53
+ ```
54
+ Model: DistilBERT-base-uncased + LoRA (r=32, alpha=64)
55
+ Dataset: 24,000 synthetic samples, 1,500 per category
56
+ Trainable: 1.8M / 68.7M parameters (2.6%)
57
+ Training: 20 epochs, ~8 minutes on consumer GPU
58
+ Best epoch: 18 (99.1% validation accuracy)
59
+ ```
60
+
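The parameter counts in the summary above can be reproduced with plain arithmetic from the sizes in `config.json`. The sketch below is an inference, not the training script: it assumes LoRA adapters on the four attention projections of every layer and a classification head that is stored twice (a frozen original plus a trainable copy). That combination happens to match the reported numbers exactly, but the actual training configuration is not part of this commit.

```python
# Sizes taken from config.json in this commit.
dim, hidden_dim, n_layers = 768, 3072, 6
vocab_size, max_positions, num_labels = 30522, 512, 16
rank = 32  # LoRA r from the training summary


def linear(n_in: int, n_out: int) -> int:
    """Parameters of a dense layer with bias."""
    return n_in * n_out + n_out


layer_norm = 2 * dim  # scale and shift vectors
embeddings = vocab_size * dim + max_positions * dim + layer_norm
layer = (
    4 * linear(dim, dim)          # q, k, v, and output projections
    + linear(dim, hidden_dim)     # feed-forward up-projection
    + linear(hidden_dim, dim)     # feed-forward down-projection
    + 2 * layer_norm
)
backbone = embeddings + n_layers * layer

# Classification head: pre_classifier (dim -> dim) plus classifier (dim -> labels).
head = linear(dim, dim) + linear(dim, num_labels)

# ASSUMPTION: LoRA on all four attention projections in each layer.
# Each adapter adds an (r x in) and an (out x r) matrix, with no bias.
lora = n_layers * 4 * rank * (dim + dim)

trainable = lora + head
# ASSUMPTION: the head is counted twice (frozen original + trainable copy).
total = backbone + lora + 2 * head

print(trainable, total, round(100 * trainable / total, 1))
# 1782544 68748320 2.6
```

These values agree with `trainable_params` and `total_params` in `training-report.json`.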
61
+ ### Loss Curve
62
+
63
+ | Epoch | Train Loss | Val Loss | Train Acc | Val Acc |
64
+ |-------|-----------|----------|-----------|---------|
65
+ | 1 | 2.590 | 1.936 | 20.8% | 52.6% |
66
+ | 5 | 0.325 | 0.214 | 90.5% | 93.9% |
67
+ | 10 | 0.077 | 0.056 | 97.9% | 98.5% |
68
+ | 15 | 0.038 | 0.034 | 98.9% | 98.9% |
69
+ | 17 | 0.029 | 0.030 | 99.1% | 99.1% |
70
+
71
+ ### Honest Assessment
72
+
73
+ Validation accuracy (99.1%) is measured on synthetic data. Confidence on 2,038 real-world transactions, alongside per-category synthetic results:
74
+
75
+ - **86% of model classifications at 0.90+ confidence**
76
+ - **About 0.4% below 0.50 confidence** (9 of 2,038 transactions)
77
+ - Income at 100% and Transfer at 99.5% on synthetic validation
78
+ - Shopping remains the weakest category (~93%); the training report lists Personal Care, Restaurants, and Travel as its most frequent confusions
79
+
80
+ The sign prefix resolved the Income/Transfer confusion from v1. The main remaining challenge is niche merchants the model hasn't seen — diminishing returns territory best handled by user rules.
81
+
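The synthetic validation figures can be cross-checked against the per-category counts in `training-report.json`. The sketch below copies the `correct` and `total` values from that file's `final_per_category` block into a dictionary (the variable names are illustrative) and recomputes the overall accuracy and the weakest category.

```python
# (correct, total) per category, copied from final_per_category
# in training-report.json.
final_per_category = {
    "Education": (221, 221),
    "Entertainment": (247, 247),
    "Fees": (240, 240),
    "Groceries": (213, 214),
    "Healthcare": (235, 242),
    "Income": (221, 221),
    "Insurance": (230, 231),
    "Personal Care": (232, 232),
    "Rent": (232, 232),
    "Restaurants": (224, 226),
    "Shopping": (209, 225),
    "Subscription": (199, 202),
    "Transfer": (216, 217),
    "Transportation": (246, 248),
    "Travel": (191, 191),
    "Utilities": (211, 211),
}

correct = sum(c for c, _ in final_per_category.values())
total = sum(t for _, t in final_per_category.values())
overall = correct / total

# Category with the lowest per-category accuracy.
weakest = min(final_per_category, key=lambda k: final_per_category[k][0] / final_per_category[k][1])

print(correct, total, round(overall, 4), weakest)
# 3567 3600 0.9908 Shopping
```

The total of 3,600 matches the validation split size, and 0.9908 matches `best_val_accuracy`.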
82
+ ## Categories (16)
83
+
84
+ | Category | What it covers |
85
+ |----------|----------------|
86
+ | Restaurants | Fast food, sit-down, coffee shops, food delivery, POS systems (TST*, SQ*, CLV*) |
87
+ | Groceries | Supermarkets, warehouse clubs, farmers markets, convenience stores |
88
+ | Shopping | Retail, online purchases, department stores, pet stores, liquor stores, e-commerce marketplaces |
89
+ | Transportation | Gas, EV charging, rideshare, auto maintenance, parking, tolls, DMV |
90
+ | Entertainment | Movies, events, gaming (Steam, PlayStation), gambling/sportsbooks |
91
+ | Utilities | Electric, internet, phone, water, waste/trash, solar |
92
+ | Subscription | Streaming, SaaS, AI tools, VPNs, social media premium, dating apps, news |
93
+ | Healthcare | Pharmacy, doctor, dentist, telehealth, vision, hospital |
94
+ | Insurance | Auto, home, health, life insurance |
95
+ | Rent | Property management companies, lease payments |
96
+ | Travel | Hotels, airlines, car rental, cruise lines, airport services |
97
+ | Education | Online courses, tutoring, books, tuition, certification |
98
+ | Personal Care | Salon, gym, beauty, spa, barber |
99
+ | Transfer | CC autopay, Zelle/Venmo sends, bank transfers, brokerage sweeps, BNPL, wire transfers, ATM, cashier's checks |
100
+ | Income | Payroll, direct deposit, interest, refunds, government benefits, gig economy payouts |
101
+ | Fees | Bank fees, late fees, service charges, ATM fees |
102
+
103
+ ### Account-Type-Implied Categories (not model-classified)
104
+
105
+ These categories are determined by the account type, not the model:
106
+
107
+ | Account Type | Category |
108
+ |---|---|
109
+ | Mortgage | Mortgage |
110
+ | Auto Loan | Transportation |
111
+ | Student Loan | Education |
112
+ | Personal Loan | Transfer |
113
+ | HELOC | Transfer |
114
+ | CD | Income |
115
+
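The table above amounts to a lookup that runs before the model is consulted. A minimal sketch, assuming account types arrive as the exact strings shown in the table (the names `ACCOUNT_TYPE_CATEGORY` and `implied_category` are illustrative):

```python
from typing import Optional

# Mapping copied from the table; the keys are assumed to match
# however the caller labels its accounts.
ACCOUNT_TYPE_CATEGORY = {
    "Mortgage": "Mortgage",
    "Auto Loan": "Transportation",
    "Student Loan": "Education",
    "Personal Loan": "Transfer",
    "HELOC": "Transfer",
    "CD": "Income",
}


def implied_category(account_type: str) -> Optional[str]:
    """Return the category implied by the account type, or None to defer to the model."""
    return ACCOUNT_TYPE_CATEGORY.get(account_type)


print(implied_category("Student Loan"))  # Education
print(implied_category("Checking"))      # None
```

A `None` result means the transaction should go through the normal rule and model tiers.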
116
+ ## Usage
117
+
118
+ ### Python
119
+
120
+ ```python
121
+ from transformers import pipeline
122
+
123
+ classifier = pipeline("text-classification", model="DoDataThings/distilbert-us-transaction-classifier-v2")
124
+
125
+ # v2 requires sign prefix
126
+ result = classifier("[debit] STARBUCKS #1234 SAN FRANCISCO CA")
127
+ print(result) # [{'label': 'Restaurants', 'score': 0.98}]
128
+
129
+ # Sign matters for ambiguous transactions
130
+ classifier("[credit] VENMO CASHOUT PPD ID: 12345678")
131
+ # [{'label': 'Income', 'score': 0.95}]
132
+
133
+ classifier("[debit] VENMO PAYMENT TO JOHN SMITH")
134
+ # [{'label': 'Transfer', 'score': 0.97}]
135
+ ```
136
+
137
+ ### JavaScript (Transformers.js)
138
+
139
+ ```javascript
140
+ import { pipeline } from '@xenova/transformers'; // ES module import; top-level await below requires an ES module
141
+
142
+ const classifier = await pipeline(
143
+ 'text-classification',
144
+ 'DoDataThings/distilbert-us-transaction-classifier-v2'
145
+ );
146
+
147
+ const result = await classifier('[debit] STARBUCKS #1234');
148
+ // [{ label: 'Restaurants', score: 0.98 }]
149
+ ```
150
+
151
+ A quantized ONNX export is included at `onnx/model_quantized.onnx`.
152
+
153
+ ### Sign Prefix Convention
154
+
155
+ Prepend `[credit]` or `[debit]` based on the **normalized** transaction amount (cardholder perspective):
156
+ - `[debit]` — money left the account (purchases, payments out, fees)
157
+ - `[credit]` — money entered the account (income, refunds, payments received)
158
+
159
+ If your data uses issuer perspective (e.g., Apple Card where purchases are positive), normalize the sign first, then apply the prefix.
160
+
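The convention above can be captured in a small helper. This is a sketch under stated assumptions: in cardholder perspective a positive amount means money entered the account, issuer-perspective data is normalized by flipping the sign, and a zero amount is treated as a debit. The function name `with_sign_prefix` and its parameters are illustrative.

```python
def with_sign_prefix(description: str, amount: float, issuer_perspective: bool = False) -> str:
    """Prepend [credit] or [debit] to a transaction description.

    amount: signed transaction amount.
    issuer_perspective: True when purchases are positive (as the README
    describes for Apple Card); the sign is flipped before prefixing.
    """
    if issuer_perspective:
        amount = -amount  # normalize to cardholder perspective
    # ASSUMPTION: zero amounts fall through to [debit].
    prefix = "[credit]" if amount > 0 else "[debit]"
    return f"{prefix} {description}"


print(with_sign_prefix("STARBUCKS #1234 SAN FRANCISCO CA", -5.75))
# [debit] STARBUCKS #1234 SAN FRANCISCO CA
print(with_sign_prefix("VENMO CASHOUT PPD ID: 12345678", 250.00))
# [credit] VENMO CASHOUT PPD ID: 12345678
print(with_sign_prefix("APPLE STORE", 129.00, issuer_perspective=True))
# [debit] APPLE STORE
```

The returned string is what gets passed to the classifier in the usage examples.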
161
+ ## Training Data
162
+
163
+ The synthetic dataset is published at [`DoDataThings/us-bank-transaction-categories-v2`](https://huggingface.co/datasets/DoDataThings/us-bank-transaction-categories-v2). The generator script is open source — you can extend the merchant pools, add format templates, or increase sample counts.
164
+
165
+ ## Limitations
166
+
167
+ - **US bank formats only** — Trained on Chase, Apple Card, PayPal, Capital One, and US Bank statement patterns
168
+ - **Synthetic training data** — May miss patterns from banks not represented
169
+ - **Shopping is the weakest category** (~93%); the training report lists Personal Care, Restaurants, and Travel as its most frequent confusions
170
+ - **Niche merchants** may classify with low confidence — use merchant rules for known edge cases
171
+ - **Sign prefix required** — The model expects `[debit]` or `[credit]` prefix. Passing raw descriptions without the prefix will degrade accuracy.
172
+ - **Not a standalone solution** — Best results come from combining with merchant rules and account-type-implied classifications
173
+
174
+ ## License
175
+
176
+ Apache 2.0
config.json ADDED
@@ -0,0 +1,61 @@
1
+ {
2
+ "_name_or_path": "data/models/foliome-classifier-v2\\",
3
+ "activation": "gelu",
4
+ "architectures": [
5
+ "DistilBertForSequenceClassification"
6
+ ],
7
+ "attention_dropout": 0.1,
8
+ "dim": 768,
9
+ "dropout": 0.1,
10
+ "hidden_dim": 3072,
11
+ "id2label": {
12
+ "0": "Education",
13
+ "1": "Entertainment",
14
+ "2": "Fees",
15
+ "3": "Groceries",
16
+ "4": "Healthcare",
17
+ "5": "Income",
18
+ "6": "Insurance",
19
+ "7": "Personal Care",
20
+ "8": "Rent",
21
+ "9": "Restaurants",
22
+ "10": "Shopping",
23
+ "11": "Subscription",
24
+ "12": "Transfer",
25
+ "13": "Transportation",
26
+ "14": "Travel",
27
+ "15": "Utilities"
28
+ },
29
+ "initializer_range": 0.02,
30
+ "label2id": {
31
+ "Education": 0,
32
+ "Entertainment": 1,
33
+ "Fees": 2,
34
+ "Groceries": 3,
35
+ "Healthcare": 4,
36
+ "Income": 5,
37
+ "Insurance": 6,
38
+ "Personal Care": 7,
39
+ "Rent": 8,
40
+ "Restaurants": 9,
41
+ "Shopping": 10,
42
+ "Subscription": 11,
43
+ "Transfer": 12,
44
+ "Transportation": 13,
45
+ "Travel": 14,
46
+ "Utilities": 15
47
+ },
48
+ "max_position_embeddings": 512,
49
+ "model_type": "distilbert",
50
+ "n_heads": 12,
51
+ "n_layers": 6,
52
+ "pad_token_id": 0,
53
+ "problem_type": "single_label_classification",
54
+ "qa_dropout": 0.1,
55
+ "seq_classif_dropout": 0.2,
56
+ "sinusoidal_pos_embds": false,
57
+ "tie_weights_": true,
58
+ "torch_dtype": "float32",
59
+ "transformers_version": "4.49.0",
60
+ "vocab_size": 30522
61
+ }
label_mapping.json ADDED
@@ -0,0 +1,38 @@
1
+ {
2
+ "id2label": {
3
+ "0": "Education",
4
+ "1": "Entertainment",
5
+ "2": "Fees",
6
+ "3": "Groceries",
7
+ "4": "Healthcare",
8
+ "5": "Income",
9
+ "6": "Insurance",
10
+ "7": "Personal Care",
11
+ "8": "Rent",
12
+ "9": "Restaurants",
13
+ "10": "Shopping",
14
+ "11": "Subscription",
15
+ "12": "Transfer",
16
+ "13": "Transportation",
17
+ "14": "Travel",
18
+ "15": "Utilities"
19
+ },
20
+ "label2id": {
21
+ "Education": 0,
22
+ "Entertainment": 1,
23
+ "Fees": 2,
24
+ "Groceries": 3,
25
+ "Healthcare": 4,
26
+ "Income": 5,
27
+ "Insurance": 6,
28
+ "Personal Care": 7,
29
+ "Rent": 8,
30
+ "Restaurants": 9,
31
+ "Shopping": 10,
32
+ "Subscription": 11,
33
+ "Transfer": 12,
34
+ "Transportation": 13,
35
+ "Travel": 14,
36
+ "Utilities": 15
37
+ }
38
+ }
onnx/model_quantized.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:af4d35409501558e9112dbc5aef014f0c8086427d00b4950d629f497d20d54fd
3
+ size 267975237
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": false,
45
+ "cls_token": "[CLS]",
46
+ "do_lower_case": true,
47
+ "extra_special_tokens": {},
48
+ "mask_token": "[MASK]",
49
+ "model_max_length": 512,
50
+ "pad_token": "[PAD]",
51
+ "sep_token": "[SEP]",
52
+ "strip_accents": null,
53
+ "tokenize_chinese_chars": true,
54
+ "tokenizer_class": "DistilBertTokenizer",
55
+ "unk_token": "[UNK]"
56
+ }
training-report.json ADDED
@@ -0,0 +1,673 @@
1
+ {
2
+ "model": "distilbert-base-uncased + LoRA (r=32, alpha=64)",
3
+ "dataset": "24000 synthetic transactions, 16 categories, 1500/category",
4
+ "split": "20400 train / 3600 val (85/15)",
5
+ "epochs": 20,
6
+ "best_epoch": 18,
7
+ "best_val_accuracy": 0.9908,
8
+ "total_training_time_s": 458.7,
9
+ "device": "cuda",
10
+ "trainable_params": 1782544,
11
+ "total_params": 68748320,
12
+ "categories": [
13
+ "Education",
14
+ "Entertainment",
15
+ "Fees",
16
+ "Groceries",
17
+ "Healthcare",
18
+ "Income",
19
+ "Insurance",
20
+ "Personal Care",
21
+ "Rent",
22
+ "Restaurants",
23
+ "Shopping",
24
+ "Subscription",
25
+ "Transfer",
26
+ "Transportation",
27
+ "Travel",
28
+ "Utilities"
29
+ ],
30
+ "history": [
31
+ {
32
+ "epoch": 1,
33
+ "train_loss": 2.5896,
34
+ "val_loss": 1.9356,
35
+ "train_acc": 0.2075,
36
+ "val_acc": 0.5256,
37
+ "per_category": {
38
+ "Education": 0.362,
39
+ "Entertainment": 0.142,
40
+ "Fees": 1.0,
41
+ "Groceries": 0.575,
42
+ "Healthcare": 0.798,
43
+ "Income": 0.964,
44
+ "Insurance": 0.273,
45
+ "Personal Care": 0.203,
46
+ "Rent": 0.97,
47
+ "Restaurants": 0.544,
48
+ "Shopping": 0.36,
49
+ "Subscription": 0.802,
50
+ "Transfer": 0.166,
51
+ "Transportation": 0.5,
52
+ "Travel": 0.241,
53
+ "Utilities": 0.479
54
+ },
55
+ "epoch_time_s": 23.7
56
+ },
57
+ {
58
+ "epoch": 2,
59
+ "train_loss": 1.3952,
60
+ "val_loss": 0.8527,
61
+ "train_acc": 0.6386,
62
+ "val_acc": 0.7642,
63
+ "per_category": {
64
+ "Education": 0.819,
65
+ "Entertainment": 0.652,
66
+ "Fees": 1.0,
67
+ "Groceries": 0.617,
68
+ "Healthcare": 0.88,
69
+ "Income": 1.0,
70
+ "Insurance": 0.745,
71
+ "Personal Care": 0.586,
72
+ "Rent": 0.991,
73
+ "Restaurants": 0.628,
74
+ "Shopping": 0.671,
75
+ "Subscription": 0.718,
76
+ "Transfer": 0.751,
77
+ "Transportation": 0.71,
78
+ "Travel": 0.565,
79
+ "Utilities": 0.853
80
+ },
81
+ "epoch_time_s": 23.2
82
+ },
83
+ {
84
+ "epoch": 3,
85
+ "train_loss": 0.7496,
86
+ "val_loss": 0.4969,
87
+ "train_acc": 0.7876,
88
+ "val_acc": 0.8619,
89
+ "per_category": {
90
+ "Education": 0.95,
91
+ "Entertainment": 0.826,
92
+ "Fees": 1.0,
93
+ "Groceries": 0.696,
94
+ "Healthcare": 0.93,
95
+ "Income": 0.982,
96
+ "Insurance": 0.892,
97
+ "Personal Care": 0.918,
98
+ "Rent": 1.0,
99
+ "Restaurants": 0.686,
100
+ "Shopping": 0.778,
101
+ "Subscription": 0.861,
102
+ "Transfer": 0.889,
103
+ "Transportation": 0.694,
104
+ "Travel": 0.728,
105
+ "Utilities": 0.943
106
+ },
107
+ "epoch_time_s": 23.5
108
+ },
109
+ {
110
+ "epoch": 4,
111
+ "train_loss": 0.4773,
112
+ "val_loss": 0.2947,
113
+ "train_acc": 0.8612,
114
+ "val_acc": 0.9169,
115
+ "per_category": {
116
+ "Education": 0.977,
117
+ "Entertainment": 0.939,
118
+ "Fees": 1.0,
119
+ "Groceries": 0.734,
120
+ "Healthcare": 0.946,
121
+ "Income": 1.0,
122
+ "Insurance": 0.944,
123
+ "Personal Care": 0.991,
124
+ "Rent": 1.0,
125
+ "Restaurants": 0.863,
126
+ "Shopping": 0.796,
127
+ "Subscription": 0.842,
128
+ "Transfer": 0.926,
129
+ "Transportation": 0.863,
130
+ "Travel": 0.869,
131
+ "Utilities": 0.953
132
+ },
133
+ "epoch_time_s": 23.4
134
+ },
135
+ {
136
+ "epoch": 5,
137
+ "train_loss": 0.325,
138
+ "val_loss": 0.2142,
139
+ "train_acc": 0.9049,
140
+ "val_acc": 0.9392,
141
+ "per_category": {
142
+ "Education": 0.982,
143
+ "Entertainment": 0.951,
144
+ "Fees": 1.0,
145
+ "Groceries": 0.883,
146
+ "Healthcare": 0.967,
147
+ "Income": 1.0,
148
+ "Insurance": 0.97,
149
+ "Personal Care": 1.0,
150
+ "Rent": 1.0,
151
+ "Restaurants": 0.881,
152
+ "Shopping": 0.804,
153
+ "Subscription": 0.866,
154
+ "Transfer": 0.917,
155
+ "Transportation": 0.891,
156
+ "Travel": 0.921,
157
+ "Utilities": 0.976
158
+ },
159
+ "epoch_time_s": 23.4
160
+ },
161
+ {
162
+ "epoch": 6,
163
+ "train_loss": 0.2342,
164
+ "val_loss": 0.1447,
165
+ "train_acc": 0.9325,
166
+ "val_acc": 0.9575,
167
+ "per_category": {
168
+ "Education": 0.982,
169
+ "Entertainment": 0.988,
170
+ "Fees": 1.0,
171
+ "Groceries": 0.883,
172
+ "Healthcare": 0.967,
173
+ "Income": 1.0,
174
+ "Insurance": 0.939,
175
+ "Personal Care": 1.0,
176
+ "Rent": 1.0,
177
+ "Restaurants": 0.947,
178
+ "Shopping": 0.836,
179
+ "Subscription": 0.941,
180
+ "Transfer": 0.982,
181
+ "Transportation": 0.907,
182
+ "Travel": 0.948,
183
+ "Utilities": 0.995
184
+ },
185
+ "epoch_time_s": 23.4
186
+ },
187
+ {
188
+ "epoch": 7,
189
+ "train_loss": 0.1724,
190
+ "val_loss": 0.1079,
191
+ "train_acc": 0.9508,
192
+ "val_acc": 0.9708,
193
+ "per_category": {
194
+ "Education": 0.982,
195
+ "Entertainment": 0.996,
196
+ "Fees": 1.0,
197
+ "Groceries": 0.953,
198
+ "Healthcare": 0.967,
199
+ "Income": 1.0,
200
+ "Insurance": 0.991,
201
+ "Personal Care": 1.0,
202
+ "Rent": 1.0,
203
+ "Restaurants": 0.942,
204
+ "Shopping": 0.844,
205
+ "Subscription": 0.926,
206
+ "Transfer": 0.982,
207
+ "Transportation": 0.972,
208
+ "Travel": 0.979,
209
+ "Utilities": 0.991
210
+ },
211
+ "epoch_time_s": 23.5
212
+ },
213
+ {
214
+ "epoch": 8,
215
+ "train_loss": 0.1254,
216
+ "val_loss": 0.0775,
217
+ "train_acc": 0.9649,
218
+ "val_acc": 0.9789,
219
+ "per_category": {
220
+ "Education": 1.0,
221
+ "Entertainment": 0.988,
222
+ "Fees": 1.0,
223
+ "Groceries": 0.967,
224
+ "Healthcare": 0.967,
225
+ "Income": 1.0,
226
+ "Insurance": 0.996,
227
+ "Personal Care": 1.0,
228
+ "Rent": 1.0,
229
+ "Restaurants": 0.96,
230
+ "Shopping": 0.893,
231
+ "Subscription": 0.95,
232
+ "Transfer": 0.982,
233
+ "Transportation": 0.976,
234
+ "Travel": 0.995,
235
+ "Utilities": 0.986
236
+ },
237
+ "epoch_time_s": 23.5
238
+ },
239
+ {
240
+ "epoch": 9,
241
+ "train_loss": 0.0972,
242
+ "val_loss": 0.0627,
243
+ "train_acc": 0.9723,
244
+ "val_acc": 0.9814,
245
+ "per_category": {
246
+ "Education": 0.995,
247
+ "Entertainment": 0.992,
248
+ "Fees": 1.0,
249
+ "Groceries": 0.981,
250
+ "Healthcare": 0.967,
251
+ "Income": 1.0,
252
+ "Insurance": 0.996,
253
+ "Personal Care": 1.0,
254
+ "Rent": 1.0,
255
+ "Restaurants": 0.96,
256
+ "Shopping": 0.884,
257
+ "Subscription": 0.96,
258
+ "Transfer": 0.982,
259
+ "Transportation": 0.988,
260
+ "Travel": 0.995,
261
+ "Utilities": 1.0
262
+ },
263
+ "epoch_time_s": 22.4
264
+ },
265
+ {
266
+ "epoch": 10,
267
+ "train_loss": 0.0771,
268
+ "val_loss": 0.0557,
269
+ "train_acc": 0.9789,
270
+ "val_acc": 0.985,
271
+ "per_category": {
272
+ "Education": 1.0,
273
+ "Entertainment": 1.0,
274
+ "Fees": 1.0,
275
+ "Groceries": 1.0,
276
+ "Healthcare": 0.963,
277
+ "Income": 1.0,
278
+ "Insurance": 0.996,
279
+ "Personal Care": 1.0,
280
+ "Rent": 1.0,
281
+ "Restaurants": 0.978,
282
+ "Shopping": 0.889,
283
+ "Subscription": 0.955,
284
+ "Transfer": 0.995,
285
+ "Transportation": 0.988,
286
+ "Travel": 0.995,
287
+ "Utilities": 1.0
288
+ },
289
+ "epoch_time_s": 22.6
290
+ },
291
+ {
292
+ "epoch": 11,
293
+ "train_loss": 0.0665,
294
+ "val_loss": 0.0485,
295
+ "train_acc": 0.9812,
296
+ "val_acc": 0.9864,
297
+ "per_category": {
298
+ "Education": 1.0,
299
+ "Entertainment": 1.0,
300
+ "Fees": 1.0,
301
+ "Groceries": 0.991,
302
+ "Healthcare": 0.967,
303
+ "Income": 1.0,
304
+ "Insurance": 0.996,
305
+ "Personal Care": 1.0,
306
+ "Rent": 1.0,
307
+ "Restaurants": 0.987,
308
+ "Shopping": 0.898,
309
+ "Subscription": 0.975,
310
+ "Transfer": 0.982,
311
+ "Transportation": 0.988,
312
+ "Travel": 1.0,
313
+ "Utilities": 1.0
314
+ },
315
+ "epoch_time_s": 22.6
316
+ },
317
+ {
318
+ "epoch": 12,
319
+ "train_loss": 0.0534,
320
+ "val_loss": 0.0404,
321
+ "train_acc": 0.9841,
322
+ "val_acc": 0.9872,
323
+ "per_category": {
324
+ "Education": 1.0,
325
+ "Entertainment": 1.0,
326
+ "Fees": 1.0,
327
+ "Groceries": 0.986,
328
+ "Healthcare": 0.963,
329
+ "Income": 1.0,
330
+ "Insurance": 0.996,
331
+ "Personal Care": 0.991,
332
+ "Rent": 1.0,
333
+ "Restaurants": 0.991,
334
+ "Shopping": 0.907,
335
+ "Subscription": 0.98,
336
+ "Transfer": 0.995,
337
+ "Transportation": 0.988,
338
+ "Travel": 1.0,
339
+ "Utilities": 1.0
340
+ },
341
+ "epoch_time_s": 22.6
342
+ },
343
+ {
344
+ "epoch": 13,
345
+ "train_loss": 0.0463,
346
+ "val_loss": 0.0418,
347
+ "train_acc": 0.9857,
348
+ "val_acc": 0.9889,
349
+ "per_category": {
350
+ "Education": 1.0,
351
+ "Entertainment": 1.0,
352
+ "Fees": 1.0,
353
+ "Groceries": 1.0,
354
+ "Healthcare": 0.967,
355
+ "Income": 1.0,
356
+ "Insurance": 0.996,
357
+ "Personal Care": 1.0,
358
+ "Rent": 1.0,
359
+ "Restaurants": 0.991,
360
+ "Shopping": 0.898,
361
+ "Subscription": 0.975,
362
+ "Transfer": 0.995,
363
+ "Transportation": 1.0,
364
+ "Travel": 1.0,
365
+ "Utilities": 1.0
366
+ },
367
+ "epoch_time_s": 22.3
368
+ },
369
+ {
370
+ "epoch": 14,
371
+ "train_loss": 0.0421,
372
+ "val_loss": 0.0386,
373
+ "train_acc": 0.9872,
374
+ "val_acc": 0.9889,
375
+ "per_category": {
376
+ "Education": 1.0,
377
+ "Entertainment": 1.0,
378
+ "Fees": 1.0,
379
+ "Groceries": 0.995,
380
+ "Healthcare": 0.967,
381
+ "Income": 1.0,
382
+ "Insurance": 1.0,
383
+ "Personal Care": 1.0,
384
+ "Rent": 1.0,
385
+ "Restaurants": 0.991,
386
+ "Shopping": 0.911,
387
+ "Subscription": 0.98,
388
+ "Transfer": 0.991,
389
+ "Transportation": 0.988,
390
+ "Travel": 1.0,
391
+ "Utilities": 1.0
392
+ },
393
+ "epoch_time_s": 22.3
394
+ },
395
+ {
396
+ "epoch": 15,
397
+ "train_loss": 0.0378,
398
+ "val_loss": 0.0341,
399
+ "train_acc": 0.9886,
400
+ "val_acc": 0.9892,
401
+ "per_category": {
402
+ "Education": 1.0,
403
+ "Entertainment": 1.0,
404
+ "Fees": 1.0,
405
+ "Groceries": 0.995,
406
+ "Healthcare": 0.971,
407
+ "Income": 1.0,
408
+ "Insurance": 1.0,
409
+ "Personal Care": 1.0,
410
+ "Rent": 1.0,
411
+ "Restaurants": 0.991,
412
+ "Shopping": 0.907,
413
+ "Subscription": 0.98,
414
+ "Transfer": 0.995,
415
+ "Transportation": 0.988,
416
+ "Travel": 1.0,
417
+ "Utilities": 1.0
418
+ },
419
+ "epoch_time_s": 22.4
420
+ },
421
+ {
422
+ "epoch": 16,
423
+ "train_loss": 0.0319,
424
+ "val_loss": 0.0363,
425
+ "train_acc": 0.9912,
426
+ "val_acc": 0.9894,
427
+ "per_category": {
428
+ "Education": 1.0,
429
+ "Entertainment": 1.0,
430
+ "Fees": 1.0,
431
+ "Groceries": 1.0,
432
+ "Healthcare": 0.967,
433
+ "Income": 1.0,
434
+ "Insurance": 1.0,
435
+ "Personal Care": 1.0,
436
+ "Rent": 1.0,
437
+ "Restaurants": 0.991,
438
+ "Shopping": 0.902,
439
+ "Subscription": 0.985,
440
+ "Transfer": 0.995,
441
+ "Transportation": 0.992,
442
+ "Travel": 1.0,
443
+ "Utilities": 1.0
444
+ },
445
+ "epoch_time_s": 22.4
446
+ },
447
+ {
448
+ "epoch": 17,
449
+ "train_loss": 0.0288,
450
+ "val_loss": 0.0296,
451
+ "train_acc": 0.9913,
452
+ "val_acc": 0.9906,
453
+ "per_category": {
454
+ "Education": 1.0,
455
+ "Entertainment": 1.0,
456
+ "Fees": 1.0,
457
+ "Groceries": 0.986,
458
+ "Healthcare": 0.971,
459
+ "Income": 1.0,
460
+ "Insurance": 0.996,
461
+ "Personal Care": 0.991,
462
+ "Rent": 1.0,
463
+ "Restaurants": 0.991,
464
+ "Shopping": 0.947,
465
+ "Subscription": 0.985,
466
+ "Transfer": 0.995,
467
+ "Transportation": 0.988,
468
+ "Travel": 1.0,
469
+ "Utilities": 1.0
470
+ },
471
+ "epoch_time_s": 22.3
472
+ },
473
+ {
474
+ "epoch": 18,
475
+ "train_loss": 0.0255,
476
+ "val_loss": 0.0284,
477
+ "train_acc": 0.993,
478
+ "val_acc": 0.9908,
479
+ "per_category": {
480
+ "Education": 1.0,
481
+ "Entertainment": 1.0,
482
+ "Fees": 1.0,
483
+ "Groceries": 0.995,
484
+ "Healthcare": 0.971,
485
+ "Income": 1.0,
486
+ "Insurance": 0.996,
487
+ "Personal Care": 1.0,
488
+ "Rent": 1.0,
489
+ "Restaurants": 0.991,
490
+ "Shopping": 0.929,
491
+ "Subscription": 0.985,
492
+ "Transfer": 0.995,
493
+ "Transportation": 0.992,
494
+ "Travel": 1.0,
495
+ "Utilities": 1.0
496
+ },
497
+ "epoch_time_s": 22.4
498
+ },
499
+ {
500
+ "epoch": 19,
501
+ "train_loss": 0.0273,
502
+ "val_loss": 0.0306,
503
+ "train_acc": 0.9912,
504
+ "val_acc": 0.9897,
505
+ "per_category": {
506
+ "Education": 1.0,
507
+ "Entertainment": 1.0,
508
+ "Fees": 1.0,
509
+ "Groceries": 1.0,
510
+ "Healthcare": 0.967,
511
+ "Income": 1.0,
512
+ "Insurance": 0.996,
513
+ "Personal Care": 1.0,
514
+ "Rent": 1.0,
515
+ "Restaurants": 0.991,
516
+ "Shopping": 0.907,
517
+ "Subscription": 0.98,
518
+ "Transfer": 1.0,
519
+ "Transportation": 0.996,
520
+ "Travel": 1.0,
521
+ "Utilities": 1.0
522
+ },
523
+ "epoch_time_s": 22.5
524
+ },
525
+ {
526
+ "epoch": 20,
527
+ "train_loss": 0.023,
528
+ "val_loss": 0.03,
529
+ "train_acc": 0.9928,
530
+ "val_acc": 0.9906,
531
+ "per_category": {
532
+ "Education": 1.0,
533
+ "Entertainment": 1.0,
534
+ "Fees": 1.0,
535
+ "Groceries": 1.0,
536
+ "Healthcare": 0.967,
537
+ "Income": 1.0,
538
+ "Insurance": 0.996,
539
+ "Personal Care": 1.0,
540
+ "Rent": 1.0,
541
+ "Restaurants": 0.991,
542
+ "Shopping": 0.916,
543
+ "Subscription": 0.985,
544
+ "Transfer": 1.0,
545
+ "Transportation": 0.996,
546
+ "Travel": 1.0,
547
+ "Utilities": 1.0
548
+ },
549
+ "epoch_time_s": 23.2
550
+ }
551
+ ],
552
+ "final_per_category": {
553
+ "Education": {
554
+ "accuracy": 1.0,
555
+ "correct": 221,
556
+ "total": 221,
557
+ "top_confusions": {}
558
+ },
559
+ "Entertainment": {
560
+ "accuracy": 1.0,
561
+ "correct": 247,
562
+ "total": 247,
563
+ "top_confusions": {}
564
+ },
565
+ "Fees": {
566
+ "accuracy": 1.0,
567
+ "correct": 240,
568
+ "total": 240,
569
+ "top_confusions": {}
570
+ },
571
+ "Groceries": {
572
+ "accuracy": 0.995,
573
+ "correct": 213,
574
+ "total": 214,
575
+ "top_confusions": {
576
+ "Shopping": 1
577
+ }
578
+ },
579
+ "Healthcare": {
580
+ "accuracy": 0.971,
581
+ "correct": 235,
582
+ "total": 242,
583
+ "top_confusions": {
584
+ "Education": 2,
585
+ "Utilities": 2,
586
+ "Insurance": 1
587
+ }
588
+ },
589
+ "Income": {
590
+ "accuracy": 1.0,
591
+ "correct": 221,
592
+ "total": 221,
593
+ "top_confusions": {}
594
+ },
595
+ "Insurance": {
596
+ "accuracy": 0.996,
597
+ "correct": 230,
598
+ "total": 231,
599
+ "top_confusions": {
600
+ "Income": 1
601
+ }
602
+ },
603
+ "Personal Care": {
604
+ "accuracy": 1.0,
605
+ "correct": 232,
606
+ "total": 232,
607
+ "top_confusions": {}
608
+ },
609
+ "Rent": {
610
+ "accuracy": 1.0,
611
+ "correct": 232,
612
+ "total": 232,
613
+ "top_confusions": {}
614
+ },
615
+ "Restaurants": {
616
+ "accuracy": 0.991,
617
+ "correct": 224,
618
+ "total": 226,
619
+ "top_confusions": {
620
+ "Groceries": 2
621
+ }
622
+ },
623
+ "Shopping": {
624
+ "accuracy": 0.929,
625
+ "correct": 209,
626
+ "total": 225,
627
+ "top_confusions": {
628
+ "Personal Care": 4,
629
+ "Restaurants": 3,
630
+ "Travel": 3
631
+ }
632
+ },
633
+ "Subscription": {
634
+ "accuracy": 0.985,
635
+ "correct": 199,
636
+ "total": 202,
637
+ "top_confusions": {
638
+ "Education": 1,
639
+ "Personal Care": 1,
640
+ "Shopping": 1
641
+ }
642
+ },
643
+ "Transfer": {
644
+ "accuracy": 0.995,
645
+ "correct": 216,
646
+ "total": 217,
647
+ "top_confusions": {
648
+ "Shopping": 1
649
+ }
650
+ },
651
+ "Transportation": {
652
+ "accuracy": 0.992,
653
+ "correct": 246,
654
+ "total": 248,
655
+ "top_confusions": {
656
+ "Shopping": 1,
657
+ "Subscription": 1
658
+ }
659
+ },
660
+ "Travel": {
661
+ "accuracy": 1.0,
662
+ "correct": 191,
663
+ "total": 191,
664
+ "top_confusions": {}
665
+ },
666
+ "Utilities": {
667
+ "accuracy": 1.0,
668
+ "correct": 211,
669
+ "total": 211,
670
+ "top_confusions": {}
671
+ }
672
+ }
673
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff