💳 Expense Tracker — DistilBERT LoRA v2 (4 Data Sources)

📦 Training Data

Source	Type	Rows
engreemali/bank-transactions-sms-datasetss	Real Indian SMS (cleaned)	~1,200
kumarperiya/pan-indian-consumer-transaction-dataset	Structured → synthetic SMS	~600
ChatGPT synthetic_sms_5000 (fixed)	Synthetic (augmented)	~3,300
ChatGPT realistic_synthetic_sms (fixed)	Synthetic (realistic)	~3,200

🏷️ Categories

ID	Category
0	Education
1	Entertainment
2	Food
3	Healthcare
4	Shopping
5	Transport
6	Utilities
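
The ID-to-category table above can be written as the `id2label` / `label2id` dictionaries that Hugging Face model configs conventionally carry. This is a minimal sketch that assumes the published config follows the table order; it has not been checked against the actual `config.json`.

```python
# Category IDs in the order listed in the table above.
CATEGORIES = [
    "Education",
    "Entertainment",
    "Food",
    "Healthcare",
    "Shopping",
    "Transport",
    "Utilities",
]

# Assumed mapping: list position == class ID.
id2label = dict(enumerate(CATEGORIES))
label2id = {label: idx for idx, label in id2label.items()}

print(id2label[3])            # Healthcare
print(label2id["Utilities"])  # 6
```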

🚀 Usage

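from transformers import pipeline
# Load the classifier from the Hugging Face Hub
clf = pipeline('text-classification', model='udayugale/expense-tracker-distilbert-lora-v2')
# Classify one SMS; the result is a list holding one {'label', 'score'} dict
print(clf('Netmeds medicine order rs 350 confirmed. Delivery in 2 hrs'))
# [{'label': 'Healthcare', 'score': 0.95}]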
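
To classify several messages at once, a text-classification pipeline also accepts a list of strings and returns one prediction per message. The helper below is a sketch, not part of the model's API: `count_categories` is a hypothetical name, the `min_score` cutoff of 0.5 is an arbitrary assumption, and `classify` is any callable that returns pipeline-style output, so the `clf` object from the usage snippet can be passed in directly.

```python
from collections import Counter


def count_categories(messages, classify, min_score=0.5):
    """Count predicted categories, skipping low-confidence predictions.

    `classify` must accept a list of strings and return one
    {'label': ..., 'score': ...} dict per message, as a transformers
    text-classification pipeline does for list input.
    `min_score` is an assumed cutoff, not a value from the model card.
    """
    counts = Counter()
    for pred in classify(list(messages)):
        if pred["score"] >= min_score:
            counts[pred["label"]] += 1
    return counts
```

With the pipeline loaded, `count_categories(sms_list, clf)` would return a `Counter` keyed by category name.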

🔧 Fixes Applied to ChatGPT Data

Dropped rows labeled Income or Others, since neither is one of the seven expense categories
Mapped the Bills label to Utilities
Dropped the sender column from File 2, where 2,376 rows had a sender that did not match the label
Augmented short texts (fewer than 7 words) by wrapping them in bank SMS context
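
The fixes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's preprocessing script: the row keys (`text`, `label`, `sender`), the function names, and the wrapper template are all hypothetical. Only the label rules and the 7-word threshold come from the list above. For simplicity the sketch drops `sender` from every row, whereas the source drops it only from File 2.

```python
DROPPED_LABELS = {"Income", "Others"}  # not expense categories
LABEL_MAP = {"Bills": "Utilities"}     # merge Bills into Utilities
MIN_WORDS = 7                          # shorter texts get a context wrapper

# Hypothetical wrapper; the real templates are not given in the source.
WRAPPER = "Dear customer, your account has been debited. Info: {text}. Avl bal updated."


def clean_row(row):
    """Return a cleaned copy of `row`, or None if its label is dropped."""
    label = row["label"]
    if label in DROPPED_LABELS:
        return None
    text = row["text"].strip()
    if len(text.split()) < MIN_WORDS:
        text = WRAPPER.format(text=text)
    # Only text and label are copied, so the sender column is left out.
    return {"text": text, "label": LABEL_MAP.get(label, label)}


def clean_dataset(rows):
    """Apply clean_row to every row and discard the dropped ones."""
    cleaned = (clean_row(r) for r in rows)
    return [r for r in cleaned if r is not None]
```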

Model size: 67M params
Tensor type: F32 (Safetensors)
Base model: distilbert/distilbert-base-uncased (this model is an adapter)