---
language: en
tags:
- transaction-categorization
- distilbert
- lora
- peft
- finance
- text-classification
datasets:
- mitulshah/transaction-categorization
license: apache-2.0
---

# Transaction Category Classifier - LoRA Version

This is a **LoRA adapter** for DistilBERT that classifies bank transactions into 10 categories with **98.53% accuracy**.

## Model Details

- **Base Model:** [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased)
- **Fine-tuned Model:** [finmigodeveloper/distilbert-transaction-classifier](https://huggingface.co/finmigodeveloper/distilbert-transaction-classifier)
- **Adapter Size:** ~2.5 MB (98.7% smaller than full model)
- **Categories:** 10 transaction types

## Performance

| Metric | Value |
|--------|-------|
| Accuracy | 98.53% |
| Loss | 0.0221 |
| Training Samples | 80,000 |
| Validation Samples | 20,000 |

## Categories

- Charity & Donations
- Entertainment & Recreation
- Financial Services
- Food & Dining
- Government & Legal
- Healthcare & Medical
- Income
- Shopping & Retail
- Transportation
- Utilities & Services

## How to Use

```python
from transformers import pipeline

# Load directly
classifier = pipeline(
    "text-classification",
    model="finmigodeveloper/distilbert-transaction-classifier-lora",
)

# Test it
transactions = [
    "Starbucks coffee",
    "Monthly salary deposit",
    "Uber ride to airport",
]

for text in transactions:
    result = classifier(text)[0]
    print(f"{text}: {result['label']} ({result['score']:.2%})")
```

## Training Details

- **LoRA Rank (r):** 8
- **LoRA Alpha:** 16
- **Target Modules:** q_lin, k_lin, v_lin, out_lin
- **Dropout:** 0.1
- **Epochs:** 3
- **Batch Size:** 64
- **Learning Rate:** 2e-5

## Why LoRA?

- **98.7% smaller** than the full model
- **Faster loading** (~0.3 seconds vs 2-3 seconds)
- **Same accuracy** as the full model
- Perfect for **mobile apps** and **edge deployment**

## Files in this repository

- `adapter_model.safetensors`: The LoRA adapter weights (2.5 MB)
- `adapter_config.json`: LoRA configuration
- `training_stats.json`: Detailed training statistics
- `tokenizer.json` & `tokenizer_config.json`: Tokenizer files
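
As a rough cross-check on the adapter size, the number of LoRA parameters can be estimated from the rank and target modules listed under Training Details. The sketch below is illustrative only. It assumes the standard `distilbert-base-uncased` shape (6 transformer layers, hidden size 768) and 32-bit float storage, neither of which is stated in this card, and the helper names are hypothetical. It counts only the low-rank matrices, so any additional weights saved in `adapter_model.safetensors` (for example a classification head) are not included, which may explain why the estimate comes out below the ~2.5 MB file size.

```python
# Back-of-the-envelope estimate of LoRA parameter count and scaling factor.
# Assumptions (not stated in the card): 6 layers, hidden size 768, fp32 weights.

HIDDEN_SIZE = 768       # assumed width of q_lin, k_lin, v_lin, out_lin
NUM_LAYERS = 6          # assumed number of DistilBERT transformer layers
BYTES_PER_PARAM = 4     # assumed 32-bit float storage

LORA_RANK = 8           # from Training Details
LORA_ALPHA = 16         # from Training Details
TARGET_MODULES = ["q_lin", "k_lin", "v_lin", "out_lin"]


def lora_params_per_module(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two matrices per wrapped layer: A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


def estimate_lora_params() -> int:
    """Total LoRA parameters across all target modules in all layers."""
    per_module = lora_params_per_module(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
    return per_module * len(TARGET_MODULES) * NUM_LAYERS


lora_params = estimate_lora_params()
lora_megabytes = lora_params * BYTES_PER_PARAM / 1e6
scaling = LORA_ALPHA / LORA_RANK  # factor applied to the low-rank update B @ A

print(f"LoRA parameters: {lora_params:,}")
print(f"Approximate size of LoRA matrices: {lora_megabytes:.2f} MB")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the low-rank matrices account for roughly 0.3 million parameters, a small fraction of the base model, which is consistent with the size reduction described above.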