Model Card for Bank-Transaction-NER-DistilBERT
This model performs token-level Named Entity Recognition (NER) on bank transaction SMS and email messages, identifying entities such as AMOUNT, DATE, TIME, MERCHANT, ACCOUNT, and REFERENCE IDs.
Model Details
Model Description
This is a DistilBERT-based token classification model fine-tuned for extracting structured information from bank transaction messages.
The model identifies entities such as transaction amounts, dates, times, merchant names, account references, and balances from unstructured text.
- Developed by: Abhijit Das
- Model type: Token Classification (Named Entity Recognition)
- Language(s): English
- License: MIT
- Finetuned from: distilbert/distilbert-base-cased
Uses
Direct Use
The model can be used to:
- Extract entities from bank transaction SMS
- Parse financial notification emails
- Support expense tracking and personal finance applications
- Generate structured data for downstream analytics
Label Schema
The model predicts the following BIO-formatted labels:
- B-amount / I-amount
- B-date / I-date
- B-time / I-time
- B-merchant / I-merchant
- B-balance / I-balance
- B-account / I-account
- B-ref / I-ref
- O (Outside entity)
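The schema above can be materialized as the `id2label`/`label2id` mappings used by the classification head. A minimal sketch follows; note the index order shown here is an assumption for illustration, and the authoritative mapping lives in the model's `config.json`.

```python
# Build BIO label mappings for the seven entity types plus "O".
# NOTE: this index order is illustrative; the model's config.json
# (id2label / label2id) is the source of truth.
entity_types = ["amount", "date", "time", "merchant", "balance", "account", "ref"]

labels = ["O"]
for ent in entity_types:
    labels += [f"B-{ent}", f"I-{ent}"]

id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}

print(len(labels))  # 15 labels: 2 per entity type + "O"
```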
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model, in particular its narrow training distribution (semi-synthetic, Indian-style bank notifications in English).
How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "abhijitnumber1/bert-transaction-token-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# aggregation_strategy="simple" merges subword pieces into whole-entity spans;
# omit it to get raw per-token predictions instead.
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "INR 11025.97 debited from your account at Uber on 31.07.2020"
output = ner(text)
print(output)
```
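When the pipeline is called without an aggregation strategy, it returns one prediction per subword token. A minimal sketch of merging such per-token BIO tags back into entity spans (the sample input below is mocked for illustration, not real model output):

```python
def merge_bio(tokens):
    """Merge per-token BIO predictions into (label, text) entity spans.

    Each item is expected to resemble the pipeline's per-token output,
    e.g. {"entity": "B-amount", "word": "INR"}.
    """
    entities = []
    for tok in tokens:
        tag = tok["entity"]
        if tag == "O":
            continue
        prefix, label = tag.split("-", 1)
        # Start a new span on B- tags, or on an I- tag with no open span.
        if prefix == "B" or not entities or entities[-1][0] != label:
            entities.append([label, tok["word"]])
        else:  # I- tag continuing the previous entity
            entities[-1][1] += " " + tok["word"]
    return [tuple(e) for e in entities]

# Mocked per-token predictions for illustration (not real model output):
mock = [
    {"entity": "B-amount", "word": "INR"},
    {"entity": "I-amount", "word": "11025.97"},
    {"entity": "O", "word": "debited"},
    {"entity": "B-merchant", "word": "Uber"},
    {"entity": "B-date", "word": "31.07.2020"},
]
print(merge_bio(mock))
# [('amount', 'INR 11025.97'), ('merchant', 'Uber'), ('date', '31.07.2020')]
```

Real pipeline output also carries `score`, `start`, and `end` fields, and subword pieces are marked with `##`; handling those is left out of this sketch.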
Training Details
Training Data
This model was trained on semi-synthetic bank transaction messages written in English. The data includes:
- Automatically generated bank SMS and email messages, randomly produced from templates based on real sample transaction messages
- Multiple transaction types: debit, credit, refund, and balance update
- Messages formatted to resemble Indian bank notifications

Each message is labeled automatically at generation time.
Training Procedure
The model is based on DistilBERT and was fine-tuned to assign a BIO label to each token in a sentence (token-level Named Entity Recognition).
Preprocessing
Before training:
- Text was split into subword tokens using the DistilBERT tokenizer
- Word-level labels were aligned to the resulting tokens
- Special tokens such as [CLS] and [SEP] were ignored during training
- Padding tokens were excluded from the loss calculation
- Labels follow the BIO format (Beginning, Inside, Outside)
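The alignment steps above can be sketched without the real tokenizer: given the `word_ids()` sequence that Hugging Face tokenizers return, special and padding tokens receive the label `-100` (which the loss ignores), and subword continuations inherit an `I-` tag. Function and sample values here are illustrative, not the exact training code.

```python
def align_labels(word_ids, word_labels):
    """Map word-level BIO labels onto tokenizer output.

    word_ids: one entry per token; None marks special/padding tokens,
    as reported by the tokenizer's word_ids() method.
    word_labels: one BIO tag per original whitespace-split word.
    """
    aligned, prev = [], None
    for wid in word_ids:
        if wid is None:            # [CLS]/[SEP]/padding: excluded from loss
            aligned.append(-100)
        elif wid != prev:          # first subword of a word keeps its tag
            aligned.append(word_labels[wid])
        else:                      # subword continuation: B- becomes I-
            tag = word_labels[wid]
            aligned.append("I-" + tag[2:] if tag.startswith("B-") else tag)
        prev = wid
    return aligned

# "Uber" split into two subwords; word_ids as a tokenizer would report them.
word_ids = [None, 0, 1, 2, 2, None]   # [CLS] INR debited Ub ##er [SEP]
word_labels = ["B-amount", "O", "B-merchant"]
print(align_labels(word_ids, word_labels))
# [-100, 'B-amount', 'O', 'B-merchant', 'I-merchant', -100]
```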
Speeds, Sizes, Times
- Training time: about 15 minutes on a single CPU
- Model size: about 261 MB