Model Card for Bank-Transaction-NER-DistilBERT
This model performs token-level Named Entity Recognition (NER) on bank transaction SMS and email messages, identifying entities such as AMOUNT, DATE, TIME, MERCHANT, ACCOUNT, and REFERENCE IDs.
Model Details
Model Description
This is a DistilBERT-based token classification model fine-tuned for extracting structured information from bank transaction messages.
The model identifies entities such as transaction amounts, dates, times, merchant names, account references, and balances from unstructured text.
- Developed by: Abhijit Das
- Model type: Token Classification (Named Entity Recognition)
- Language(s): English
- License: MIT
- Finetuned from: distilbert/distilbert-base-cased
Uses
Direct Use
The model can be used to:
- Extract entities from bank transaction SMS
- Parse financial notification emails
- Support expense tracking and personal finance applications
- Generate structured data for downstream analytics
Label Schema
The model predicts the following BIO-formatted labels:
- B-amount / I-amount
- B-date / I-date
- B-time / I-time
- B-merchant / I-merchant
- B-balance / I-balance
- B-account / I-account
- B-ref / I-ref
- O (Outside entity)
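The schema above can be materialized as the `id2label`/`label2id` mappings used by the classification head. A minimal sketch follows; note the index order shown here is an assumption for illustration, and the authoritative mapping lives in the model's `config.json`.

```python
# Build BIO label mappings for the seven entity types plus "O".
# NOTE: this index order is illustrative; the model's config.json
# (id2label / label2id) is the source of truth.
entity_types = ["amount", "date", "time", "merchant", "balance", "account", "ref"]

labels = ["O"]
for ent in entity_types:
    labels += [f"B-{ent}", f"I-{ent}"]

id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}

print(len(labels))  # 15 labels: 2 per entity type + "O"
```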
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model, in particular its narrow training distribution (semi-synthetic, Indian-style bank notifications in English).
How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "abhijitnumber1/bert-transaction-token-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# aggregation_strategy="simple" merges subword pieces into whole-entity spans;
# omit it to get raw per-token predictions instead.
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "INR 11025.97 debited from your account at Uber on 31.07.2020"
output = ner(text)
print(output)
```
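When the pipeline is called without an aggregation strategy, it returns one prediction per subword token. A minimal sketch of merging such per-token BIO tags back into entity spans (the sample input below is mocked for illustration, not real model output):

```python
def merge_bio(tokens):
    """Merge per-token BIO predictions into (label, text) entity spans.

    Each item is expected to resemble the pipeline's per-token output,
    e.g. {"entity": "B-amount", "word": "INR"}.
    """
    entities = []
    for tok in tokens:
        tag = tok["entity"]
        if tag == "O":
            continue
        prefix, label = tag.split("-", 1)
        # Start a new span on B- tags, or on an I- tag with no open span.
        if prefix == "B" or not entities or entities[-1][0] != label:
            entities.append([label, tok["word"]])
        else:  # I- tag continuing the previous entity
            entities[-1][1] += " " + tok["word"]
    return [tuple(e) for e in entities]

# Mocked per-token predictions for illustration (not real model output):
mock = [
    {"entity": "B-amount", "word": "INR"},
    {"entity": "I-amount", "word": "11025.97"},
    {"entity": "O", "word": "debited"},
    {"entity": "B-merchant", "word": "Uber"},
    {"entity": "B-date", "word": "31.07.2020"},
]
print(merge_bio(mock))
# [('amount', 'INR 11025.97'), ('merchant', 'Uber'), ('date', '31.07.2020')]
```

Real pipeline output also carries `score`, `start`, and `end` fields, and subword pieces are marked with `##`; handling those is left out of this sketch.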
Training Details
Training Data
This model was trained on semi-synthetic bank transaction messages written in English. The data includes:
- Automatically generated bank SMS and email messages, randomly produced from templates based on real sample transaction messages
- Multiple transaction types: debit, credit, refund, and balance update
- Messages formatted to resemble Indian bank notifications

Each message is labeled automatically at generation time.
Training Procedure
The model is based on DistilBERT and was fine-tuned to assign a BIO label to each token in a sentence (token-level Named Entity Recognition).
Preprocessing
Before training:
- Text was split into subword tokens using the DistilBERT tokenizer
- Word-level labels were aligned to the resulting tokens
- Special tokens such as [CLS] and [SEP] were ignored during training
- Padding tokens were excluded from the loss calculation
- Labels follow the BIO format (Beginning, Inside, Outside)
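The alignment steps above can be sketched without the real tokenizer: given the `word_ids()` sequence that Hugging Face tokenizers return, special and padding tokens receive the label `-100` (which the loss ignores), and subword continuations inherit an `I-` tag. Function and sample values here are illustrative, not the exact training code.

```python
def align_labels(word_ids, word_labels):
    """Map word-level BIO labels onto tokenizer output.

    word_ids: one entry per token; None marks special/padding tokens,
    as reported by the tokenizer's word_ids() method.
    word_labels: one BIO tag per original whitespace-split word.
    """
    aligned, prev = [], None
    for wid in word_ids:
        if wid is None:            # [CLS]/[SEP]/padding: excluded from loss
            aligned.append(-100)
        elif wid != prev:          # first subword of a word keeps its tag
            aligned.append(word_labels[wid])
        else:                      # subword continuation: B- becomes I-
            tag = word_labels[wid]
            aligned.append("I-" + tag[2:] if tag.startswith("B-") else tag)
        prev = wid
    return aligned

# "Uber" split into two subwords; word_ids as a tokenizer would report them.
word_ids = [None, 0, 1, 2, 2, None]   # [CLS] INR debited Ub ##er [SEP]
word_labels = ["B-amount", "O", "B-merchant"]
print(align_labels(word_ids, word_labels))
# [-100, 'B-amount', 'O', 'B-merchant', 'I-merchant', -100]
```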
Speeds, Sizes, Times
- Training time: about 15 minutes on a single CPU
- Model size: about 261 MB