|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: mit |
|
|
library_name: transformers |
|
|
tags: |
|
|
- bert |
|
|
- nlp |
|
|
- ner |
|
|
base_model: |
|
|
- distilbert/distilbert-base-cased |
|
|
pipeline_tag: token-classification |
|
|
--- |
|
|
|
|
|
# Model Card for Bank-Transaction-NER-DistilBERT |
|
|
This model performs token-level Named Entity Recognition (NER) on bank transaction SMS and email messages, identifying entities such as amounts, dates, times, merchants, balances, account references, and reference IDs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
This is a DistilBERT-based token classification model fine-tuned for extracting structured information from bank transaction messages. |
|
|
The model identifies entities such as transaction amounts, dates, times, merchant names, account references, and balances from unstructured text. |
|
|
|
|
|
- **Developed by:** Abhijit Das |
|
|
- **Model type:** Token Classification (Named Entity Recognition) |
|
|
- **Language(s):** English |
|
|
- **License:** MIT |
|
|
- **Finetuned from:** distilbert/distilbert-base-cased |
|
|
|
|
|
|
|
|
## Uses |
|
|
|
|
|
|
|
|
|
|
### Direct Use |
|
|
The model can be used to: |
|
|
- Extract entities from bank transaction SMS |
|
|
- Parse financial notification emails |
|
|
- Support expense tracking and personal finance applications |
|
|
- Generate structured data for downstream analytics |
|
|
|
|
|
|
|
|
## Label Schema |
|
|
|
|
|
The model predicts the following BIO-formatted labels: |
|
|
|
|
|
- **B-amount / I-amount** |
|
|
- **B-date / I-date** |
|
|
- **B-time / I-time** |
|
|
- **B-merchant / I-merchant** |
|
|
- **B-balance / I-balance** |
|
|
- **B-account / I-account** |
|
|
- **B-ref / I-ref** |
|
|
- **O** (Outside entity) |
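
For downstream use, per-token BIO predictions must be merged into entity spans. A minimal decoding sketch (not part of the model's released code; the token and label lists below are illustrative):

```python
def bio_to_spans(tokens, labels):
    """Merge BIO-tagged tokens into (entity_type, text) spans."""
    spans, current_type, current_toks = [], None, []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            # A new entity starts; flush any entity in progress.
            if current_type:
                spans.append((current_type, " ".join(current_toks)))
            current_type, current_toks = lab[2:], [tok]
        elif lab.startswith("I-") and current_type == lab[2:]:
            # Continuation of the current entity.
            current_toks.append(tok)
        else:
            # "O" or an inconsistent I- tag ends the current entity.
            if current_type:
                spans.append((current_type, " ".join(current_toks)))
            current_type, current_toks = None, []
    if current_type:
        spans.append((current_type, " ".join(current_toks)))
    return spans

tokens = ["INR", "11025.97", "debited", "at", "Uber"]
labels = ["B-amount", "I-amount", "O", "O", "B-merchant"]
print(bio_to_spans(tokens, labels))
# [('amount', 'INR 11025.97'), ('merchant', 'Uber')]
```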
|
|
|
|
|
|
|
|
### Recommendations |
|
|
|
|
|
|
|
|
|
|
Users (both direct and downstream) should be made aware of the model's risks, biases, and limitations. In particular, because the model was trained on semi-synthetic messages modeled on Indian bank notifications, performance on real-world messages from other banks, locales, or formats may vary and should be validated before use.
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "abhijitnumber1/bert-transaction-token-classifier"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
)

text = "INR 11025.97 debited from your account at Uber on 31.07.2020"
output = ner(text)
print(output)
```
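
The pipeline returns one dict per token with an `entity` label and a confidence `score`. A minimal post-processing sketch that drops low-confidence predictions (the sample output below mimics the pipeline's format but is invented, not actual model output):

```python
# Illustrative per-token pipeline output (scores are invented).
sample_output = [
    {"word": "INR", "entity": "B-amount", "score": 0.99},
    {"word": "11025.97", "entity": "I-amount", "score": 0.98},
    {"word": "Uber", "entity": "B-merchant", "score": 0.62},
]

# Keep only predictions above a chosen confidence threshold.
confident = [t for t in sample_output if t["score"] >= 0.9]
print([(t["word"], t["entity"]) for t in confident])
# [('INR', 'B-amount'), ('11025.97', 'I-amount')]
```

The `0.9` threshold is an arbitrary example; tune it against your own precision/recall requirements.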
|
|
|
|
|
|
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
This model was trained on semi-synthetic bank transaction messages written in English.
|
|
The data includes:

- Automatically generated bank SMS and email messages (randomly generated from templates based on real sample transaction messages)
- Multiple transaction types, including debit, credit, refund, and balance update
- Messages formatted to resemble Indian bank notifications

Each message is labeled dynamically at generation time.
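
A hypothetical sketch of this kind of template-based generation; the template, merchant names, and value ranges below are invented for illustration and are not the actual training templates:

```python
import random

# Hypothetical message template modeled on a real sample notification.
TEMPLATE = "INR {amount} {kind} from your account at {merchant} on {date}"

def make_message(rng):
    """Fill the template with randomly chosen values."""
    return TEMPLATE.format(
        amount=f"{rng.uniform(10, 50000):.2f}",
        kind=rng.choice(["debited", "credited"]),
        merchant=rng.choice(["Uber", "Amazon", "Swiggy"]),
        date=f"{rng.randint(1, 28):02d}.{rng.randint(1, 12):02d}.2020",
    )

print(make_message(random.Random(0)))
```

Because each entity comes from a known template slot, the generator can emit the BIO labels alongside the text, which is what makes dynamic labeling possible.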
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
### Training Procedure |
|
|
The model is based on DistilBERT and was fine-tuned to assign a BIO label to each token in a sentence (token-level Named Entity Recognition).
|
|
|
|
|
|
|
|
|
|
#### Preprocessing
|
|
|
|
|
Before training:

- Text was split into subword tokens using the DistilBERT tokenizer
- Word-level labels were aligned to the resulting tokens
- Special tokens such as `[CLS]` and `[SEP]` were ignored during training
- Padding tokens were excluded from the loss calculation
- Labels follow the BIO format (Beginning, Inside, Outside)
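
The alignment and masking steps above can be sketched as follows, assuming the `word_ids()` mapping that Hugging Face fast tokenizers provide; the label ids and subword split in the example are illustrative:

```python
IGNORE = -100  # label excluded from the loss ([CLS], [SEP], padding)

def align_labels(word_labels, word_ids):
    """Map word-level label ids onto subword tokens."""
    aligned, prev = [], None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE)        # special or padding token
        elif wid != prev:
            aligned.append(word_labels[wid])  # first subword keeps the label
        else:
            aligned.append(IGNORE)        # later subwords are masked
        prev = wid
    return aligned

# "INR 11025.97 debited" with labels [B-amount, I-amount, O] as ids [1, 2, 0];
# word_ids for: [CLS] INR 110 ##25.97 debited [SEP]
word_ids = [None, 0, 1, 1, 2, None]
print(align_labels([1, 2, 0], word_ids))
# [-100, 1, 2, -100, 0, -100]
```

Masking with `-100` matches the default `ignore_index` of PyTorch's cross-entropy loss, so the masked positions contribute nothing to training.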
|
|
|
|
|
|
|
|
|
|
|
#### Speeds, Sizes, Times
|
|
- Training time: about 15 minutes on a single CPU
- Model size: about 261 MB
|
|
|
|
|
|
|
|