---
license: cc-by-4.0
language:
- en
tags:
- onnx
- ner
- transaction-extraction
- sms-parsing
- gliner2
- deberta
- on-device
- mobile
library_name: onnxruntime
pipeline_tag: token-classification
---
# Model Card: fintext-extractor
GLiNER2-based two-stage NER model that extracts structured transaction data from bank SMS and push notifications. Designed for on-device inference on mobile and desktop, with ONNX Runtime as the inference backend.
## Architecture
fintext-extractor uses a **two-stage pipeline** to maximize both speed and accuracy:
1. **Stage 1 -- Classification:** A DeBERTa-v3-large binary classifier determines whether an incoming message is a completed transaction (`is_transaction: yes/no`). Non-transaction messages (OTPs, promotional alerts, balance reminders) are filtered out early, keeping latency low.
2. **Stage 2 -- Extraction:** A GLiNER2-large extraction model with a LoRA adapter runs only on messages classified as transactions. It extracts structured fields: amount, date, transaction type, description, and masked account digits.
This two-stage design means the heavier extraction model is invoked only when needed, reducing average inference cost on mixed message streams.
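The gating logic can be sketched as follows. `classify` and `extract` are hypothetical stand-ins for the two ONNX sessions; the stubs in the example exist only to show the control flow:

```python
from typing import Callable, Optional


def two_stage_pipeline(
    message: str,
    classify: Callable[[str], bool],   # Stage 1: is this a completed transaction?
    extract: Callable[[str], dict],    # Stage 2: structured field extraction
) -> Optional[dict]:
    """Invoke the heavier extractor only on messages the classifier accepts."""
    if not classify(message):
        return None  # OTPs, promos, and balance reminders are filtered here
    return extract(message)


# Stub stages standing in for the DeBERTa classifier and GLiNER2 extractor:
result = two_stage_pipeline(
    "Rs.5,000 debited from a/c XX1234",
    classify=lambda m: "debited" in m or "credited" in m,
    extract=lambda m: {"is_transaction": True},
)
```

On a mixed stream, average cost approaches Stage 1 alone whenever most messages are non-transactions.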
## Extracted Fields
| Field | Type | Description |
|-------|------|-------------|
| `is_transaction` | bool | Whether the message is a completed transaction |
| `transaction_amount` | float | Numeric amount (e.g., 5000.00) |
| `transaction_type` | str | DEBIT or CREDIT |
| `transaction_date` | str | Date in DD-MM-YYYY format |
| `transaction_description` | str | Merchant or person name |
| `masked_account_digits` | str | Last 4 digits of card/account |
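Since the model extracts spans verbatim (see Limitations), downstream code typically normalizes raw spans into the types above. The helpers below are an illustrative sketch, not part of the model or the `fintext` library:

```python
import re
from datetime import datetime


def normalize_amount(raw: str) -> float:
    """Pull the numeric value out of a span like 'Rs.5,000.00' -> 5000.0."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        raise ValueError(f"no amount found in {raw!r}")
    return float(match.group().replace(",", ""))


def normalize_date(raw: str) -> str:
    """Convert a 'DD-Mon-YY' span (e.g. '08-Mar-26') to DD-MM-YYYY."""
    return datetime.strptime(raw, "%d-%b-%y").strftime("%d-%m-%Y")
```

Real SMS date formats vary widely, so production code would try several `strptime` patterns rather than one.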
## Model Files
| File | Size | Description |
|------|------|-------------|
| `onnx/deberta_classifier_fp16.onnx` + `.data` | ~830 MB | Classification model (FP16) |
| `onnx/deberta_classifier_fp32.onnx` + `.data` | ~1.66 GB | Classification model (FP32) |
| `onnx/extraction_full_fp16.onnx` + `.data` | ~930 MB | Extraction model (FP16) |
| `onnx/extraction_full_fp32.onnx` + `.data` | ~1.9 GB | Extraction model (FP32) |
| `tokenizer/` | ~11 MB | Classification tokenizer |
| `tokenizer_extraction/` | ~11 MB | Extraction tokenizer |
FP16 variants are recommended for most use cases. FP32 variants are provided for environments that do not support half-precision.
## Quick Start (Python)
```python
from fintext import FintextExtractor
extractor = FintextExtractor.from_pretrained("Sowrabhm/fintext-extractor")
result = extractor.extract("Rs.5,000 debited from a/c XX1234 for Amazon Pay on 08-Mar-26")
print(result)
# {'is_transaction': True, 'transaction_amount': 5000.0, 'transaction_type': 'DEBIT',
# 'transaction_date': '08-03-2026', 'transaction_description': 'Amazon Pay',
# 'masked_account_digits': '1234'}
```
## Direct ONNX Runtime Usage
If you prefer not to install the `fintext` library, you can run the ONNX models directly:
```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer
# Load classification model and tokenizer
cls_session = ort.InferenceSession("onnx/deberta_classifier_fp16.onnx")
tokenizer = Tokenizer.from_file("tokenizer/tokenizer.json")
# Tokenize input
text = "Rs.5,000 debited from a/c XX1234 for Amazon Pay on 08-Mar-26"
encoding = tokenizer.encode(text)
input_ids = np.array([encoding.ids], dtype=np.int64)
attention_mask = np.array([encoding.attention_mask], dtype=np.int64)
# Run classification
cls_output = cls_session.run(None, {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
})
is_transaction = np.argmax(cls_output[0], axis=-1)[0] == 1
# If classified as a transaction, run extraction
if is_transaction:
    ext_session = ort.InferenceSession("onnx/extraction_full_fp16.onnx")
    ext_tokenizer = Tokenizer.from_file("tokenizer_extraction/tokenizer.json")
    # ... tokenize and run extraction session
```
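The extraction model scores candidate spans per field; the exact output tensor names depend on the ONNX export, so they are not shown here. Whatever the head emits, decoding ultimately reduces to mapping character offsets back onto the source text, which can be sketched as:

```python
# Hypothetical post-processing step: `spans` maps each field name to
# (start, end) character offsets produced by the extraction head.
def decode_spans(text: str, spans: dict) -> dict:
    """Recover verbatim field values from character offsets into `text`."""
    return {field: text[start:end] for field, (start, end) in spans.items()}


text = "Rs.5,000 debited from a/c XX1234 for Amazon Pay on 08-Mar-26"
fields = decode_spans(text, {
    "transaction_description": (37, 47),  # "Amazon Pay"
    "masked_account_digits": (28, 32),    # "1234"
})
```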
## Training
The models were fine-tuned from the following base checkpoints:
- **Classifier:** [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) with LoRA (r=16, alpha=32)
- **Extractor:** [fastino/gliner2-large-v1](https://huggingface.co/fastino/gliner2-large-v1) with LoRA extraction adapter
Training used the GLiNER2 multi-task schema, combining binary classification (`is_transaction`) with structured extraction (`transaction_info`) in a single training loop. LoRA adapters keep the trainable parameter count low, enabling fine-tuning on consumer GPUs.
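As a rough illustration of why LoRA keeps training cheap: a rank-r adapter replaces a full d_in x d_out weight update with two small factors. Using r=16 from the classifier config and DeBERTa-v3-large's 1024 hidden size (a single square projection, simplified for illustration):

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters of a LoRA adapter: A (d_in x r) plus B (r x d_out)."""
    return d_in * r + r * d_out


full = 1024 * 1024                    # one full projection matrix
lora = lora_params(1024, 1024, r=16)  # 32768 params, ~3.1% of the full matrix
```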
## Metrics
| Metric | Value |
|--------|-------|
| Classification accuracy | 0.80 |
| Amount extraction accuracy | 1.00 |
| Type extraction accuracy | 1.00 |
| Digits extraction accuracy | 1.00 |
| Avg latency (FP16, CPU) | 47 ms |
Metrics were evaluated on a held-out test split. Latency measured on a single-threaded ONNX Runtime CPU session.
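A latency figure like the one above can be reproduced with a simple timing harness. The sketch below uses a stub workload; for a real measurement you would pass `lambda: session.run(None, feeds)` and create the session with `ort.SessionOptions()` configured for one intra-op thread:

```python
import time


def mean_latency_ms(fn, warmup: int = 3, runs: int = 20) -> float:
    """Average wall-clock latency of fn() in milliseconds, after warmup runs."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000 / runs


# Stub workload standing in for an ONNX Runtime session.run call:
latency = mean_latency_ms(lambda: sum(range(1000)))
```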
## Limitations
- **Regional focus:** Primarily trained on Indian bank SMS formats (Rs., INR, currency symbols common in India). Performance on other regional formats has not been evaluated.
- **English only:** The model supports English language messages only.
- **Span extraction, not generation:** Field values must exist verbatim in the input text. The model extracts spans rather than generating new text.
- **Synthetic evaluation data:** The evaluation metrics above were computed on synthetic data. Real-world accuracy may differ.
## Use Cases
- Personal finance apps
- Expense tracking and categorization
- Transaction monitoring and alerting
- Bank statement reconciliation from SMS/notifications
## License
This model is released under the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.
## Links
- **GitHub:** [https://github.com/sowrabhmv/fintext-extractor](https://github.com/sowrabhmv/fintext-extractor)
- **Notebooks:** See the GitHub repo for cookbook examples and training notebooks