---
license: cc-by-4.0
language:
- en
tags:
- onnx
- ner
- transaction-extraction
- sms-parsing
- gliner2
- deberta
- on-device
- mobile
library_name: onnxruntime
pipeline_tag: token-classification
---
# Model Card: fintext-extractor
GLiNER2-based two-stage NER model that extracts structured transaction data from bank SMS and push notifications. Designed for on-device inference on mobile and desktop, with ONNX Runtime as the inference backend.
## Architecture
fintext-extractor uses a two-stage pipeline to maximize both speed and accuracy:
- **Stage 1 -- Classification:** A DeBERTa-v3-large binary classifier determines whether an incoming message is a completed transaction (`is_transaction`: yes/no). Non-transaction messages (OTPs, promotional alerts, balance reminders) are filtered out early, keeping latency low.
- **Stage 2 -- Extraction:** A GLiNER2-large extraction model with a LoRA adapter runs only on messages classified as transactions. It extracts structured fields: amount, date, transaction type, description, and masked account digits.
This two-stage design means the heavier extraction model is invoked only when needed, reducing average inference cost on mixed message streams.
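The dispatch logic above can be sketched as a few lines of Python. This is an illustrative outline, not the library's actual implementation; `two_stage_pipeline`, `classify`, and `extract` are hypothetical names standing in for the real classifier and extractor calls:

```python
from typing import Callable, Optional

def two_stage_pipeline(
    message: str,
    classify: Callable[[str], bool],   # cheap stage-1 classifier
    extract: Callable[[str], dict],    # heavier stage-2 extractor
) -> Optional[dict]:
    """Invoke the heavy extractor only when stage 1 says 'transaction'."""
    if not classify(message):
        return None  # OTPs, promos, and reminders exit here
    return extract(message)

# Toy stand-ins for the real models, just to show the control flow
is_txn = lambda m: "debited" in m or "credited" in m
fields = lambda m: {"is_transaction": True}

print(two_stage_pipeline("Your OTP is 482913", is_txn, fields))  # None
```

On a mixed message stream where most messages are not transactions, this early exit is what keeps the average per-message cost close to the cost of the classifier alone.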
## Extracted Fields
| Field | Type | Description |
|---|---|---|
| `is_transaction` | bool | Whether the message is a completed transaction |
| `transaction_amount` | float | Numeric amount (e.g., 5000.00) |
| `transaction_type` | str | `DEBIT` or `CREDIT` |
| `transaction_date` | str | Date in DD-MM-YYYY format |
| `transaction_description` | str | Merchant or person name |
| `masked_account_digits` | str | Last 4 digits of card/account |
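A downstream consumer can sanity-check an extracted record against this schema before storing it. The `validate_record` helper below is a hypothetical example, not part of the model package:

```python
from datetime import datetime

def validate_record(rec: dict) -> dict:
    """Check an extracted record against the field schema above."""
    assert isinstance(rec["is_transaction"], bool)
    assert isinstance(rec["transaction_amount"], float)
    assert rec["transaction_type"] in {"DEBIT", "CREDIT"}
    # transaction_date is DD-MM-YYYY; strptime raises if it is malformed
    datetime.strptime(rec["transaction_date"], "%d-%m-%Y")
    digits = rec["masked_account_digits"]
    assert digits.isdigit() and len(digits) == 4
    return rec

record = {
    "is_transaction": True,
    "transaction_amount": 5000.0,
    "transaction_type": "DEBIT",
    "transaction_date": "08-03-2026",
    "transaction_description": "Amazon Pay",
    "masked_account_digits": "1234",
}
validate_record(record)
```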
## Model Files
| File | Size | Description |
|---|---|---|
| `onnx/deberta_classifier_fp16.onnx` + `.data` | ~830 MB | Classification model (FP16) |
| `onnx/deberta_classifier_fp32.onnx` + `.data` | ~1.66 GB | Classification model (FP32) |
| `onnx/extraction_full_fp16.onnx` + `.data` | ~930 MB | Extraction model (FP16) |
| `onnx/extraction_full_fp32.onnx` + `.data` | ~1.9 GB | Extraction model (FP32) |
| `tokenizer/` | ~11 MB | Classification tokenizer |
| `tokenizer_extraction/` | ~11 MB | Extraction tokenizer |
FP16 variants are recommended for most use cases. FP32 variants are provided for environments that do not support half-precision.
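One way to encode that recommendation in application code is a small lookup that maps the pipeline stage and the runtime's half-precision support to the file paths above. `pick_model` is an illustrative helper, not part of the published package:

```python
# File paths from the Model Files table above
MODEL_FILES = {
    ("classifier", "fp16"): "onnx/deberta_classifier_fp16.onnx",
    ("classifier", "fp32"): "onnx/deberta_classifier_fp32.onnx",
    ("extractor", "fp16"): "onnx/extraction_full_fp16.onnx",
    ("extractor", "fp32"): "onnx/extraction_full_fp32.onnx",
}

def pick_model(stage: str, supports_fp16: bool = True) -> str:
    """Select FP16 by default; fall back to FP32 where half precision fails."""
    return MODEL_FILES[(stage, "fp16" if supports_fp16 else "fp32")]

print(pick_model("classifier"))         # FP16 classifier path
print(pick_model("extractor", False))   # FP32 extractor path
```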
## Quick Start (Python)

```python
from fintext import FintextExtractor

extractor = FintextExtractor.from_pretrained("Sowrabhm/fintext-extractor")
result = extractor.extract("Rs.5,000 debited from a/c XX1234 for Amazon Pay on 08-Mar-26")
print(result)
# {'is_transaction': True, 'transaction_amount': 5000.0, 'transaction_type': 'DEBIT',
#  'transaction_date': '08-03-2026', 'transaction_description': 'Amazon Pay',
#  'masked_account_digits': '1234'}
```
## Direct ONNX Runtime Usage
If you prefer not to install the fintext library, you can run the ONNX models directly:
```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load classification model and tokenizer
cls_session = ort.InferenceSession("onnx/deberta_classifier_fp16.onnx")
tokenizer = Tokenizer.from_file("tokenizer/tokenizer.json")

# Tokenize input
text = "Rs.5,000 debited from a/c XX1234 for Amazon Pay on 08-Mar-26"
encoding = tokenizer.encode(text)
input_ids = np.array([encoding.ids], dtype=np.int64)
attention_mask = np.array([encoding.attention_mask], dtype=np.int64)

# Run classification
cls_output = cls_session.run(None, {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
})
is_transaction = np.argmax(cls_output[0], axis=-1)[0] == 1

# If classified as a transaction, run extraction
if is_transaction:
    ext_session = ort.InferenceSession("onnx/extraction_full_fp16.onnx")
    ext_tokenizer = Tokenizer.from_file("tokenizer_extraction/tokenizer.json")
    # ... tokenize and run extraction session
```
## Training
The models were fine-tuned from the following base checkpoints:
- Classifier: `microsoft/deberta-v3-large` with LoRA (r=16, alpha=32)
- Extractor: `fastino/gliner2-large-v1` with a LoRA extraction adapter
Training used the GLiNER2 multi-task schema, combining binary classification (`is_transaction`) with structured extraction (`transaction_info`) in a single training loop. LoRA adapters keep the trainable parameter count low, enabling fine-tuning on consumer GPUs.
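As a rough illustration of the stated hyperparameters (not the project's actual training script), a classifier adapter configured with the Hugging Face `peft` library might look like this. The `target_modules` choice is an assumption; the card does not document where the adapters were attached:

```python
from peft import LoraConfig

# r and lora_alpha match the values stated above; target_modules is an
# assumption (DeBERTa-v3 attention projections), not documented in this card.
classifier_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type="SEQ_CLS",
    target_modules=["query_proj", "value_proj"],
)
```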
## Metrics
| Metric | Value |
|---|---|
| Classification accuracy | 0.80 |
| Amount extraction accuracy | 1.00 |
| Type extraction accuracy | 1.00 |
| Digits extraction accuracy | 1.00 |
| Avg latency (FP16, CPU) | 47 ms |
Metrics were evaluated on a held-out test split; latency was measured with a single-threaded ONNX Runtime CPU session.
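To reproduce a comparable latency figure on your own hardware, a simple warmup-then-average harness suffices. `avg_latency_ms` is an illustrative helper (the lambda stands in for a `session.run(...)` call), not the benchmark script used for the table above:

```python
import time
from statistics import mean

def avg_latency_ms(fn, n_runs: int = 50, warmup: int = 5) -> float:
    """Average wall-clock latency of fn() in milliseconds, after warmup."""
    for _ in range(warmup):
        fn()  # warmup runs are excluded from the average
    timings = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - t0) * 1000.0)
    return mean(timings)

# Stand-in workload; replace with a bound session.run(...) call
latency = avg_latency_ms(lambda: sum(range(1000)))
print(f"{latency:.2f} ms")
```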
## Limitations
- Regional focus: Primarily trained on Indian bank SMS formats (Rs., INR, currency symbols common in India). Performance on other regional formats has not been evaluated.
- English only: The model supports English language messages only.
- Span extraction, not generation: Field values must exist verbatim in the input text. The model extracts spans rather than generating new text.
- Synthetic evaluation data: The evaluation metrics above were computed on synthetic data. Real-world accuracy may differ.
## Use Cases
- Personal finance apps
- Expense tracking and categorization
- Transaction monitoring and alerting
- Bank statement reconciliation from SMS/notifications
## License
This model is released under the CC-BY-4.0 license.
## Links
- GitHub: https://github.com/sowrabhmv/fintext-extractor
- Notebooks: See the GitHub repo for cookbook examples and training notebooks