Model Card: fintext-extractor

GLiNER2-based two-stage NER model that extracts structured transaction data from bank SMS and push notifications. Designed for on-device inference on mobile and desktop, with ONNX Runtime as the inference backend.

Architecture

fintext-extractor uses a two-stage pipeline to maximize both speed and accuracy:

  1. Stage 1 -- Classification: A DeBERTa-v3-large binary classifier determines whether an incoming message is a completed transaction (is_transaction: yes/no). Non-transaction messages (OTPs, promotional alerts, balance reminders) are filtered out early, keeping latency low.

  2. Stage 2 -- Extraction: A GLiNER2-large extraction model with a LoRA adapter runs only on messages classified as transactions. It extracts structured fields: amount, date, transaction type, description, and masked account digits.

This two-stage design means the heavier extraction model is invoked only when needed, reducing average inference cost on mixed message streams.
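The control flow of the pipeline can be sketched as follows; `classify` and `extract` are hypothetical placeholders standing in for the DeBERTa and GLiNER2 model calls, and the keyword check is only a stub:

```python
def classify(message: str) -> bool:
    # Stage 1 stand-in: the real version runs the DeBERTa ONNX session.
    return "debited" in message or "credited" in message

def extract(message: str) -> dict:
    # Stage 2 stand-in: the real version runs the GLiNER2 extraction session.
    return {"is_transaction": True}

def process(message: str) -> dict:
    # Non-transaction messages (OTPs, promos, reminders) exit early and
    # never pay the cost of the heavier extraction model.
    if not classify(message):
        return {"is_transaction": False}
    return extract(message)
```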

Extracted Fields

| Field | Type | Description |
|---|---|---|
| is_transaction | bool | Whether the message is a completed transaction |
| transaction_amount | float | Numeric amount (e.g., 5000.00) |
| transaction_type | str | DEBIT or CREDIT |
| transaction_date | str | Date in DD-MM-YYYY format |
| transaction_description | str | Merchant or person name |
| masked_account_digits | str | Last 4 digits of card/account |
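The schema above maps naturally onto a typed record. The dataclass below is an illustrative assumption for downstream code, not a type exported by the fintext library:

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    is_transaction: bool
    transaction_amount: float     # e.g., 5000.00
    transaction_type: str         # "DEBIT" or "CREDIT"
    transaction_date: str         # DD-MM-YYYY
    transaction_description: str  # merchant or person name
    masked_account_digits: str    # last 4 digits, e.g., "1234"
```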

Model Files

| File | Size | Description |
|---|---|---|
| onnx/deberta_classifier_fp16.onnx + .data | ~830 MB | Classification model (FP16) |
| onnx/deberta_classifier_fp32.onnx + .data | ~1.66 GB | Classification model (FP32) |
| onnx/extraction_full_fp16.onnx + .data | ~930 MB | Extraction model (FP16) |
| onnx/extraction_full_fp32.onnx + .data | ~1.9 GB | Extraction model (FP32) |
| tokenizer/ | ~11 MB | Classification tokenizer |
| tokenizer_extraction/ | ~11 MB | Extraction tokenizer |

FP16 variants are recommended for most use cases. FP32 variants are provided for environments that do not support half-precision.
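Choosing between the variants is a simple filename selection; the helper below is a hypothetical convenience, with filenames taken from the Model Files table:

```python
def choose_model(kind: str, fp16_supported: bool) -> str:
    # kind: "deberta_classifier" or "extraction_full".
    # Falls back to FP32 where half-precision is unavailable.
    precision = "fp16" if fp16_supported else "fp32"
    return f"onnx/{kind}_{precision}.onnx"
```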

Quick Start (Python)

from fintext import FintextExtractor

extractor = FintextExtractor.from_pretrained("Sowrabhm/fintext-extractor")
result = extractor.extract("Rs.5,000 debited from a/c XX1234 for Amazon Pay on 08-Mar-26")
print(result)
# {'is_transaction': True, 'transaction_amount': 5000.0, 'transaction_type': 'DEBIT',
#  'transaction_date': '08-03-2026', 'transaction_description': 'Amazon Pay',
#  'masked_account_digits': '1234'}

Direct ONNX Runtime Usage

If you prefer not to install the fintext library, you can run the ONNX models directly:

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load classification model and tokenizer
cls_session = ort.InferenceSession("onnx/deberta_classifier_fp16.onnx")
tokenizer = Tokenizer.from_file("tokenizer/tokenizer.json")

# Tokenize input
text = "Rs.5,000 debited from a/c XX1234 for Amazon Pay on 08-Mar-26"
encoding = tokenizer.encode(text)
input_ids = np.array([encoding.ids], dtype=np.int64)
attention_mask = np.array([encoding.attention_mask], dtype=np.int64)

# Run classification
cls_output = cls_session.run(None, {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
})
is_transaction = np.argmax(cls_output[0], axis=-1)[0] == 1

# If classified as a transaction, run extraction
if is_transaction:
    ext_session = ort.InferenceSession("onnx/extraction_full_fp16.onnx")
    ext_tokenizer = Tokenizer.from_file("tokenizer_extraction/tokenizer.json")
    # ... tokenize and run extraction session
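If you want a confidence score rather than a hard label, a numerically stable softmax over the two classifier logits gives one. This post-processing step is an assumption; the fintext library may expose probabilities directly:

```python
import numpy as np

def transaction_probability(logits: np.ndarray) -> float:
    # logits: shape (1, 2) raw classifier output (cls_output[0] above).
    # Subtract the row max before exponentiating for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return float(probs[0, 1])  # probability of the "transaction" class
```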

Training

The models were fine-tuned from two base checkpoints: DeBERTa-v3-large for the classifier and GLiNER2-large for the extractor.

Training used the GLiNER2 multi-task schema, combining binary classification (is_transaction) with structured extraction (transaction_info) in a single training loop. LoRA adapters keep the trainable parameter count low, enabling fine-tuning on consumer GPUs.
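The parameter savings from LoRA can be illustrated with a toy NumPy forward pass. The dimensions and rank below are illustrative, not the actual adapter configuration:

```python
import numpy as np

d, k, r = 1024, 1024, 8          # layer dims and LoRA rank (illustrative)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))  # frozen pretrained weight
A = rng.standard_normal((r, k))  # LoRA down-projection, trainable
B = np.zeros((d, r))             # LoRA up-projection, trainable (zero init)
alpha = 16                       # LoRA scaling factor

def lora_forward(x: np.ndarray) -> np.ndarray:
    # y = Wx + (alpha / r) * B(Ax); only A and B receive gradients,
    # and the zero-initialized B makes the initial update a no-op.
    return W @ x + (alpha / r) * (B @ (A @ x))

full = W.size          # parameters updated by full fine-tuning
lora = A.size + B.size # parameters updated with LoRA (here ~1.6% of full)
```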

Metrics

| Metric | Value |
|---|---|
| Classification accuracy | 0.80 |
| Amount extraction accuracy | 1.00 |
| Type extraction accuracy | 1.00 |
| Digits extraction accuracy | 1.00 |
| Avg latency (FP16, CPU) | 47 ms |

Metrics were evaluated on a held-out test split. Latency was measured with a single-threaded ONNX Runtime CPU session.
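A single-threaded latency measurement can be reproduced with ONNX Runtime session options plus a simple perf_counter loop. The helper below is a generic sketch, not the benchmark script used for the table above:

```python
import time

def mean_latency_ms(fn, warmup: int = 3, iters: int = 20) -> float:
    # Warm-up runs amortize one-time costs (allocations, caches).
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) * 1000 / iters

# To pin ONNX Runtime to a single CPU thread:
#   opts = ort.SessionOptions()
#   opts.intra_op_num_threads = 1
#   opts.inter_op_num_threads = 1
#   session = ort.InferenceSession("onnx/deberta_classifier_fp16.onnx", opts)
```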

Limitations

  • Regional focus: Primarily trained on Indian bank SMS formats (Rs., INR, currency symbols common in India). Performance on other regional formats has not been evaluated.
  • English only: The model supports English language messages only.
  • Span extraction, not generation: Field values must exist verbatim in the input text. The model extracts spans rather than generating new text.
  • Synthetic evaluation data: The evaluation metrics above were computed on synthetic data. Real-world accuracy may differ.

Use Cases

  • Personal finance apps
  • Expense tracking and categorization
  • Transaction monitoring and alerting
  • Bank statement reconciliation from SMS/notifications

License

This model is released under the CC-BY-4.0 license.
