Model Card: fintext-extractor

GLiNER2-based two-stage NER model that extracts structured transaction data from bank SMS and push notifications. Designed for on-device inference on mobile and desktop, with ONNX Runtime as the inference backend.

Architecture

fintext-extractor uses a two-stage pipeline to maximize both speed and accuracy:

  1. Stage 1 -- Classification: A DeBERTa-v3-large binary classifier determines whether an incoming message is a completed transaction (is_transaction: yes/no). Non-transaction messages (OTPs, promotional alerts, balance reminders) are filtered out early, keeping latency low.

  2. Stage 2 -- Extraction: A GLiNER2-large extraction model with a LoRA adapter runs only on messages classified as transactions. It extracts structured fields: amount, date, transaction type, description, and masked account digits.

This two-stage design means the heavier extraction model is invoked only when needed, reducing average inference cost on mixed message streams.
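The control flow of the pipeline can be sketched as follows; `classify` and `extract` are hypothetical placeholders standing in for the DeBERTa and GLiNER2 model calls, and the keyword check is only a stub:

```python
def classify(message: str) -> bool:
    # Stage 1 stand-in: the real version runs the DeBERTa ONNX session.
    return "debited" in message or "credited" in message

def extract(message: str) -> dict:
    # Stage 2 stand-in: the real version runs the GLiNER2 extraction session.
    return {"is_transaction": True}

def process(message: str) -> dict:
    # Non-transaction messages (OTPs, promos, reminders) exit early and
    # never pay the cost of the heavier extraction model.
    if not classify(message):
        return {"is_transaction": False}
    return extract(message)
```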

Extracted Fields

| Field | Type | Description |
|---|---|---|
| is_transaction | bool | Whether the message is a completed transaction |
| transaction_amount | float | Numeric amount (e.g., 5000.00) |
| transaction_type | str | DEBIT or CREDIT |
| transaction_date | str | Date in DD-MM-YYYY format |
| transaction_description | str | Merchant or person name |
| masked_account_digits | str | Last 4 digits of card/account |
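The schema above maps naturally onto a typed record. The dataclass below is an illustrative assumption for downstream code, not a type exported by the fintext library:

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    is_transaction: bool
    transaction_amount: float     # e.g., 5000.00
    transaction_type: str         # "DEBIT" or "CREDIT"
    transaction_date: str         # DD-MM-YYYY
    transaction_description: str  # merchant or person name
    masked_account_digits: str    # last 4 digits, e.g., "1234"
```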

Model Files

| File | Size | Description |
|---|---|---|
| onnx/deberta_classifier_fp16.onnx + .data | ~830 MB | Classification model (FP16) |
| onnx/deberta_classifier_fp32.onnx + .data | ~1.66 GB | Classification model (FP32) |
| onnx/extraction_full_fp16.onnx + .data | ~930 MB | Extraction model (FP16) |
| onnx/extraction_full_fp32.onnx + .data | ~1.9 GB | Extraction model (FP32) |
| tokenizer/ | ~11 MB | Classification tokenizer |
| tokenizer_extraction/ | ~11 MB | Extraction tokenizer |

FP16 variants are recommended for most use cases. FP32 variants are provided for environments that do not support half-precision.
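Choosing between the variants is a simple filename selection; the helper below is a hypothetical convenience, with filenames taken from the Model Files table:

```python
def choose_model(kind: str, fp16_supported: bool) -> str:
    # kind: "deberta_classifier" or "extraction_full".
    # Falls back to FP32 where half-precision is unavailable.
    precision = "fp16" if fp16_supported else "fp32"
    return f"onnx/{kind}_{precision}.onnx"
```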

Quick Start (Python)

from fintext import FintextExtractor

extractor = FintextExtractor.from_pretrained("Sowrabhm/fintext-extractor")
result = extractor.extract("Rs.5,000 debited from a/c XX1234 for Amazon Pay on 08-Mar-26")
print(result)
# {'is_transaction': True, 'transaction_amount': 5000.0, 'transaction_type': 'DEBIT',
#  'transaction_date': '08-03-2026', 'transaction_description': 'Amazon Pay',
#  'masked_account_digits': '1234'}

Direct ONNX Runtime Usage

If you prefer not to install the fintext library, you can run the ONNX models directly:

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load classification model and tokenizer
cls_session = ort.InferenceSession("onnx/deberta_classifier_fp16.onnx")
tokenizer = Tokenizer.from_file("tokenizer/tokenizer.json")

# Tokenize input
text = "Rs.5,000 debited from a/c XX1234 for Amazon Pay on 08-Mar-26"
encoding = tokenizer.encode(text)
input_ids = np.array([encoding.ids], dtype=np.int64)
attention_mask = np.array([encoding.attention_mask], dtype=np.int64)

# Run classification
cls_output = cls_session.run(None, {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
})
is_transaction = np.argmax(cls_output[0], axis=-1)[0] == 1

# If classified as a transaction, run extraction
if is_transaction:
    ext_session = ort.InferenceSession("onnx/extraction_full_fp16.onnx")
    ext_tokenizer = Tokenizer.from_file("tokenizer_extraction/tokenizer.json")
    # ... tokenize and run extraction session
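If you want a confidence score rather than a hard label, a numerically stable softmax over the two classifier logits gives one. This post-processing step is an assumption; the fintext library may expose probabilities directly:

```python
import numpy as np

def transaction_probability(logits: np.ndarray) -> float:
    # logits: shape (1, 2) raw classifier output (cls_output[0] above).
    # Subtract the row max before exponentiating for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return float(probs[0, 1])  # probability of the "transaction" class
```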

Training

The models were fine-tuned from two base checkpoints: DeBERTa-v3-large for the classifier and GLiNER2-large for the extractor.

Training used the GLiNER2 multi-task schema, combining binary classification (is_transaction) with structured extraction (transaction_info) in a single training loop. LoRA adapters keep the trainable parameter count low, enabling fine-tuning on consumer GPUs.
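The parameter savings from LoRA can be illustrated with a toy NumPy forward pass. The dimensions and rank below are illustrative, not the actual adapter configuration:

```python
import numpy as np

d, k, r = 1024, 1024, 8          # layer dims and LoRA rank (illustrative)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))  # frozen pretrained weight
A = rng.standard_normal((r, k))  # LoRA down-projection, trainable
B = np.zeros((d, r))             # LoRA up-projection, trainable (zero init)
alpha = 16                       # LoRA scaling factor

def lora_forward(x: np.ndarray) -> np.ndarray:
    # y = Wx + (alpha / r) * B(Ax); only A and B receive gradients,
    # and the zero-initialized B makes the initial update a no-op.
    return W @ x + (alpha / r) * (B @ (A @ x))

full = W.size          # parameters updated by full fine-tuning
lora = A.size + B.size # parameters updated with LoRA (here ~1.6% of full)
```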

Metrics

| Metric | Value |
|---|---|
| Classification accuracy | 0.80 |
| Amount extraction accuracy | 1.00 |
| Type extraction accuracy | 1.00 |
| Digits extraction accuracy | 1.00 |
| Avg latency (FP16, CPU) | 47 ms |

Metrics were evaluated on a held-out test split. Latency was measured with a single-threaded ONNX Runtime CPU session.
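A single-threaded latency measurement can be reproduced with ONNX Runtime session options plus a simple perf_counter loop. The helper below is a generic sketch, not the benchmark script used for the table above:

```python
import time

def mean_latency_ms(fn, warmup: int = 3, iters: int = 20) -> float:
    # Warm-up runs amortize one-time costs (allocations, caches).
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) * 1000 / iters

# To pin ONNX Runtime to a single CPU thread:
#   opts = ort.SessionOptions()
#   opts.intra_op_num_threads = 1
#   opts.inter_op_num_threads = 1
#   session = ort.InferenceSession("onnx/deberta_classifier_fp16.onnx", opts)
```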

Limitations

  • Regional focus: Primarily trained on Indian bank SMS formats (Rs., INR, currency symbols common in India). Performance on other regional formats has not been evaluated.
  • English only: The model supports English language messages only.
  • Span extraction, not generation: Field values must exist verbatim in the input text. The model extracts spans rather than generating new text.
  • Synthetic evaluation data: The evaluation metrics above were computed on synthetic data. Real-world accuracy may differ.

Use Cases

  • Personal finance apps
  • Expense tracking and categorization
  • Transaction monitoring and alerting
  • Bank statement reconciliation from SMS/notifications

License

This model is released under the CC-BY-4.0 license.
