---

license: cc-by-4.0
language:
  - en
tags:
  - onnx
  - ner
  - transaction-extraction
  - sms-parsing
  - gliner2
  - deberta
  - on-device
  - mobile
library_name: onnxruntime
pipeline_tag: token-classification
---


# Model Card: fintext-extractor

fintext-extractor is a GLiNER2-based two-stage NER model that extracts structured transaction data from bank SMS and push notifications. It is designed for on-device inference on mobile and desktop, with ONNX Runtime as the inference backend.

## Architecture

fintext-extractor uses a **two-stage pipeline** to maximize both speed and accuracy:

1. **Stage 1 -- Classification:** A DeBERTa-v3-large binary classifier determines whether an incoming message is a completed transaction (`is_transaction: yes/no`). Non-transaction messages (OTPs, promotional alerts, balance reminders) are filtered out early, keeping latency low.

2. **Stage 2 -- Extraction:** A GLiNER2-large extraction model with a LoRA adapter runs only on messages classified as transactions. It extracts structured fields: amount, date, transaction type, description, and masked account digits.

This two-stage design means the heavier extraction model is invoked only when needed, reducing average inference cost on mixed message streams.
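The dispatch logic above can be sketched in a few lines. The function and stub stages here are purely illustrative (not part of the `fintext` library); the real pipeline would plug the classifier and extractor models into the two callables:

```python
def process_message(text, classify, extract):
    """Two-stage pipeline: classify first, run extraction only if needed."""
    if not classify(text):
        # Non-transaction messages (OTPs, promos) short-circuit here.
        return {"is_transaction": False}
    result = extract(text)
    result["is_transaction"] = True
    return result

# Stub stages standing in for the real models:
out = process_message(
    "Rs.5,000 debited from a/c XX1234",
    classify=lambda t: "debited" in t,
    extract=lambda t: {"transaction_amount": 5000.0},
)
```

Because `extract` never runs on filtered messages, the average cost over a mixed stream is dominated by the cheaper classification stage.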

## Extracted Fields

| Field | Type | Description |
|-------|------|-------------|
| `is_transaction` | bool | Whether the message is a completed transaction |
| `transaction_amount` | float | Numeric amount (e.g., 5000.00) |
| `transaction_type` | str | DEBIT or CREDIT |
| `transaction_date` | str | Date in DD-MM-YYYY format |
| `transaction_description` | str | Merchant or person name |
| `masked_account_digits` | str | Last 4 digits of card/account |

## Model Files

| File | Size | Description |
|------|------|-------------|
| `onnx/deberta_classifier_fp16.onnx` + `.data` | ~830 MB | Classification model (FP16) |
| `onnx/deberta_classifier_fp32.onnx` + `.data` | ~1.66 GB | Classification model (FP32) |
| `onnx/extraction_full_fp16.onnx` + `.data` | ~930 MB | Extraction model (FP16) |
| `onnx/extraction_full_fp32.onnx` + `.data` | ~1.9 GB | Extraction model (FP32) |
| `tokenizer/` | ~11 MB | Classification tokenizer |
| `tokenizer_extraction/` | ~11 MB | Extraction tokenizer |

FP16 variants are recommended for most use cases. FP32 variants are provided for environments that do not support half-precision.
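A small helper can make the precision choice explicit in application code. This helper is illustrative only (not part of any library); the returned path is passed to `ort.InferenceSession` as in the examples below:

```python
def classifier_path(use_fp16: bool = True) -> str:
    """Return the classifier model path for the chosen precision.

    FP16 is the recommended default; FP32 is a drop-in fallback for
    runtimes or hardware without half-precision support.
    """
    variant = "fp16" if use_fp16 else "fp32"
    return f"onnx/deberta_classifier_{variant}.onnx"
```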

## Quick Start (Python)

```python
from fintext import FintextExtractor

extractor = FintextExtractor.from_pretrained("Sowrabhm/fintext-extractor")
result = extractor.extract("Rs.5,000 debited from a/c XX1234 for Amazon Pay on 08-Mar-26")
print(result)
# {'is_transaction': True, 'transaction_amount': 5000.0, 'transaction_type': 'DEBIT',
#  'transaction_date': '08-03-2026', 'transaction_description': 'Amazon Pay',
#  'masked_account_digits': '1234'}
```

## Direct ONNX Runtime Usage

If you prefer not to install the `fintext` library, you can run the ONNX models directly:

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load classification model and tokenizer
cls_session = ort.InferenceSession("onnx/deberta_classifier_fp16.onnx")
tokenizer = Tokenizer.from_file("tokenizer/tokenizer.json")

# Tokenize input
text = "Rs.5,000 debited from a/c XX1234 for Amazon Pay on 08-Mar-26"
encoding = tokenizer.encode(text)
input_ids = np.array([encoding.ids], dtype=np.int64)
attention_mask = np.array([encoding.attention_mask], dtype=np.int64)

# Run classification
cls_output = cls_session.run(None, {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
})
is_transaction = np.argmax(cls_output[0], axis=-1)[0] == 1

# If classified as a transaction, run extraction
if is_transaction:
    ext_session = ort.InferenceSession("onnx/extraction_full_fp16.onnx")
    ext_tokenizer = Tokenizer.from_file("tokenizer_extraction/tokenizer.json")
    # ... tokenize and run extraction session
```

## Training

The models were fine-tuned from the following base checkpoints:

- **Classifier:** [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) with LoRA (r=16, alpha=32)
- **Extractor:** [fastino/gliner2-large-v1](https://huggingface.co/fastino/gliner2-large-v1) with LoRA extraction adapter

Training used the GLiNER2 multi-task schema, combining binary classification (`is_transaction`) with structured extraction (`transaction_info`) in a single training loop. LoRA adapters keep the trainable parameter count low, enabling fine-tuning on consumer GPUs.
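The LoRA hyperparameters named above (r=16, alpha=32) would look roughly like this as a Hugging Face PEFT config. This is a hedged sketch, not the actual training configuration: the target modules, dropout, and task type shown here are assumptions, as the card does not specify them:

```python
from peft import LoraConfig

# Sketch of a LoRA setup matching the stated r=16, alpha=32.
# target_modules, lora_dropout, and task_type are assumptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_proj", "value_proj"],
    lora_dropout=0.1,
    task_type="SEQ_CLS",
)
```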

## Metrics

| Metric | Value |
|--------|-------|
| Classification accuracy | 0.80 |
| Amount extraction accuracy | 1.00 |
| Type extraction accuracy | 1.00 |
| Digits extraction accuracy | 1.00 |
| Avg latency (FP16, CPU) | 47 ms |

Metrics were evaluated on a held-out test split. Latency was measured with a single-threaded ONNX Runtime CPU session.
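Latency figures like the one above can be reproduced with a simple timing harness. The sketch below times any callable and is illustrative, not the harness actually used; for a single-threaded ONNX Runtime session, set `sess_options.intra_op_num_threads = 1` when creating the session and pass `lambda: session.run(...)` as the callable:

```python
import time
import statistics

def mean_latency_ms(run_fn, warmup: int = 5, iters: int = 50) -> float:
    """Average wall-clock latency of run_fn in milliseconds."""
    # Warm-up runs absorb one-time allocation and graph-optimization costs.
    for _ in range(warmup):
        run_fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.mean(samples)
```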

## Limitations

- **Regional focus:** Primarily trained on Indian bank SMS formats (Rs., INR, currency symbols common in India). Performance on other regional formats has not been evaluated.
- **English only:** The model supports English language messages only.
- **Span extraction, not generation:** Field values must exist verbatim in the input text. The model extracts spans rather than generating new text.
- **Synthetic evaluation data:** The evaluation metrics above were computed on synthetic data. Real-world accuracy may differ.

## Use Cases

- Personal finance apps
- Expense tracking and categorization
- Transaction monitoring and alerting
- Bank statement reconciliation from SMS/notifications

## License

This model is released under the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.

## Links

- **GitHub:** [https://github.com/sowrabhmv/fintext-extractor](https://github.com/sowrabhmv/fintext-extractor)
- **Notebooks:** See the GitHub repo for cookbook examples and training notebooks