---
license: cc-by-4.0
language:
- en
tags:
- onnx
- ner
- transaction-extraction
- sms-parsing
- gliner2
- deberta
- on-device
- mobile
library_name: onnxruntime
pipeline_tag: token-classification
---

# Model Card: fintext-extractor

GLiNER2-based two-stage NER model that extracts structured transaction data from bank SMS and push notifications. Designed for on-device inference on mobile and desktop, with ONNX Runtime as the inference backend.

## Architecture

fintext-extractor uses a **two-stage pipeline** to maximize both speed and accuracy:

1. **Stage 1 -- Classification:** A DeBERTa-v3-large binary classifier determines whether an incoming message is a completed transaction (`is_transaction: yes/no`). Non-transaction messages (OTPs, promotional alerts, balance reminders) are filtered out early, keeping latency low.
2. **Stage 2 -- Extraction:** A GLiNER2-large extraction model with a LoRA adapter runs only on messages classified as transactions. It extracts structured fields: amount, date, transaction type, description, and masked account digits.

This two-stage design means the heavier extraction model is invoked only when needed, reducing average inference cost on mixed message streams.
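The gating logic of the two-stage design can be sketched in a few lines. This is an illustrative stub, not the library's implementation: `classify` and `extract` stand in for the two ONNX sessions, and the keyword heuristics inside them exist only to make the sketch runnable.

```python
# Sketch of the two-stage pipeline: a cheap classifier gates the
# expensive extractor. Both model calls are stubbed for illustration.
def classify(message: str) -> bool:
    """Stage 1: stand-in for the DeBERTa-v3-large ONNX classifier."""
    return "debited" in message or "credited" in message

def extract(message: str) -> dict:
    """Stage 2: stand-in for the GLiNER2-large ONNX extractor."""
    return {"transaction_type": "DEBIT" if "debited" in message else "CREDIT"}

def process(message: str) -> dict:
    if not classify(message):  # OTPs, promos, reminders exit here cheaply
        return {"is_transaction": False}
    return {"is_transaction": True, **extract(message)}

print(process("Your OTP is 482913"))
# {'is_transaction': False}
print(process("Rs.5,000 debited from a/c XX1234 for Amazon Pay"))
# {'is_transaction': True, 'transaction_type': 'DEBIT'}
```

The point of the structure is that `extract` (the larger model) never runs on the non-transaction majority of a real message stream.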
## Extracted Fields

| Field | Type | Description |
|-------|------|-------------|
| `is_transaction` | bool | Whether the message is a completed transaction |
| `transaction_amount` | float | Numeric amount (e.g., 5000.00) |
| `transaction_type` | str | DEBIT or CREDIT |
| `transaction_date` | str | Date in DD-MM-YYYY format |
| `transaction_description` | str | Merchant or person name |
| `masked_account_digits` | str | Last 4 digits of card/account |

## Model Files

| File | Size | Description |
|------|------|-------------|
| `onnx/deberta_classifier_fp16.onnx` + `.data` | ~830 MB | Classification model (FP16) |
| `onnx/deberta_classifier_fp32.onnx` + `.data` | ~1.66 GB | Classification model (FP32) |
| `onnx/extraction_full_fp16.onnx` + `.data` | ~930 MB | Extraction model (FP16) |
| `onnx/extraction_full_fp32.onnx` + `.data` | ~1.9 GB | Extraction model (FP32) |
| `tokenizer/` | ~11 MB | Classification tokenizer |
| `tokenizer_extraction/` | ~11 MB | Extraction tokenizer |

FP16 variants are recommended for most use cases. FP32 variants are provided for environments that do not support half-precision.
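Downstream code may want to check extractor output against this schema before writing it to a ledger. The helper below is a hypothetical sketch (the `fintext` library may perform its own validation); the function name and error messages are illustrative only.

```python
import re

# Hypothetical validator for the fintext-extractor output schema.
def validate_result(result: dict) -> list:
    """Return a list of schema violations (empty list means valid)."""
    errors = []
    if not isinstance(result.get("is_transaction"), bool):
        errors.append("is_transaction must be a bool")
    if result.get("is_transaction"):
        if not isinstance(result.get("transaction_amount"), float):
            errors.append("transaction_amount must be a float")
        if result.get("transaction_type") not in ("DEBIT", "CREDIT"):
            errors.append("transaction_type must be DEBIT or CREDIT")
        # DD-MM-YYYY, e.g. 08-03-2026
        if not re.fullmatch(r"\d{2}-\d{2}-\d{4}", result.get("transaction_date", "")):
            errors.append("transaction_date must be DD-MM-YYYY")
        # Exactly four digits of the card/account number
        if not re.fullmatch(r"\d{4}", result.get("masked_account_digits", "")):
            errors.append("masked_account_digits must be 4 digits")
    return errors

result = {
    "is_transaction": True,
    "transaction_amount": 5000.0,
    "transaction_type": "DEBIT",
    "transaction_date": "08-03-2026",
    "transaction_description": "Amazon Pay",
    "masked_account_digits": "1234",
}
print(validate_result(result))  # []
```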
## Quick Start (Python)

```python
from fintext import FintextExtractor

extractor = FintextExtractor.from_pretrained("Sowrabhm/fintext-extractor")
result = extractor.extract("Rs.5,000 debited from a/c XX1234 for Amazon Pay on 08-Mar-26")
print(result)
# {'is_transaction': True, 'transaction_amount': 5000.0, 'transaction_type': 'DEBIT',
#  'transaction_date': '08-03-2026', 'transaction_description': 'Amazon Pay',
#  'masked_account_digits': '1234'}
```

## Direct ONNX Runtime Usage

If you prefer not to install the `fintext` library, you can run the ONNX models directly:

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load classification model and tokenizer
cls_session = ort.InferenceSession("onnx/deberta_classifier_fp16.onnx")
tokenizer = Tokenizer.from_file("tokenizer/tokenizer.json")

# Tokenize input
text = "Rs.5,000 debited from a/c XX1234 for Amazon Pay on 08-Mar-26"
encoding = tokenizer.encode(text)
input_ids = np.array([encoding.ids], dtype=np.int64)
attention_mask = np.array([encoding.attention_mask], dtype=np.int64)

# Run classification
cls_output = cls_session.run(None, {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
})
is_transaction = np.argmax(cls_output[0], axis=-1)[0] == 1

# If classified as a transaction, run extraction
if is_transaction:
    ext_session = ort.InferenceSession("onnx/extraction_full_fp16.onnx")
    ext_tokenizer = Tokenizer.from_file("tokenizer_extraction/tokenizer.json")
    # ... tokenize and run extraction session
```

## Training

The models were fine-tuned from the following base checkpoints:

- **Classifier:** [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) with LoRA (r=16, alpha=32)
- **Extractor:** [fastino/gliner2-large-v1](https://huggingface.co/fastino/gliner2-large-v1) with LoRA extraction adapter

Training used the GLiNER2 multi-task schema, combining binary classification (`is_transaction`) with structured extraction (`transaction_info`) in a single training loop. LoRA adapters keep the trainable parameter count low, enabling fine-tuning on consumer GPUs.

## Metrics

| Metric | Value |
|--------|-------|
| Classification accuracy | 0.80 |
| Amount extraction accuracy | 1.00 |
| Type extraction accuracy | 1.00 |
| Digits extraction accuracy | 1.00 |
| Avg latency (FP16, CPU) | 47 ms |

Metrics were evaluated on a held-out test split. Latency was measured with a single-threaded ONNX Runtime CPU session.

## Limitations

- **Regional focus:** Primarily trained on Indian bank SMS formats (Rs., INR, and currency conventions common in India). Performance on other regional formats has not been evaluated.
- **English only:** The model supports English-language messages only.
- **Span extraction, not generation:** Field values must exist verbatim in the input text. The model extracts spans rather than generating new text.
- **Synthetic evaluation data:** The evaluation metrics above were computed on synthetic data. Real-world accuracy may differ.

## Use Cases

- Personal finance apps
- Expense tracking and categorization
- Transaction monitoring and alerting
- Bank statement reconciliation from SMS/notifications

## License

This model is released under the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.

## Links

- **GitHub:** [https://github.com/sowrabhmv/fintext-extractor](https://github.com/sowrabhmv/fintext-extractor)
- **Notebooks:** See the GitHub repo for cookbook examples and training notebooks
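As a back-of-envelope check on the Training section's claim that LoRA keeps the trainable parameter count low: assuming rank r=16 adapters on the 1024x1024 attention projections of DeBERTa-v3-large (hidden size 1024; which modules the released adapters actually target is an assumption here), each adapted matrix trains only about 3% of the weights it replaces.

```python
# Illustrative LoRA parameter arithmetic; the adapter's actual target
# modules are an assumption, not taken from the released checkpoints.
hidden = 1024        # DeBERTa-v3-large hidden size
r = 16               # LoRA rank from the Training section

full_per_matrix = hidden * hidden          # dense projection weight (d x d)
lora_per_matrix = r * (hidden + hidden)    # A (r x d) plus B (d x r)

ratio = lora_per_matrix / full_per_matrix
print(f"LoRA params per matrix: {lora_per_matrix:,}")   # 32,768
print(f"Fraction of full matrix: {ratio:.1%}")          # 3.1%
```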