# Accounting NER: PAYER / PAYEE / AMOUNT

A fine-tuned BERT model for extracting payer, payee, and amount entities from transaction text. Designed for accounting reconciliation and netting tasks where an agent must parse transaction histories and compute final settlements between parties.

## Entity Types

| Label | Description | Example |
|---|---|---|
| PAYER | The party sending/owing money | "**Alice** paid $500 to Bob" |
| PAYEE | The party receiving money | "Alice paid $500 to **Bob**" |
| AMOUNT | Monetary amounts | "Alice paid **$500** to Bob" |

## Performance

Evaluated on a held-out validation set (2,385 examples):

| Entity | Precision | Recall | F1 |
|---|---|---|---|
| AMOUNT | 0.96 | 0.98 | 0.97 |
| PAYEE | 0.89 | 0.91 | 0.90 |
| PAYER | 0.88 | 0.91 | 0.89 |
| **Overall** | 0.89 | 0.92 | 0.90 |
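For readers checking the table: F1 is the harmonic mean of precision and recall, which a two-line helper confirms (a generic formula, not tied to this model):

```python
# F1 = harmonic mean of precision (p) and recall (r).
def f1(p, r):
    return 2 * p * r / (p + r)

print(round(f1(0.96, 0.98), 2))  # AMOUNT row -> 0.97
print(round(f1(0.89, 0.92), 2))  # Overall row -> 0.9
```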

## Usage

### Python (Transformers)

```python
from transformers import pipeline

ner = pipeline("ner", model="Minns-ai/accounting-ner", aggregation_strategy="simple")
results = ner("Alice paid $500 to Bob for dinner.")
```
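With `aggregation_strategy="simple"`, each result is a dict with `entity_group`, `word`, `score`, `start`, and `end` keys. A hedged post-processing sketch (the `to_triples` helper and its naive one-transaction-per-sentence pairing are illustrative assumptions, not part of this repo):

```python
# Group a pipeline entity list into (payer, payee, amount) triples.
# Naive pairing: assumes one transaction per sentence.
def to_triples(entities):
    payers = [e["word"] for e in entities if e["entity_group"] == "PAYER"]
    payees = [e["word"] for e in entities if e["entity_group"] == "PAYEE"]
    amounts = [e["word"] for e in entities if e["entity_group"] == "AMOUNT"]
    return list(zip(payers, payees, amounts))

# Sample shaped like the pipeline's "simple" aggregation output.
sample = [
    {"entity_group": "PAYER", "word": "Alice", "score": 0.99},
    {"entity_group": "AMOUNT", "word": "$500", "score": 0.99},
    {"entity_group": "PAYEE", "word": "Bob", "score": 0.99},
]
print(to_triples(sample))  # [('Alice', 'Bob', '$500')]
```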

### ONNX Runtime

The `onnx/` directory contains `model.onnx` and `tokenizer.json` for deployment with ONNX Runtime (e.g. in a Rust or C++ service).

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("onnx/tokenizer.json")
session = ort.InferenceSession("onnx/model.onnx")

encoding = tokenizer.encode("Sam supplied $1,200 for Grace.")
# ONNX Runtime expects int64 numpy arrays, not plain Python lists.
outputs = session.run(None, {
    "input_ids": np.array([encoding.ids], dtype=np.int64),
    "attention_mask": np.array([encoding.attention_mask], dtype=np.int64),
    "token_type_ids": np.array([encoding.type_ids], dtype=np.int64),
})
# outputs[0] holds per-token logits with shape (1, seq_len, num_labels).
```
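The raw output is a logits tensor; a hedged sketch of decoding it into BIO labels follows. The `ID2LABEL` order below is an assumption for illustration; the authoritative mapping ships in the model's `config.json`.

```python
import numpy as np

# Assumed label order; verify against the model's config.json.
ID2LABEL = {0: "O", 1: "B-PAYER", 2: "I-PAYER", 3: "B-PAYEE",
            4: "I-PAYEE", 5: "B-AMOUNT", 6: "I-AMOUNT"}

def decode(logits):
    # logits: (seq_len, num_labels) array, e.g. outputs[0][0].
    return [ID2LABEL[i] for i in np.argmax(logits, axis=-1)]
```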

### Example Output

Input: `"anna payed john for the cinema but tine owes john and anna for covering her 20"`

```json
{
  "model": "bert-base-NER-onnx",
  "entities": [
    {"label": "PAYER", "start_offset": 0, "end_offset": 4, "confidence": 0.9996, "text": "anna"},
    {"label": "PAYEE", "start_offset": 11, "end_offset": 15, "confidence": 0.9996, "text": "john"},
    {"label": "PAYER", "start_offset": 35, "end_offset": 39, "confidence": 0.9991, "text": "tine"},
    {"label": "PAYEE", "start_offset": 45, "end_offset": 49, "confidence": 0.9996, "text": "john"},
    {"label": "PAYEE", "start_offset": 54, "end_offset": 58, "confidence": 0.9996, "text": "anna"}
  ]
}
```

Note that the bare trailing "20" is not tagged as an AMOUNT; see Limitations below.

## Training

### Base Model

`bert-base-uncased` fine-tuned for token classification with 7 labels (BIO format): `O`, `B-PAYER`, `I-PAYER`, `B-PAYEE`, `I-PAYEE`, `B-AMOUNT`, `I-AMOUNT`.
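As an illustration of the BIO scheme (this helper is not part of the repo), a tag sequence collapses into labeled entity spans like so:

```python
# Collapse a BIO tag sequence into (label, start, end) token spans.
def bio_to_spans(tags):
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:          # close the previous entity
                spans.append((label, start, i))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            continue                       # entity continues
        else:                              # "O" or a stray I- tag
            if start is not None:
                spans.append((label, start, i))
            start, label = None, None
    if start is not None:
        spans.append((label, start, len(tags)))
    return spans

tags = ["B-PAYER", "O", "B-AMOUNT", "I-AMOUNT", "O", "B-PAYEE"]
print(bio_to_spans(tags))  # [('PAYER', 0, 1), ('AMOUNT', 2, 4), ('PAYEE', 5, 6)]
```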

### Training Data (~10K examples from three sources)

1. **expertai/BUSTER** (9,861 examples): business transaction documents from SEC EDGAR filings. Entity types remapped:

   • `Parties.BUYING_COMPANY` -> `PAYER`
   • `Parties.SELLING_COMPANY` -> `PAYEE`
   • `Generic_Info.ANNUAL_REVENUES` -> `AMOUNT`

   Licensed under Apache 2.0.

2. **Kaggle Invoice NER** (64 examples): invoice documents with extracted fields (TOTAL_AMOUNT, DUE_AMOUNT, ACCOUNT_NAME) converted to token-level BIO annotations.

3. **Synthetic Data** (2,400 examples): programmatically generated transaction sentences covering patterns underrepresented in the real datasets:

  • Formal ledger entries: "Sam supplied $1,200 for Grace."
  • Informal/casual language: "Leo payed Lucy 500 for cleaning."
  • Misspellings: "payed" instead of "paid"
  • Compound payers/payees: "Tom and Lucy paid Mike $200."
  • Missing amounts: "Alice covered Bob for dinner."
  • Multi-transaction sentences with conjunctions: "Anna paid John $50 but Tine owes John and Anna for covering her 20."
  • Transaction histories (3-8 concatenated transactions)

The synthetic data generator (`training/data/create_dataset.py`) uses 30+ templates, 60+ party names, and 40+ transaction reasons to produce diverse examples.
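A minimal sketch of this template approach; the templates, names, and reasons below are illustrative stand-ins, not the actual contents of `create_dataset.py`:

```python
import random

TEMPLATES = [
    "{payer} paid {payee} ${amount} for {reason}.",
    "{payer} supplied ${amount} for {payee}.",
    "{payer} payed {payee} {amount} for {reason}.",  # deliberate misspelling
]
NAMES = ["Alice", "Bob", "Grace", "Sam", "Leo", "Lucy"]
REASONS = ["dinner", "cleaning", "the cinema", "rent"]

def make_example(rng):
    # Draw two distinct parties and fill a random template.
    payer, payee = rng.sample(NAMES, 2)
    return rng.choice(TEMPLATES).format(
        payer=payer, payee=payee,
        amount=rng.randrange(5, 2000), reason=rng.choice(REASONS))

print(make_example(random.Random(0)))
```

The real generator would also emit token-level BIO labels alongside each sentence; this sketch shows only the surface-text side.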

### Hyperparameters

| Parameter | Value |
|---|---|
| Learning rate | 3e-5 |
| Batch size | 16 |
| Epochs | 5 |
| Warmup ratio | 0.1 |
| Weight decay | 0.01 |
| Max sequence length | 128 |
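The table maps directly onto a Hugging Face `TrainingArguments` config. The sketch below is an assumption about how such a run could be configured (`output_dir` is a placeholder), not the actual training script:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="accounting-ner",   # placeholder path
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    num_train_epochs=5,
    warmup_ratio=0.1,
    weight_decay=0.01,
)
# Max sequence length (128) is applied at tokenization time, e.g.
# tokenizer(text, truncation=True, max_length=128).
```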

## Intended Use

Extracting structured (payer, payee, amount) triples from:

  • Transaction histories for netting and settlement computation (canceling circular debts)
  • Accounting statements and ledger entries
  • Informal payment descriptions
  • Multi-party transactions

This supports tasks where an agent observes a history of transactions (e.g. "A supplied $X for B") between multiple parties and must compute the final settlement after netting.
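The netting step itself is simple once triples are extracted. A hedged sketch, using one common convention (a payment on someone's behalf creates a debt owed back to the payer; positive balance means the party is owed money):

```python
from collections import defaultdict

def net_balances(triples):
    # triples: iterable of (payer, payee, amount)
    balance = defaultdict(float)
    for payer, payee, amount in triples:
        balance[payer] += amount   # payer is owed this back
        balance[payee] -= amount   # payee owes it
    return dict(balance)

triples = [("Alice", "Bob", 500), ("Bob", "Alice", 200)]
print(net_balances(triples))  # {'Alice': 300.0, 'Bob': -300.0}
```

Circular debts cancel automatically: A→B, B→C, and C→A of equal size all net to zero.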

## Limitations

  • Trained primarily on English text
  • Best on short transaction sentences; long documents may need chunking (max 128 tokens)
  • Bare numbers without currency context (e.g. "20" at end of sentence) may not always be tagged as AMOUNT
  • Does not distinguish between different currencies in the same text
  • PAYER/PAYEE distinction relies on contextual cues (verbs like "paid", "owes", "received") — ambiguous sentences may be misclassified

## Citation

If you use this model, please cite the BUSTER dataset, which contributed the majority of the training data:

```bibtex
@inproceedings{zugarini-etal-2023-buster,
    title = "{BUSTER}: a {``}{BUS}iness Transaction Entity Recognition{''} dataset",
    author = "Zugarini, Andrea and Zamai, Andrew and Ernandes, Marco and Rigutini, Leonardo",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track",
    year = "2023",
    pages = "605--611",
}
```