# Accounting NER: PAYER / PAYEE / AMOUNT
A fine-tuned BERT model for extracting payer, payee, and amount entities from transaction text. Designed for accounting reconciliation and netting tasks where an agent must parse transaction histories and compute final settlements between parties.
## Entity Types

| Label | Description | Example span |
|---|---|---|
| PAYER | The party sending/owing money | "Alice" in "Alice paid $500 to Bob" |
| PAYEE | The party receiving money | "Bob" in "Alice paid $500 to Bob" |
| AMOUNT | Monetary amounts | "$500" in "Alice paid $500 to Bob" |
## Performance
Evaluated on a held-out validation set (2,385 examples):
| Entity | Precision | Recall | F1 |
|---|---|---|---|
| AMOUNT | 0.96 | 0.98 | 0.97 |
| PAYEE | 0.89 | 0.91 | 0.90 |
| PAYER | 0.88 | 0.91 | 0.89 |
| Overall | 0.89 | 0.92 | 0.90 |
## Usage

### Python (Transformers)

```python
from transformers import pipeline

ner = pipeline("ner", model="Minns-ai/accounting-ner", aggregation_strategy="simple")
results = ner("Alice paid $500 to Bob for dinner.")
```
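With `aggregation_strategy="simple"`, the pipeline returns one dict per merged entity (keys include `entity_group`, `score`, `word`, `start`, `end`). A minimal sketch of grouping that output into payer/payee/amount lists, using a hand-written sample result rather than an actual model run:

```python
# Assumed sample pipeline output (illustrative; not produced by running the model)
results = [
    {"entity_group": "PAYER", "score": 0.99, "word": "alice", "start": 0, "end": 5},
    {"entity_group": "AMOUNT", "score": 0.98, "word": "$500", "start": 11, "end": 15},
    {"entity_group": "PAYEE", "score": 0.99, "word": "bob", "start": 19, "end": 22},
]

def by_label(entities, label):
    """Collect entity texts for one label, in reading order."""
    return [e["word"] for e in entities if e["entity_group"] == label]

payers = by_label(results, "PAYER")
payees = by_label(results, "PAYEE")
amounts = by_label(results, "AMOUNT")
```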
### ONNX Runtime

The `onnx/` directory contains `model.onnx` and `tokenizer.json` for deployment with ONNX Runtime (e.g. in a Rust or C++ service).

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("onnx/tokenizer.json")
session = ort.InferenceSession("onnx/model.onnx")

encoding = tokenizer.encode("Sam supplied $1,200 for Grace.")
# ONNX Runtime expects numpy arrays, not Python lists; BERT inputs are int64.
outputs = session.run(None, {
    "input_ids": np.array([encoding.ids], dtype=np.int64),
    "attention_mask": np.array([encoding.attention_mask], dtype=np.int64),
    "token_type_ids": np.array([encoding.type_ids], dtype=np.int64),
})
logits = outputs[0]  # shape: (batch, seq_len, num_labels)
```
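The session returns raw per-token logits; mapping them back to BIO tags requires the model's `id2label` table. A hedged sketch with an assumed label order (verify against the model's `config.json`) and tiny hand-made logits in place of a real forward pass:

```python
import numpy as np

# Assumed label order; check the exported model's config.json for the real mapping.
id2label = {0: "O", 1: "B-PAYER", 2: "I-PAYER", 3: "B-PAYEE",
            4: "I-PAYEE", 5: "B-AMOUNT", 6: "I-AMOUNT"}

def decode(logits):
    """Map per-token logits (seq_len, num_labels) to BIO label strings."""
    ids = logits.argmax(axis=-1)
    return [id2label[int(i)] for i in ids]

# Tiny illustrative logits for a 3-token sequence (not real model output)
logits = np.array([[0.1, 2.0, 0, 0, 0, 0, 0],   # argmax -> B-PAYER
                   [3.0, 0.0, 0, 0, 0, 0, 0],   # argmax -> O
                   [0.0, 0.0, 0, 0, 0, 1.5, 0]])  # argmax -> B-AMOUNT
labels = decode(logits)
```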
Example Output
{
"model": "bert-base-NER-onnx",
"entities": [
{"label": "PAYER", "start_offset": 0, "end_offset": 4, "confidence": 0.9996, "text": "anna"},
{"label": "PAYEE", "start_offset": 11, "end_offset": 15, "confidence": 0.9996, "text": "john"},
{"label": "PAYER", "start_offset": 35, "end_offset": 39, "confidence": 0.9991, "text": "tine"},
{"label": "PAYEE", "start_offset": 45, "end_offset": 49, "confidence": 0.9996, "text": "john"},
{"label": "PAYEE", "start_offset": 54, "end_offset": 58, "confidence": 0.9996, "text": "anna"}
]
}
Input: "anna payed john for the cinema but tine owes john and anna for covering her 20"
## Training

### Base Model

`bert-base-uncased` fine-tuned for token classification with 7 labels (BIO format):

`O`, `B-PAYER`, `I-PAYER`, `B-PAYEE`, `I-PAYEE`, `B-AMOUNT`, `I-AMOUNT`
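For illustration, a hand-constructed BIO annotation (not drawn from the training set) of a simple sentence under this scheme:

```python
# Hand-made example: one BIO tag per token; multi-token entities
# start with a B- tag and continue with I- tags.
tokens = ["alice", "paid", "$", "500", "to", "bob", "."]
labels = ["B-PAYER", "O", "B-AMOUNT", "I-AMOUNT", "O", "B-PAYEE", "O"]
assert len(tokens) == len(labels)  # token classification needs aligned tags
```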
### Training Data (~10K examples from three sources)

1. **expertai/BUSTER** (9,861 examples): Business transaction documents from SEC EDGAR filings. Entity types remapped:
   - `Parties.BUYING_COMPANY` → `PAYER`
   - `Parties.SELLING_COMPANY` → `PAYEE`
   - `Generic_Info.ANNUAL_REVENUES` → `AMOUNT`

   Licensed under Apache 2.0.

2. **Kaggle Invoice NER** (64 examples): Invoice documents with extracted fields (`TOTAL_AMOUNT`, `DUE_AMOUNT`, `ACCOUNT_NAME`) converted to token-level BIO annotations.
3. **Synthetic Data** (2,400 examples): Programmatically generated transaction sentences covering patterns underrepresented in the real datasets:
   - Formal ledger entries: "Sam supplied $1,200 for Grace."
   - Informal/casual language: "Leo payed Lucy 500 for cleaning."
   - Misspellings: "payed" instead of "paid"
   - Compound payers/payees: "Tom and Lucy paid Mike $200."
   - Missing amounts: "Alice covered Bob for dinner."
   - Multi-transaction sentences with conjunctions: "Anna paid John $50 but Tine owes John and Anna for covering her 20."
   - Transaction histories (3-8 concatenated transactions)

The synthetic data generator (`training/data/create_dataset.py`) uses 30+ templates, 60+ party names, and 40+ transaction reasons to produce diverse examples.
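The generator script itself is not reproduced here; a hypothetical miniature with made-up templates, names, and reasons conveys the idea:

```python
import random

# Hypothetical miniature of the generator; the real script uses
# 30+ templates, 60+ names, and 40+ reasons.
TEMPLATES = [
    "{payer} paid {payee} ${amount} for {reason}.",
    "{payer} supplied ${amount} for {payee}.",
    "{payer} payed {payee} {amount} for {reason}.",  # intentional misspelling
]
NAMES = ["Alice", "Bob", "Grace", "Sam", "Leo", "Lucy"]
REASONS = ["dinner", "cleaning", "the cinema", "rent"]

def make_example(rng):
    """Fill a random template with random parties, amount, and reason."""
    payer, payee = rng.sample(NAMES, 2)
    return rng.choice(TEMPLATES).format(
        payer=payer, payee=payee,
        amount=rng.randrange(10, 2000), reason=rng.choice(REASONS))

rng = random.Random(0)  # seeded for reproducibility
sentence = make_example(rng)
```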
### Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 3e-5 |
| Batch size | 16 |
| Epochs | 5 |
| Warmup ratio | 0.1 |
| Weight decay | 0.01 |
| Max sequence length | 128 |
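The table above maps onto Hugging Face `TrainingArguments` roughly as follows (a sketch; `output_dir` is a placeholder, and max sequence length is applied at tokenization time, not here):

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters table; output_dir is a placeholder path.
args = TrainingArguments(
    output_dir="accounting-ner",
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    num_train_epochs=5,
    warmup_ratio=0.1,
    weight_decay=0.01,
)
# Max sequence length (128) is enforced by the tokenizer, e.g.
# tokenizer(text, truncation=True, max_length=128)
```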
## Intended Use
Extracting structured (payer, payee, amount) triples from:
- Transaction histories for netting and settlement computation (canceling circular debts)
- Accounting statements and ledger entries
- Informal payment descriptions
- Multi-party transactions
This supports tasks where an agent observes a history of transactions (e.g. "A supplied $X for B") between multiple parties and must compute the final settlement after netting.
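As a sketch of that downstream step (not part of the model itself), extracted (payer, payee, amount) triples can be netted into per-party balances, under the convention that a positive balance means the party received more than it sent:

```python
from collections import defaultdict

# Illustrative triples, as if extracted by the NER model
transactions = [
    ("alice", "bob", 50.0),
    ("bob", "carol", 30.0),
    ("carol", "alice", 20.0),
]

def net_flow(triples):
    """Net balance per party: money received minus money sent.
    Positive = net receiver; negative = net sender. Sums to zero."""
    balance = defaultdict(float)
    for payer, payee, amount in triples:
        balance[payer] -= amount
        balance[payee] += amount
    return dict(balance)

balances = net_flow(transactions)
```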
## Limitations
- Trained primarily on English text
- Best on short transaction sentences; long documents may need chunking (max 128 tokens)
- Bare numbers without currency context (e.g. "20" at end of sentence) may not always be tagged as AMOUNT
- Does not distinguish between different currencies in the same text
- PAYER/PAYEE distinction relies on contextual cues (verbs like "paid", "owes", "received") — ambiguous sentences may be misclassified
## Citation

If you use this model, please cite the BUSTER dataset, which contributed the majority of the training data:

```bibtex
@inproceedings{zugarini-etal-2023-buster,
    title = "{BUSTER}: a {``}{BUS}iness Transaction Entity Recognition{''} dataset",
    author = "Zugarini, Andrea and Zamai, Andrew and Ernandes, Marco and Rigutini, Leonardo",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track",
    year = "2023",
    pages = "605--611",
}
```