SentIDP Invoice NER v1.0

Fine-tuned LayoutLMv3 for Ghanaian Invoice Information Extraction


Model Description

SentIDP Invoice NER is a named entity recognition model for automated extraction of key fields from Ghanaian business invoices. It is fine-tuned from albertosei/invoice-ner-v2-amaye15, which was itself trained on 13,000 invoice images with text extracted by docTR OCR.

This model was developed by the TYNCAD team as part of an Intelligent Document Processing (IDP) pipeline tailored to the Ghanaian business context: GRA (Ghana Revenue Authority) tax rules, local company naming conventions, Ghanaian banks, Mobile Money (MoMo), and GHS currency formats.


Training Data

  • 20,000 synthetic Ghanaian invoices
  • 15 industries covered: logistics, pharmacy, real estate, healthcare, insurance, branding, banking, fintech, automobile, marketing, agencies, NGO, government, technology, financial
  • 8 currencies, with GHS (Ghana cedis) given a 62% sampling weight
  • 10 PDF layout engines for visual diversity
  • Ground truth generated automatically via PyMuPDF: exact word-level bounding boxes, zero manual labelling
  • Augmented with scan noise, rotation, perspective distortion, and JPEG compression artifacts to simulate real-world scans
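LayoutLMv3 expects each word's bounding box on a 0 to 1000 grid relative to the page size. The card does not publish its preprocessing code, so the following is only a sketch, assuming the ground-truth boxes come from PyMuPDF's `page.get_text("words")` tuples (x0, y0, x1, y1, word, block_no, line_no, word_no) in PDF points. The helpers `normalize_box` and `words_and_boxes` are hypothetical names used for illustration.

```python
from typing import List, Tuple

# A PyMuPDF word tuple as returned by page.get_text("words"):
# (x0, y0, x1, y1, text, block_no, line_no, word_no), coordinates in PDF points.
Word = Tuple[float, float, float, float, str, int, int, int]


def normalize_box(x0: float, y0: float, x1: float, y1: float,
                  page_width: float, page_height: float) -> List[int]:
    """Scale an absolute box to LayoutLMv3's 0-1000 coordinate grid."""
    def scale(value: float, extent: float) -> int:
        # Clamp so boxes slightly outside the page still land in [0, 1000].
        return max(0, min(1000, int(round(1000 * value / extent))))

    return [scale(x0, page_width), scale(y0, page_height),
            scale(x1, page_width), scale(y1, page_height)]


def words_and_boxes(words: List[Word], page_width: float, page_height: float):
    """Split PyMuPDF word tuples into parallel text and normalized-box lists."""
    # Sort into reading order by block, line, then word number.
    ordered = sorted(words, key=lambda w: (w[5], w[6], w[7]))
    texts = [w[4] for w in ordered]
    boxes = [normalize_box(w[0], w[1], w[2], w[3], page_width, page_height)
             for w in ordered]
    return texts, boxes


# Hand-written tuples on an A4 page (595 x 842 points), deliberately out of order.
sample = [
    (400.0, 60.0, 470.0, 75.0, "INV-0042", 0, 0, 1),
    (340.0, 60.0, 395.0, 75.0, "Invoice", 0, 0, 0),
]
texts, boxes = words_and_boxes(sample, 595.0, 842.0)
print(texts)   # ['Invoice', 'INV-0042']
print(boxes)   # [[571, 71, 664, 89], [672, 71, 790, 89]]
```

Because the boxes are scaled relative to page size, the same invoice rendered at a different resolution, or by a different PDF layout engine, maps to the same grid coordinates.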

Training Configuration

Parameter              Value
Base model             albertosei/invoice-ner-v2-amaye15
Epochs completed       2 of 3
Train samples          18,000
Validation samples     2,000
Batch size             2 (gradient accumulation: 2)
Learning rate          5e-5
Weight decay           0.01
Max sequence length    512 tokens
Precision              fp16
Hardware               2× NVIDIA Tesla T4 (Kaggle)
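The table does not state the effective batch size. Assuming "batch size 2" is the per-device batch and both T4s were used for data-parallel training, each optimizer step sees 2 × 2 × 2 = 8 invoices. The sketch below writes that arithmetic out. The dictionary keys mirror Hugging Face `TrainingArguments` field names, but the mapping is an assumption, since the training script is not published.

```python
import math

# Hypothetical mapping of the table onto TrainingArguments field names.
# Assumption: "batch size 2" is per device, and both GPUs were used.
config = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 2,
    "learning_rate": 5e-5,
    "weight_decay": 0.01,
    "num_train_epochs": 3,
    "fp16": True,
}
NUM_GPUS = 2             # 2x NVIDIA Tesla T4
TRAIN_SAMPLES = 18_000


def effective_batch_size(cfg: dict, num_gpus: int) -> int:
    """Samples consumed per optimizer step across all devices."""
    return (cfg["per_device_train_batch_size"]
            * cfg["gradient_accumulation_steps"]
            * num_gpus)


def optimizer_steps_per_epoch(cfg: dict, num_gpus: int, samples: int) -> int:
    """Optimizer updates needed to see every training sample once."""
    return math.ceil(samples / effective_batch_size(cfg, num_gpus))


eff = effective_batch_size(config, NUM_GPUS)
steps = optimizer_steps_per_epoch(config, NUM_GPUS, TRAIN_SAMPLES)
print(eff, steps)   # 8 2250
```

Under these assumptions, the two completed epochs correspond to 4,500 optimizer steps.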

Evaluation Results

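Evaluated after each epoch on the 2,000 held-out validation invoices. These come from the same 20,000-invoice synthetic dataset as the training set, so the scores measure in-distribution performance, not accuracy on real scanned invoices: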

Epoch   Training loss   Validation loss   F1       Precision   Recall
1       0.0361          0.0170            0.9974   0.9972      0.9975
2       0.0197          0.0095            0.9988   0.9991      0.9985
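The card does not say how the scores are averaged. If they are micro-averaged (as in the overall scores of the `seqeval` package, commonly used in Hugging Face token-classification examples), F1 should equal the harmonic mean of precision and recall, 2PR / (P + R). This quick check recomputes it from the rounded table values. Both epochs agree to within rounding: 0.99735 against the reported 0.9974, and 0.99880 against 0.9988.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (epoch, reported F1, precision, recall), copied from the table above.
reported = [
    (1, 0.9974, 0.9972, 0.9975),
    (2, 0.9988, 0.9991, 0.9985),
]
for epoch, f1, p, r in reported:
    recomputed = f1_score(p, r)
    # The inputs are rounded to 4 decimal places, so expect small differences.
    print(f"epoch {epoch}: reported {f1:.4f}, recomputed {recomputed:.5f}")
```

A macro-averaged F1 (the mean of per-label F1 scores) would not generally match this formula, so a large mismatch would suggest a different averaging scheme.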

Labels (50 total)

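The model tags each word with a BIO label (B- marks the first word of a field, I- a continuation word, and O a word outside any field). It extracts the following fields: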

Category          Labels
Document header   HEADER_TYPE, HEADER_NUMBER, HEADER_DATE, DUE_DATE, PO_NUMBER
Seller            SELLER_NAME, SELLER_ADDRESS, SELLER_TIN
Buyer             BUYER_NAME, BUYER_ADDRESS
Line items        ITEM_DESC, QTY, UNIT_PRICE, LINE_TOTAL
Financials        SUBTOTAL, TAX_AMOUNT, GRAND_TOTAL, AMOUNT_WORDS
Payment           PAYMENT_TERMS, BANK_NAME, ACCOUNT_NUMBER, MOMO_DETAILS
Signatories       PREPARED_BY, AUTHORISED_BY
Other             O, B-OTHER
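The table lists 24 fields. Assuming each field has both a B- and an I- tag, and OTHER has only the B- tag shown, the label set works out to 24 × 2 + 2 = 50, matching the stated total. The sketch below builds that label list and merges word-level predictions back into field spans. The label order here is illustrative only; the real mapping should be read from the model's `config.id2label`. `group_entities` is a hypothetical helper, and it assumes one tag per word (for example, the prediction on each word's first sub-token).

```python
from typing import Dict, List, Optional

# The 24 entity fields from the table above.
FIELDS = [
    "HEADER_TYPE", "HEADER_NUMBER", "HEADER_DATE", "DUE_DATE", "PO_NUMBER",
    "SELLER_NAME", "SELLER_ADDRESS", "SELLER_TIN",
    "BUYER_NAME", "BUYER_ADDRESS",
    "ITEM_DESC", "QTY", "UNIT_PRICE", "LINE_TOTAL",
    "SUBTOTAL", "TAX_AMOUNT", "GRAND_TOTAL", "AMOUNT_WORDS",
    "PAYMENT_TERMS", "BANK_NAME", "ACCOUNT_NUMBER", "MOMO_DETAILS",
    "PREPARED_BY", "AUTHORISED_BY",
]

# Assumption: B- and I- tags for every field, plus "O" and a lone "B-OTHER".
# Illustrative order only; read the real mapping from config.id2label.
LABELS = ["O"] + [f"{prefix}-{field}" for field in FIELDS for prefix in ("B", "I")]
LABELS.append("B-OTHER")
label2id = {label: idx for idx, label in enumerate(LABELS)}
id2label = {idx: label for label, idx in label2id.items()}


def group_entities(words: List[str], tags: List[str]) -> List[Dict[str, str]]:
    """Merge word-level BIO tags into a list of {"field", "text"} spans."""
    spans: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for word, tag in zip(words, tags):
        if tag == "O":
            current = None
            continue
        prefix, field = tag.split("-", 1)
        if prefix == "I" and current is not None and current["field"] == field:
            # Continuation of the open span: append the word.
            current["text"] += " " + word
        else:
            # A B- tag, or an I- tag with no matching open span, starts a new span.
            current = {"field": field, "text": word}
            spans.append(current)
    return spans


words = ["Invoice", "No:", "INV-0042", "Total:", "GHS", "1,150.00"]
tags = ["O", "O", "B-HEADER_NUMBER", "O", "B-GRAND_TOTAL", "I-GRAND_TOTAL"]
print(len(LABELS))                  # 50
print(group_entities(words, tags))
```

Returning a list rather than a dict keeps repeated fields such as ITEM_DESC and LINE_TOTAL, which appear once per line item.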

Intended Use

  • Automated invoice processing for Ghanaian businesses
  • GRA tax compliance validation
  • Accounts payable automation
  • Invoice data extraction for ERP integration
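One simple check before passing extracted data to an ERP system or a tax-compliance review is whether the amounts add up. The sketch below is a hypothetical helper, not part of the model. It assumes the extracted fields arrive as a mapping from field name to the list of text spans tagged with it, treats several TAX_AMOUNT spans as separate tax lines to be summed, and encodes no GRA tax rates; it only checks the arithmetic.

```python
import re
from decimal import Decimal
from typing import Dict, List, Optional

# Matches the first number in strings such as "GHS 1,150.00" or "GH₵ 50".
AMOUNT_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def parse_amount(text: str) -> Optional[Decimal]:
    """Return the first amount in `text` as a Decimal, or None if there is none."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


def check_totals(fields: Dict[str, List[str]],
                 tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """List arithmetic inconsistencies; an empty list means the totals agree."""
    def amounts(name: str) -> List[Decimal]:
        parsed = (parse_amount(text) for text in fields.get(name, []))
        return [value for value in parsed if value is not None]

    line_totals = amounts("LINE_TOTAL")
    subtotal = amounts("SUBTOTAL")
    taxes = amounts("TAX_AMOUNT")
    grand_total = amounts("GRAND_TOTAL")
    problems: List[str] = []

    # Line items should sum to the subtotal.
    if line_totals and subtotal and abs(sum(line_totals) - subtotal[0]) > tolerance:
        problems.append(f"line totals sum to {sum(line_totals)}, subtotal is {subtotal[0]}")
    # Subtotal plus all tax lines should equal the grand total.
    if subtotal and grand_total:
        expected = subtotal[0] + sum(taxes)
        if abs(expected - grand_total[0]) > tolerance:
            problems.append(f"subtotal + tax is {expected}, grand total is {grand_total[0]}")
    return problems


invoice = {
    "LINE_TOTAL": ["GHS 600.00", "GHS 400.00"],
    "SUBTOTAL": ["GHS 1,000.00"],
    "TAX_AMOUNT": ["GHS 150.00"],
    "GRAND_TOTAL": ["GHS 1,150.00"],
}
print(check_totals(invoice))   # []
```

`Decimal` is used instead of `float` so that currency arithmetic such as 600.00 + 400.00 compares exactly.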

Limitations

  • v1.0 is trained entirely on synthetic data
  • Performance on real scans, especially heavily degraded ones, is not reported here and may fall below the synthetic validation scores
  • Real Ghanaian invoice fine-tuning planned for v2.0
  • Currently supports single-page invoices only

Roadmap

  • v1.1: Epoch 3 training completion
  • v2.0: Fine-tuning on real annotated Ghanaian invoices
  • v2.1: Multi-page invoice support
  • v3.0: Full IDP pipeline with structured JSON output

Developed By

TYNCAD


Citation

If you use this model, please cite:

@misc{sentidp-invoice-ner-2025,
  title     = {SentIDP Invoice NER: Ghanaian Invoice Information Extraction},
  author    = {TYNCAD},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/albertosei/sentidp-ner_20k}
}

Model Details

  • Format: Safetensors
  • Model size: 0.4B parameters
  • Tensor type: F32