SentIDP Invoice NER v1.0
Fine-tuned LayoutLMv3 for Ghanaian Invoice Information Extraction
Model Description
SentIDP Invoice NER is a named entity recognition model for automated extraction of key fields from Ghanaian business invoices. It is fine-tuned on top of albertosei/invoice-ner-v2-amaye15, which was originally trained on 13,000 invoice images using docTR OCR.
This model was developed by the TYNCAD team as part of an Intelligent Document Processing pipeline tailored specifically to the Ghanaian business context, including Ghana Revenue Authority (GRA) tax rules, local company naming conventions, Ghanaian banks, Mobile Money (MoMo), and GHS currency formats.
Training Data
- 20,000 synthetic Ghanaian invoices
- 15 industries covered: logistics, pharmacy, real estate, healthcare, insurance, branding, banking, fintech, automobile, marketing, agencies, NGO, government, technology, financial
- 8 currencies with GHS (Ghana Cedis) at 62% weight
- 10 PDF layout engines for visual diversity
- Ground truth generated automatically via PyMuPDF: exact word-level bounding boxes, zero manual labelling
- Augmented with scan noise, rotation, perspective distortion, and JPEG compression artifacts to simulate real-world scans
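LayoutLM-family models expect word boxes on a 0-1000 integer grid, so word-level boxes taken from a PDF need rescaling before they reach the model. The following is a minimal sketch of that step, not the authors' pipeline: the word tuples are made-up values shaped like the output of PyMuPDF's `page.get_text("words")`, and the A4 page size and the `normalize_box` helper are assumptions for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page units


def normalize_box(box: Box, page_width: float, page_height: float) -> List[int]:
    """Scale a box to the 0-1000 integer grid used by LayoutLM-family models."""
    x0, y0, x1, y1 = box
    scaled = [
        1000 * x0 / page_width,
        1000 * y0 / page_height,
        1000 * x1 / page_width,
        1000 * y1 / page_height,
    ]
    # Clamp so boxes that spill slightly past the page edge stay in range.
    return [max(0, min(1000, int(round(v)))) for v in scaled]


# Hypothetical word tuples in the PyMuPDF "words" layout:
# (x0, y0, x1, y1, text, block_no, line_no, word_no)
words = [
    (72.0, 90.0, 130.5, 104.0, "INVOICE", 0, 0, 0),
    (400.0, 90.0, 470.0, 104.0, "INV-0042", 1, 0, 0),
]
page_width, page_height = 595.0, 842.0  # A4 in points (assumed page size)

tokens = [w[4] for w in words]
boxes = [normalize_box(w[:4], page_width, page_height) for w in words]
```

The resulting `tokens` and `boxes` lists are aligned one-to-one, which is the shape a LayoutLMv3 processor expects when OCR is supplied externally.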
Training Configuration
| Parameter | Value |
|---|---|
| Base model | albertosei/invoice-ner-v2-amaye15 |
| Epochs completed | 2 of 3 |
| Train samples | 18,000 |
| Validation samples | 2,000 |
| Batch size | 2 (gradient accumulation: 2) |
| Learning rate | 5e-5 |
| Weight decay | 0.01 |
| Max sequence length | 512 |
| Precision | fp16 |
| Hardware | 2× NVIDIA Tesla T4 (Kaggle) |
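As a rough illustration of what these settings imply, the effective batch size and the number of optimizer steps per epoch can be worked out as below. This assumes the batch size of 2 is per GPU and that both T4s run data-parallel; the table does not state either, so treat the result as an estimate.

```python
import math

train_samples = 18_000
per_device_batch_size = 2        # assumption: "Batch size 2" is per GPU
gradient_accumulation_steps = 2
num_gpus = 2                     # assumption: both T4s are used data-parallel

# Samples contributing to each optimizer update.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Optimizer updates needed to see every training sample once.
steps_per_epoch = math.ceil(train_samples / effective_batch_size)

print(effective_batch_size, steps_per_epoch)  # prints: 8 2250
```

If the batch size were instead the total across both GPUs, the effective batch size would be 4 and the step count would double.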
Evaluation Results
Evaluated on 2,000 held-out validation invoices after each epoch:
| Epoch | Training Loss | Validation Loss | F1 | Precision | Recall |
|---|---|---|---|---|---|
| 1 | 0.0361 | 0.0170 | 0.9974 | 0.9972 | 0.9975 |
| 2 | 0.0197 | 0.0095 | 0.9988 | 0.9991 | 0.9985 |
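The F1 column can be cross-checked against precision and recall, since F1 is their harmonic mean. The sketch below recomputes it from the rounded values in the table; because the inputs are already rounded to four decimals, a difference in the last digit is expected. How the scores were averaged across labels is not stated, so this only confirms that the three columns are mutually consistent.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per epoch, copied from the table above.
reported = {
    1: (0.9972, 0.9975, 0.9974),
    2: (0.9991, 0.9985, 0.9988),
}

for epoch, (p, r, f1) in reported.items():
    recomputed = f1_score(p, r)
    # Allow for rounding of the published precision and recall.
    consistent = abs(recomputed - f1) < 1e-4
    print(f"epoch {epoch}: recomputed F1 = {recomputed:.5f}, consistent = {consistent}")
```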
Labels (50 total)
The model extracts the following fields using BIO tagging:
| Category | Labels |
|---|---|
| Document header | HEADER_TYPE, HEADER_NUMBER, HEADER_DATE, DUE_DATE, PO_NUMBER |
| Seller | SELLER_NAME, SELLER_ADDRESS, SELLER_TIN |
| Buyer | BUYER_NAME, BUYER_ADDRESS |
| Line items | ITEM_DESC, QTY, UNIT_PRICE, LINE_TOTAL |
| Financials | SUBTOTAL, TAX_AMOUNT, GRAND_TOTAL, AMOUNT_WORDS |
| Payment | PAYMENT_TERMS, BANK_NAME, ACCOUNT_NUMBER, MOMO_DETAILS |
| Signatories | PREPARED_BY, AUTHORISED_BY |
| Other | O, B-OTHER |
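To make the BIO scheme concrete, here is a minimal sketch of how per-token tags could be grouped back into fields. It assumes every field in the table has a `B-` and an `I-` tag, which together with `O` and `B-OTHER` gives the stated 50 labels; the actual label order in the model config may differ. The `decode_bio` helper and the sample tokens are hypothetical.

```python
from typing import List, Optional, Tuple

FIELDS = [
    "HEADER_TYPE", "HEADER_NUMBER", "HEADER_DATE", "DUE_DATE", "PO_NUMBER",
    "SELLER_NAME", "SELLER_ADDRESS", "SELLER_TIN",
    "BUYER_NAME", "BUYER_ADDRESS",
    "ITEM_DESC", "QTY", "UNIT_PRICE", "LINE_TOTAL",
    "SUBTOTAL", "TAX_AMOUNT", "GRAND_TOTAL", "AMOUNT_WORDS",
    "PAYMENT_TERMS", "BANK_NAME", "ACCOUNT_NUMBER", "MOMO_DETAILS",
    "PREPARED_BY", "AUTHORISED_BY",
]

# Assumption: a B- and an I- tag per field, plus the two "Other" labels.
LABELS = ["O", "B-OTHER"] + [f"{p}-{f}" for f in FIELDS for p in ("B", "I")]


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (field, text) pairs."""
    entities: List[Tuple[str, List[str]]] = []
    current_field: Optional[str] = None
    for token, tag in zip(tokens, tags):
        prefix, _, field = tag.partition("-")
        if prefix == "B" and field != "OTHER":
            entities.append((field, [token]))  # start a new entity
            current_field = field
        elif prefix == "I" and field == current_field:
            entities[-1][1].append(token)      # continue the open entity
        else:
            # "O", "B-OTHER", or an I- tag without a matching B- is skipped.
            current_field = None
    return [(field, " ".join(words)) for field, words in entities]


# Hypothetical tokens and predicted tags for one invoice fragment.
tokens = ["Invoice", "No:", "INV-0042", "Kwame", "Trading", "Ltd", "GHS", "1,250.00"]
tags = ["O", "O", "B-HEADER_NUMBER", "B-SELLER_NAME", "I-SELLER_NAME",
        "I-SELLER_NAME", "B-GRAND_TOTAL", "I-GRAND_TOTAL"]

entities = decode_bio(tokens, tags)
```

With real model output, subword tokens would first need to be merged back into words before this grouping step.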
Intended Use
- Automated invoice processing for Ghanaian businesses
- GRA tax compliance validation
- Accounts payable automation
- Invoice data extraction for ERP integration
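Downstream validation of extracted fields might start with a simple arithmetic check before any tax-specific rules are applied. The sketch below only tests whether subtotal plus tax equals the grand total; it does not encode any GRA rule or rate. The helper names, the amount format, and the sample values are assumptions for illustration.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Dict, Optional


def parse_amount(text: str) -> Optional[Decimal]:
    """Parse strings such as 'GHS 1,250.00' into a Decimal; None if unparseable."""
    # Keep digits, the decimal point and a minus sign; drop currency codes and commas.
    cleaned = re.sub(r"[^0-9.\-]", "", text)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def totals_consistent(fields: Dict[str, str],
                      tolerance: Decimal = Decimal("0.01")) -> bool:
    """Return True when SUBTOTAL + TAX_AMOUNT matches GRAND_TOTAL within tolerance."""
    subtotal = parse_amount(fields.get("SUBTOTAL", ""))
    tax = parse_amount(fields.get("TAX_AMOUNT", ""))
    total = parse_amount(fields.get("GRAND_TOTAL", ""))
    if subtotal is None or tax is None or total is None:
        return False  # a missing or unreadable field fails the check
    return abs(subtotal + tax - total) <= tolerance


# Hypothetical extracted fields with made-up amounts.
extracted = {
    "SUBTOTAL": "GHS 1,000.00",
    "TAX_AMOUNT": "GHS 150.00",
    "GRAND_TOTAL": "GHS 1,150.00",
}
print(totals_consistent(extracted))
```

Using `Decimal` rather than floats avoids rounding surprises when comparing currency amounts. Amounts written with a comma as the decimal separator would need a different parser.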
Limitations
- v1.0 is trained and evaluated entirely on synthetic data
- The reported metrics therefore reflect synthetic invoices; performance on heavily degraded real scans may be lower
- Real Ghanaian invoice fine-tuning planned for v2.0
- Currently supports single-page invoices only
Roadmap
- v1.1: Epoch 3 training completion
- v2.0: Fine-tuning on real annotated Ghanaian invoices
- v2.1: Multi-page invoice support
- v3.0: Full IDP pipeline with structured JSON output
Developed By
TYNCAD
Citation
If you use this model, please cite:
@misc{sentidp-invoice-ner-2025,
title = {SentIDP Invoice NER: Ghanaian Invoice Information Extraction},
author = {TYNCAD},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/albertosei/sentidp-ner_20k}
}