metadata
license: apache-2.0
language:
  - en
  - multilingual
tags:
  - document-understanding
  - token-classification
  - layout
  - lilt
  - receipts
  - invoices
datasets:
  - bluecopa/dextr-training-data-v3

DEXTR-LiLT: Document Extraction with Query-Conditioned Token Classification

A fine-tuned LiLT (Language-independent Layout Transformer) model that extracts fields from receipts and invoices using query-conditioned token classification.

Performance (Holdout Set)

| Metric | Score |
|---|---|
| Macro F1 | 72.2% |
| Token Accuracy | 77.0% |
| Table F1 | 89.2% |
| Row Boundary F1 | 97.2% |
| Header F1 | 94.9% |
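
Macro F1 can be lower than token accuracy when rare labels are predicted poorly, because it weights every label equally. The sketch below illustrates this with the standard definitions (token accuracy as the fraction of correctly labelled tokens, macro F1 as the unweighted mean of per-label F1). It is an assumption that the scores above were computed this way. The label names and the toy sequences are hypothetical and are not taken from the model or its dataset.

```python
from collections import Counter


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it does not belong
            fn[g] += 1  # missed an occurrence of gold label g
    scores = []
    for label in set(gold) | set(pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: a frequent "O" label and two rare field labels.
gold = ["O", "O", "O", "O", "TOTAL", "DATE"]
pred = ["O", "O", "O", "O", "DATE", "DATE"]

print(f"token accuracy: {token_accuracy(gold, pred):.3f}")
print(f"macro F1:       {macro_f1(gold, pred):.3f}")
```

In this toy case a single mistake on a rare label leaves token accuracy at 5/6 but pulls macro F1 down to 5/9, since the missed `TOTAL` label contributes an F1 of 0.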

Training

  • Epochs: 20
  • Batch Size: 24
  • Learning Rate: 2e-5
  • Training Data: ~3000 documents
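
To get a rough sense of the training budget these settings imply, the sketch below records them in a small config object and derives the number of optimizer steps. It assumes one training example per document, no gradient accumulation, a single device, and that a final partial batch is kept. None of this is stated in the card, and `TrainConfig` and the helper names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as listed in the Training section."""
    epochs: int = 20
    batch_size: int = 24
    learning_rate: float = 2e-5
    num_documents: int = 3000  # approximate ("~3000" in the card)


def steps_per_epoch(cfg: TrainConfig) -> int:
    # Assumption: one example per document, last partial batch kept.
    return math.ceil(cfg.num_documents / cfg.batch_size)


def total_steps(cfg: TrainConfig) -> int:
    # Assumption: no gradient accumulation, single device.
    return steps_per_epoch(cfg) * cfg.epochs


cfg = TrainConfig()
print(f"steps per epoch: {steps_per_epoch(cfg)}")
print(f"total steps:     {total_steps(cfg)}")
```

Under these assumptions the run amounts to about 125 steps per epoch and 2,500 steps overall. The real figure depends on how documents are split into training examples.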

License

Apache 2.0