DEXTR-LiLT: Document Extraction with Query-Conditioned Token Classification

Fine-tuned LiLT model for document field extraction from receipts and invoices.

Performance (Holdout Set)

Metric            Score
Macro F1          72.2%
Token Accuracy    77.0%
Table F1          89.2%
Row Boundary F1   97.2%
Header F1         94.9%
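The card does not define how these metrics are computed. The sketch below illustrates the two token-level rows by computing token accuracy and macro F1 from aligned gold and predicted label sequences. It assumes that macro F1 is the unweighted mean of per-label F1 with the background label `O` excluded. The label names and the helpers `token_accuracy` and `macro_f1` are hypothetical. The table, row-boundary, and header metrics are not covered.

```python
from collections import Counter


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def macro_f1(gold, pred, ignore=("O",)):
    """Unweighted mean of per-label F1 over labels seen in gold or pred.

    Labels in `ignore` (by default the background label "O") are left out
    of the average. This is an assumption, not the card's stated protocol.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # predicted p where it was not the gold label
            fn[g] += 1          # missed the gold label g
    labels = (set(gold) | set(pred)) - set(ignore)
    if not labels:
        return 0.0
    scores = []
    for label in sorted(labels):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); define as 0 when the label never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical receipt labels for six tokens
gold = ["O", "VENDOR", "VENDOR", "TOTAL", "O", "DATE"]
pred = ["O", "VENDOR", "O", "TOTAL", "TOTAL", "DATE"]
print(token_accuracy(gold, pred))  # 4 of 6 tokens correct
print(macro_f1(gold, pred))        # mean of VENDOR, TOTAL, DATE F1
```

Here VENDOR and TOTAL each score an F1 of 2/3 and DATE scores 1.0, so macro F1 is 7/9 while token accuracy is 4/6. Macro averaging weights every field equally regardless of how many tokens it covers, which is one reason it can sit below token accuracy, as in the table.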

Training

  • Epochs: 20
  • Batch Size: 24
  • Learning Rate: 2e-5
  • Training Data: ~3000 documents
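As a back-of-the-envelope check on the schedule above, the sketch below estimates the number of optimizer steps. It assumes each document yields exactly one training example and that the batch size of 24 is the effective batch per optimizer step (no gradient accumulation, one device). With query-conditioned inputs a document may yield one example per query, or several chunks if it exceeds the model's sequence length, so the real step count is likely higher. The helper `training_steps` is hypothetical.

```python
import math


def training_steps(num_examples, batch_size, epochs, grad_accum=1, num_devices=1):
    """Return (steps_per_epoch, total_steps) for a simple epoch-based schedule.

    Assumes the last partial batch of each epoch is kept (rounded up).
    """
    effective_batch = batch_size * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch, steps_per_epoch * epochs


# Figures from the card, under the one-example-per-document assumption
print(training_steps(3000, 24, 20))
```

With ~3,000 documents, a batch size of 24, and 20 epochs, this gives 125 steps per epoch and 2,500 steps in total.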

License

Apache 2.0

Dataset used to train bluecopa/dextr-lilt