# DEXTR-LiLT: Document Extraction with Query-Conditioned Token Classification
Fine-tuned LiLT model for document field extraction from receipts and invoices.
## Performance (Holdout Set)
| Metric | Score |
|---|---|
| Macro F1 | 72.2% |
| Token Accuracy | 77.0% |
| Table F1 | 89.2% |
| Row Boundary F1 | 97.2% |
| Header F1 | 94.9% |
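The card does not define how these metrics are computed. For reference, here is a minimal sketch of token-level Token Accuracy and Macro F1. It assumes that Macro F1 is the unweighted mean of per-label F1 over field labels and that a background label (assumed here to be `"O"`) is excluded from the average. The function names and the labels `TOTAL` and `DATE` are hypothetical and do not come from this model.

```python
from collections import Counter


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted label equals the gold label."""
    assert len(gold) == len(pred), "gold and pred must be aligned token-by-token"
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, ignore=("O",)):
    """Unweighted mean of per-label F1.

    Assumption: labels listed in `ignore` (e.g. a background "O" label) are
    left out of the average but still count as errors for other labels.
    """
    assert len(gold) == len(pred), "gold and pred must be aligned token-by-token"
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # label p was predicted where it was wrong
            fn[g] += 1  # gold label g was missed
    labels = (set(gold) | set(pred)) - set(ignore)
    scores = []
    for label in sorted(labels):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


gold = ["TOTAL", "TOTAL", "DATE", "O"]
pred = ["TOTAL", "DATE", "DATE", "O"]
print(token_accuracy(gold, pred))      # 0.75
print(round(macro_f1(gold, pred), 4))  # 0.6667
```

Because macro averaging weights every label equally, errors on rare fields lower Macro F1 more than they lower Token Accuracy.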
## Training
- Epochs: 20
- Batch Size: 24
- Learning Rate: 2e-5
- Training Data: ~3000 documents
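The hyperparameters above imply a rough optimizer-step budget. The following sketch shows the arithmetic under stated assumptions: one training example per document, a single device, no gradient accumulation, and the final partial batch kept. The `TrainingConfig` class and its `examples_per_document` field are hypothetical. Query-conditioned training may produce more than one example per document (for instance, one per field query), and the card does not say how many.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values reported on this card.
    epochs: int = 20
    batch_size: int = 24
    learning_rate: float = 2e-5
    num_documents: int = 3000  # the card reports "~3000", so this is approximate
    # Hypothetical: how many training examples each document yields
    # (e.g. one per field query); the card does not say.
    examples_per_document: int = 1

    def steps_per_epoch(self) -> int:
        # Assumes a single device, no gradient accumulation, and that the
        # final partial batch is kept (so the count is rounded up).
        num_examples = self.num_documents * self.examples_per_document
        return math.ceil(num_examples / self.batch_size)

    def total_steps(self) -> int:
        return self.epochs * self.steps_per_epoch()


cfg = TrainingConfig()
print(cfg.steps_per_epoch(), cfg.total_steps())  # 125 2500
```

If each document yields several query-conditioned examples, increase `examples_per_document`. The step count scales linearly with it.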
## License
Apache 2.0