---
license: apache-2.0
language:
- en
- multilingual
tags:
- document-understanding
- token-classification
- layout
- lilt
- receipts
- invoices
datasets:
- bluecopa/dextr-training-data-v3
---

# DEXTR-LiLT: Document Extraction with Query-Conditioned Token Classification

Fine-tuned LiLT model for extracting document fields from receipts and invoices.
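
LiLT reads one bounding box per token alongside the text. The Hugging Face `transformers` LiLT documentation follows the LayoutLM convention: boxes are given as (x0, y0, x1, y1) and normalized to an integer grid from 0 to 1000. This card does not describe its OCR or preprocessing pipeline, so the following is only a minimal sketch under stated assumptions. `normalize_box` is a hypothetical helper that expects pixel coordinates with the origin at the top-left corner of the page.

```python
def normalize_box(box, page_width, page_height, scale=1000):
    """Map an (x0, y0, x1, y1) pixel box onto a 0..scale integer grid.

    Hypothetical helper: assumes pixel coordinates from an OCR step,
    origin at the top-left corner of the page.
    """
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    x0, y0, x1, y1 = box

    def clamp(v):
        # Keep coordinates inside the grid even if OCR boxes spill off the page.
        return max(0, min(scale, v))

    return [
        clamp(int(scale * x0 / page_width)),
        clamp(int(scale * y0 / page_height)),
        clamp(int(scale * x1 / page_width)),
        clamp(int(scale * y1 / page_height)),
    ]


# Example: one word box on a 1654 x 2339 px page image.
print(normalize_box((100, 200, 300, 250), 1654, 2339))  # [60, 85, 181, 106]
```

Clamping is a defensive choice. Values outside the grid would index past the model's 2D position embeddings.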

## Performance (Holdout Set)

| Metric | Score |
|--------|-------|
| Macro F1 | 72.2% |
| Token Accuracy | 77.0% |
| Table F1 | 89.2% |
| Row Boundary F1 | 97.2% |
| Header F1 | 94.9% |
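
The card does not say how these metrics are computed. The following sketch shows one common reading, and it rests on several assumptions. Macro F1 is taken as the unweighted mean of per-label F1 at token level, excluding the background label `O`. Token accuracy is taken as the fraction of tokens whose predicted label matches the gold label. A boundary-style score such as Row Boundary F1 is taken as set F1 over exactly matching positions. The function names and labels are illustrative only.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def token_metrics(gold, pred, ignore="O"):
    """Token accuracy and macro F1 (assumed definitions, see lead-in)."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equal length")
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong here
            fn[g] += 1  # gold label g was missed here
    labels = sorted((set(gold) | set(pred)) - {ignore})
    per_label = {lab: f1(tp[lab], fp[lab], fn[lab]) for lab in labels}
    macro = sum(per_label.values()) / len(per_label) if per_label else 0.0
    return accuracy, macro, per_label


def set_f1(gold_items, pred_items):
    """F1 over exact matches, e.g. predicted vs. gold row-boundary positions."""
    gold_set, pred_set = set(gold_items), set(pred_items)
    tp = len(gold_set & pred_set)
    return f1(tp, len(pred_set - gold_set), len(gold_set - pred_set))


gold = ["O", "TOTAL", "TOTAL", "DATE", "O"]
pred = ["O", "TOTAL", "O", "DATE", "DATE"]
acc, macro, per_label = token_metrics(gold, pred)
print(round(acc, 3), round(macro, 3))  # 0.6 0.667
print(round(set_f1([3, 7, 12], [3, 7, 10]), 3))  # 0.667
```

Because macro averaging weights every field equally, rare fields can pull Macro F1 well below Token Accuracy, as they do in the table.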

## Training

- Epochs: 20
- Batch Size: 24
- Learning Rate: 2e-5
- Training Data: ~3,000 documents
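
As a rough scale check, these settings imply about 2,500 optimizer updates. That figure depends on assumptions the card does not state. It assumes exactly 3,000 training examples with one sequence per document, a single device, no gradient accumulation, and the final partial batch kept. If long documents were split into several sequences, the step count would be higher. `optimizer_steps` is a hypothetical helper written only for this arithmetic.

```python
import math

EPOCHS = 20
BATCH_SIZE = 24
NUM_EXAMPLES = 3000  # the card says "~3000 documents"; treated as exact here


def optimizer_steps(num_examples, batch_size, epochs):
    """Total updates, assuming one step per batch and the last partial batch kept."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


print(math.ceil(NUM_EXAMPLES / BATCH_SIZE), optimizer_steps(NUM_EXAMPLES, BATCH_SIZE, EPOCHS))  # 125 2500
```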

## License

Apache 2.0