---
license: apache-2.0
language:
- en
- multilingual
tags:
- document-understanding
- token-classification
- layout
- lilt
- receipts
- invoices
datasets:
- bluecopa/dextr-training-data-v3
---

# DEXTR-LiLT: Document Extraction with Query-Conditioned Token Classification

Fine-tuned LiLT model for document field extraction from receipts and invoices.

## Performance (Holdout Set)

| Metric | Score |
|--------|-------|
| Macro F1 | 72.2% |
| Token Accuracy | 77.0% |
| Table F1 | 89.2% |
| Row Boundary F1 | 97.2% |
| Header F1 | 94.9% |

## Training

- Epochs: 20
- Batch Size: 24
- Learning Rate: 2e-5
- Training Data: ~3000 documents

## License

Apache 2.0
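
## Reading the Metrics

The performance figures report Macro F1 below Token Accuracy. This card does not state how either metric was computed, so the following is only a minimal sketch. It assumes the usual definitions: token accuracy as the fraction of correctly labelled tokens, and macro F1 as the unweighted mean of per-label F1. The label names (`O`, `TOTAL`, `DATE`) and the toy sequences are hypothetical and are not taken from the model's label set. The sketch shows why a missed rare label can lower macro F1 much more than token accuracy.

```python
def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy example: ten tokens, one rare field (TOTAL) is missed.
gold = ["O"] * 8 + ["TOTAL", "DATE"]
pred = ["O"] * 8 + ["O", "DATE"]

acc = token_accuracy(gold, pred)
f1 = macro_f1(gold, pred)
print(f"token accuracy: {acc:.3f}")
print(f"macro F1:       {f1:.3f}")
```

In this toy case nine of ten tokens are correct, but the `TOTAL` label scores an F1 of zero and counts as much as the frequent `O` label in the average, so macro F1 falls well below token accuracy.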