---
license: apache-2.0
language:
- en
- multilingual
tags:
- document-understanding
- token-classification
- layout
- lilt
- receipts
- invoices
datasets:
- bluecopa/dextr-training-data-v3
---

# DEXTR-LiLT: Document Extraction with Query-Conditioned Token Classification

DEXTR-LiLT is a fine-tuned LiLT (Language-independent Layout Transformer) model that extracts fields from receipts and invoices using query-conditioned token classification.

## Performance (Holdout Set)

| Metric | Score |
|--------|-------|
| Macro F1 | 72.2% |
| Token Accuracy | 77.0% |
| Table F1 | 89.2% |
| Row Boundary F1 | 97.2% |
| Header F1 | 94.9% |
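
Macro F1 can sit below Token Accuracy when frequent labels are predicted well and rare ones are missed. The sketch below illustrates that effect, assuming Macro F1 is the unweighted mean of per-label F1 over token labels. The evaluation protocol behind the table is not stated in this card, and the label names (`O`, `TOTAL`, `DATE`) are hypothetical.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it did not belong
            fn[g] += 1  # missed the true label g
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical labels: one rare field (DATE) is missed entirely.
gold = ["O", "O", "O", "O", "TOTAL", "DATE"]
pred = ["O", "O", "O", "O", "TOTAL", "O"]

print(f"token accuracy: {token_accuracy(gold, pred):.3f}")
print(f"macro F1:       {macro_f1(gold, pred):.3f}")
```

In this toy case a single missed rare label costs one token of accuracy but zeroes out a whole class in the macro average, so the macro score drops further.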

## Training

- Epochs: 20
- Batch Size: 24
- Learning Rate: 2e-5
- Training Data: ~3,000 documents (`bluecopa/dextr-training-data-v3`)
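
As a rough sanity check on the scale of this run, the settings above imply a step count that can be estimated as follows. This is a sketch only: it assumes one training example per document, exactly 3,000 documents, no gradient accumulation, and a single device, none of which this card confirms. The helper `optimizer_steps` is hypothetical.

```python
import math


def optimizer_steps(num_examples, batch_size, epochs, grad_accum=1):
    """Estimate the total number of optimizer updates for a training run."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs


# Assumed values: 3000 is an approximation of "~3000 documents".
total = optimizer_steps(num_examples=3000, batch_size=24, epochs=20)
print(total)  # 125 batches per epoch * 20 epochs = 2500
```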

## License

Apache 2.0