---
library_name: transformers
license: cc-by-nc-sa-4.0
base_model: microsoft/layoutlmv3-base
tags:
- generated_from_trainer
- invoice-processing
- information-extraction
- czech-language
- document-ai
- layout-aware-model
- multimodal-model
- synthetic-data
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: LayoutLMv3InvoiceCzech-V0
  results: []
---

# LayoutLMv3InvoiceCzech (V0 – Synthetic Templates Only)

This model is a fine-tuned version of [microsoft/layoutlmv3-base](https://huggingface.co/microsoft/layoutlmv3-base) for structured information extraction from Czech invoices.

It achieves the following results on the evaluation set:
- Loss: 0.2146  
- Precision: 0.5354  
- Recall: 0.7428  
- F1: 0.6223  
- Accuracy: 0.9583  
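
As a quick consistency check, F1 is the harmonic mean of precision and recall. The minimal sketch below recomputes it from the two values reported above; the helper name `f1_score` is illustrative and is not part of the training code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the evaluation set.
f1 = f1_score(0.5354, 0.7428)
print(round(f1, 4))  # 0.6223
```

The recomputed value agrees with the reported F1 to four decimal places.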

---

## Model description

LayoutLMv3InvoiceCzech (V0) is a multimodal document understanding model that leverages:

- textual information  
- spatial layout (bounding boxes)  
- visual features (image embeddings)  

The model performs token-level classification to extract structured invoice fields:
- supplier  
- customer  
- invoice number  
- bank details  
- totals  
- dates  
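
Token-level predictions have to be merged back into field values before they are useful. The sketch below assumes a BIO tagging scheme and label names such as `B-SUPPLIER`; both are assumptions for illustration, since this card does not list the model's actual label set.

```python
def group_entities(words, labels):
    """Merge word-level BIO tags into (field, text) pairs."""
    entities = []
    current_field, current_words = None, []

    def flush():
        # Store the entity collected so far, if any.
        if current_field is not None:
            entities.append((current_field, " ".join(current_words)))

    for word, label in zip(words, labels):
        prefix, _, field = label.partition("-")
        if prefix == "B" or (prefix == "I" and field != current_field):
            # A new entity starts (a stray I- tag is treated like B-).
            flush()
            current_field, current_words = field, [word]
        elif prefix == "I":
            current_words.append(word)
        else:
            # "O" tag: close any open entity.
            flush()
            current_field, current_words = None, []
    flush()
    return entities


# Hypothetical words and predicted labels for part of an invoice.
words = ["Dodavatel", "ACME", "s.r.o.", "Faktura", "2024001"]
labels = ["O", "B-SUPPLIER", "I-SUPPLIER", "O", "B-INVOICE_NUMBER"]
fields = group_entities(words, labels)
print(fields)  # [('SUPPLIER', 'ACME s.r.o.'), ('INVOICE_NUMBER', '2024001')]
```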

This version is trained exclusively on synthetically generated invoice templates.

---

## Training data

The dataset consists of:

- synthetically generated invoices  
- fixed template layouts  
- corresponding bounding boxes  
- rendered document images  

Key properties:
- consistent structure across samples  
- clean and noise-free data  
- perfect alignment between text, layout, and image  
- no real-world documents  

This serves as the **baseline dataset** for multimodal document models.

---

## Role in the pipeline

This model corresponds to:

**V0 – Synthetic template-based dataset only**

It is used to:
- establish a baseline for multimodal models  
- compare against:
  - text-only models (BERT)  
  - layout-aware models without vision (LiLT)  
- evaluate the contribution of visual features in a controlled setting  

---

## Intended uses

- Research in multimodal document understanding  
- Benchmarking LayoutLMv3 on structured documents  
- Comparison with other architectures (BERT, LiLT, etc.)  
- Czech invoice information extraction  

---

## Limitations

- Trained only on synthetic data with fixed layouts  
- Limited generalization to real-world invoices  
- Visual features are learned from clean synthetic renderings  
- No exposure to:
  - OCR errors  
  - scanning artifacts  
  - real-world noise  

---

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 1
- seed: 42
- optimizer: AdamW, fused PyTorch implementation (`OptimizerNames.ADAMW_TORCH_FUSED`), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 0.1
- num_epochs: 10
- mixed_precision_training: Native AMP
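
To make the schedule concrete, the sketch below computes the learning rate at a given step for a linear schedule with warmup. It assumes that the warmup value of 0.1 is a fraction of the total steps rather than an absolute step count, and it takes the total of 1500 steps from the training results table. The function is a stand-alone illustration, not the Trainer's own code.

```python
def linear_lr(step, base_lr=1e-5, total_steps=1500, warmup_fraction=0.1):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    # Assumption: 0.1 is read as 10% of all optimizer steps.
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Learning rate at the start, mid-warmup, peak, midpoint of decay, and end.
for step in (0, 75, 150, 825, 1500):
    print(step, linear_lr(step))
```

Under these assumptions the peak rate of 1e-05 is reached at step 150, which is the end of the first epoch.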

---

### Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| No log        | 1.0   | 150  | 0.2817          | 0.1429    | 0.0829 | 0.1049 | 0.9470   |
| No log        | 2.0   | 300  | 0.2222          | 0.3480    | 0.4822 | 0.4043 | 0.9480   |
| No log        | 3.0   | 450  | 0.2170          | 0.3852    | 0.5736 | 0.4609 | 0.9480   |
| 0.5287        | 4.0   | 600  | 0.1919          | 0.4625    | 0.6261 | 0.5320 | 0.9558   |
| 0.5287        | 5.0   | 750  | 0.1701          | 0.5254    | 0.7174 | 0.6066 | 0.9627   |
| 0.5287        | 6.0   | 900  | 0.2060          | 0.5173    | 0.7327 | 0.6064 | 0.9565   |
| 0.0360        | 7.0   | 1050 | 0.2161          | 0.5370    | 0.7124 | 0.6124 | 0.9594   |
| 0.0360        | 8.0   | 1200 | 0.2146          | 0.5359    | 0.7445 | 0.6232 | 0.9584   |
| 0.0360        | 9.0   | 1350 | 0.2141          | 0.5268    | 0.7327 | 0.6129 | 0.9578   |
| 0.0147        | 10.0  | 1500 | 0.2131          | 0.5393    | 0.7310 | 0.6207 | 0.9597   |
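
The table shows validation loss and F1 peaking at different epochs. The short sketch below picks the best epoch under each criterion, using values copied from the table. It only illustrates the comparison and makes no claim about how the published checkpoint was selected.

```python
# (epoch, validation_loss, f1) copied from the training results table.
results = [
    (1, 0.2817, 0.1049),
    (2, 0.2222, 0.4043),
    (3, 0.2170, 0.4609),
    (4, 0.1919, 0.5320),
    (5, 0.1701, 0.6066),
    (6, 0.2060, 0.6064),
    (7, 0.2161, 0.6124),
    (8, 0.2146, 0.6232),
    (9, 0.2141, 0.6129),
    (10, 0.2131, 0.6207),
]

best_by_loss = min(results, key=lambda row: row[1])
best_by_f1 = max(results, key=lambda row: row[2])

print("lowest validation loss:", best_by_loss)  # (5, 0.1701, 0.6066)
print("highest F1:", best_by_f1)                # (8, 0.2146, 0.6232)
```

Validation loss is lowest at epoch 5, while F1 is highest at epoch 8, so the choice of selection metric changes which checkpoint looks best.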

---

## Framework versions

- Transformers 5.0.0  
- PyTorch 2.10.0+cu128  
- Datasets 4.0.0  
- Tokenizers 0.22.2