TomasFAV committed · verified
Commit 7c50501 · Parent(s): 1186bda

Update README.md

Files changed (1): README.md (+93 −20)
@@ -4,40 +4,110 @@ license: cc-by-nc-sa-4.0
  base_model: microsoft/layoutlmv3-base
  tags:
  - generated_from_trainer
  metrics:
  - precision
  - recall
  - f1
  - accuracy
  model-index:
- - name: Layoutlmv3InvoiceCzech
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # Layoutlmv3InvoiceCzech

- This model is a fine-tuned version of [microsoft/layoutlmv3-base](https://huggingface.co/microsoft/layoutlmv3-base) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.2146
- - Precision: 0.5354
- - Recall: 0.7428
- - F1: 0.6223
- - Accuracy: 0.9583

  ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

  ## Training procedure

@@ -54,6 +124,8 @@ The following hyperparameters were used during training:
  - num_epochs: 10
  - mixed_precision_training: Native AMP

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
@@ -69,10 +141,11 @@ The following hyperparameters were used during training:
  | 0.0360 | 9.0 | 1350 | 0.2141 | 0.5268 | 0.7327 | 0.6129 | 0.9578 |
  | 0.0147 | 10.0 | 1500 | 0.2131 | 0.5393 | 0.7310 | 0.6207 | 0.9597 |

- ### Framework versions

- - Transformers 5.0.0
- - Pytorch 2.10.0+cu128
- - Datasets 4.0.0
- - Tokenizers 0.22.2
 
  base_model: microsoft/layoutlmv3-base
  tags:
  - generated_from_trainer
+ - invoice-processing
+ - information-extraction
+ - czech-language
+ - document-ai
+ - layout-aware-model
+ - multimodal-model
+ - synthetic-data
  metrics:
  - precision
  - recall
  - f1
  - accuracy
  model-index:
+ - name: LayoutLMv3InvoiceCzech-V0
  results: []
  ---

+ # LayoutLMv3InvoiceCzech (V0: Synthetic Templates Only)
 

+ This model is a fine-tuned version of [microsoft/layoutlmv3-base](https://huggingface.co/microsoft/layoutlmv3-base) for structured information extraction from Czech invoices.

  It achieves the following results on the evaluation set:
+ - Loss: 0.2146
+ - Precision: 0.5354
+ - Recall: 0.7428
+ - F1: 0.6223
+ - Accuracy: 0.9583
+
+ ---

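The gap between high accuracy (0.958) and lower precision (0.535) is typical for token classification on documents: accuracy counts every token, and most invoice tokens carry the "O" (outside) label, while precision, recall, and F1 are usually computed over whole predicted entity spans. A minimal toy sketch of the two kinds of metric, using hypothetical BIO tags rather than this model's actual label set:

```python
# Toy illustration: token accuracy vs. span-level precision/recall.
# The BIO tag names here are illustrative, not the model's real tag set.

def token_accuracy(true, pred):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)

def entities(tags):
    """Collect (label, start, end) spans from a BIO tag sequence."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        if tag.startswith("B-") or tag == "O":
            if label is not None:
                spans.append((label, start, i))
                label = None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
    return set(spans)

true = ["O"] * 16 + ["B-TOTAL", "I-TOTAL", "O", "B-DATE"]
pred = ["O"] * 16 + ["B-TOTAL", "O", "O", "B-DATE"]

true_e, pred_e = entities(true), entities(pred)
tp = len(true_e & pred_e)          # spans predicted exactly right
precision = tp / len(pred_e)
recall = tp / len(true_e)
```

Here a single boundary error on a two-token span leaves token accuracy at 0.95 but halves span-level precision and recall, mirroring the pattern in the table above.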
  ## Model description

+ LayoutLMv3InvoiceCzech (V0) is a multimodal document understanding model that leverages:
+
+ - textual information
+ - spatial layout (bounding boxes)
+ - visual features (image embeddings)
+
+ The model performs token-level classification to extract structured invoice fields:
+ - supplier
+ - customer
+ - invoice number
+ - bank details
+ - totals
+ - dates
+
+ This version is trained exclusively on synthetically generated invoice templates.
+
+ ---
+
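Token-level classification only assigns a tag per token; turning those tags into usable field values needs a small decoding step. A minimal sketch, assuming BIO-style tags (the field names below are illustrative, not the model's real label set):

```python
# Group per-token BIO predictions into field strings.
# Tag names (INV_NUM, SUPPLIER, TOTAL, ...) are hypothetical examples.

def group_fields(words, tags):
    """Merge consecutive B-X / I-X tokens into one value per field occurrence."""
    fields = []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            fields.append((tag[2:], [word]))        # start a new field span
        elif tag.startswith("I-") and fields and fields[-1][0] == tag[2:]:
            fields[-1][1].append(word)              # continue the current span
    return [(label, " ".join(tokens)) for label, tokens in fields]

words = ["Faktura", "č.", "2024001", "ABC", "s.r.o.", "celkem", "12", "500", "Kč"]
tags  = ["O", "O", "B-INV_NUM", "B-SUPPLIER", "I-SUPPLIER", "O", "B-TOTAL", "I-TOTAL", "O"]

print(group_fields(words, tags))
# → [('INV_NUM', '2024001'), ('SUPPLIER', 'ABC s.r.o.'), ('TOTAL', '12 500')]
```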
+ ## Training data
+
+ The dataset consists of:
+
+ - synthetically generated invoices
+ - fixed template layouts
+ - corresponding bounding boxes
+ - rendered document images
+
+ Key properties:
+ - consistent structure across samples
+ - clean and noise-free data
+ - perfect alignment between text, layout, and image
+ - no real-world documents
+
+ This represents the **baseline dataset** for multimodal document models.
+
+ ---
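Because the synthetic renderings come with exact bounding boxes, preparing them for the model is mainly a scaling step: the LayoutLM family expects word boxes normalized to a 0–1000 coordinate space. A small sketch of that step (the page size and box values are illustrative):

```python
# LayoutLMv3 expects word bounding boxes in a 0-1000 normalized coordinate space.
# Sketch of the normalization step for renderings with a known page size.

def normalize_bbox(bbox, page_width, page_height):
    """Scale an (x0, y0, x1, y1) pixel box to the 0-1000 range used by LayoutLMv3."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]

# A 595x842 pt page (A4 at 72 dpi); the box coordinates are made up.
box = normalize_bbox((59, 84, 119, 101), 595, 842)
print(box)  # → [99, 99, 200, 119]
```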
 
+ ## Role in the pipeline
+
+ This model corresponds to:
+
+ **V0: synthetic, template-based dataset only**
+
+ It is used to:
+ - establish a baseline for multimodal models
+ - compare against:
+   - text-only models (BERT)
+   - layout-aware models without vision (LiLT)
+ - evaluate the contribution of visual features in a controlled setting
+
+ ---
+
+ ## Intended uses
+
+ - Research in multimodal document understanding
+ - Benchmarking LayoutLMv3 on structured documents
+ - Comparison with other architectures (BERT, LiLT, etc.)
+ - Czech invoice information extraction
+
+ ---
+
+ ## Limitations
+
+ - Trained only on synthetic data with fixed layouts
+ - Limited generalization to real-world invoices
+ - Visual features are learned from clean synthetic renderings
+ - No exposure to:
+   - OCR errors
+   - scanning artifacts
+   - real-world noise
+
+ ---
 
  ## Training procedure

  …

  - num_epochs: 10
  - mixed_precision_training: Native AMP

+ ---
+
  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
 
  …
  | 0.0360 | 9.0 | 1350 | 0.2141 | 0.5268 | 0.7327 | 0.6129 | 0.9578 |
  | 0.0147 | 10.0 | 1500 | 0.2131 | 0.5393 | 0.7310 | 0.6207 | 0.9597 |

+ ---
+
+ ## Framework versions
+
+ - Transformers 5.0.0
+ - PyTorch 2.10.0+cu128
+ - Datasets 4.0.0
+ - Tokenizers 0.22.2