TomasFAV committed
Commit d8a8ff8 · verified · 1 Parent(s): 82fa3e1

Update README.md

Files changed (1): README.md (+86 −17)
README.md CHANGED
@@ -4,34 +4,100 @@ license: apache-2.0
 base_model: TomasFAV/Pix2StructCzechInvoice
 tags:
 - generated_from_trainer
+- invoice-processing
+- information-extraction
+- czech-language
+- document-ai
+- multimodal-model
+- generative-model
+- synthetic-data
+- layout-augmentation
 metrics:
 - f1
 model-index:
-- name: Pix2StructCzechInvoiceV1
+- name: Pix2StructCzechInvoice-V1
   results: []
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
-# Pix2StructCzechInvoiceV1
+# Pix2StructCzechInvoice (V1 Synthetic + Random Layout)
 
-This model is a fine-tuned version of [TomasFAV/Pix2StructCzechInvoice](https://huggingface.co/TomasFAV/Pix2StructCzechInvoice) on an unknown dataset.
+This model is a fine-tuned version of [TomasFAV/Pix2StructCzechInvoice](https://huggingface.co/TomasFAV/Pix2StructCzechInvoice) for structured information extraction from Czech invoices.
+
 It achieves the following results on the evaluation set:
 - Loss: 0.4679
 - F1: 0.6432
+
+---
 
 ## Model description
 
-More information needed
+Pix2StructCzechInvoice (V1) extends the baseline generative model by introducing layout variability into the training data.
+
+Unlike token classification models, this model:
+- processes full document images
+- generates structured outputs as text sequences
+
+It is trained to extract key invoice fields:
+- supplier
+- customer
+- invoice number
+- bank details
+- totals
+- dates
+
+---
+
+## Training data
+
+The dataset consists of:
+
+- synthetically generated invoice images
+- augmented variants with randomized layouts
+- corresponding structured text outputs
+
+Key properties:
+- variable layout structure
+- visual diversity (spacing, positioning, formatting)
+- consistent annotation format
+- fully synthetic data
+
+This introduces **layout variability in the visual domain**, which is crucial for generative multimodal models.
+
+---
+
+## Role in the pipeline
+
+This model corresponds to:
 
-## Intended uses & limitations
+**V1 Synthetic templates + randomized layouts**
 
-More information needed
+It is used to:
+- evaluate the effect of layout variability on generative models
+- compare against:
+  - V0 (fixed templates)
+  - later hybrid and real-data stages (V2, V3)
+- analyze robustness of end-to-end extraction
+
+---
 
-## Training and evaluation data
+## Intended uses
 
-More information needed
+- End-to-end invoice extraction from images
+- Document VQA-style tasks
+- Research in generative document understanding
+- Comparison with structured prediction models
+
+---
+
+## Limitations
+
+- Still trained only on synthetic data
+- Sensitive to output formatting inconsistencies
+- Training instability (fluctuating F1 across epochs)
+- Evaluation depends on string matching quality
+- Less interpretable than token classification models
+
+---
 
 ## Training procedure
 
@@ -48,6 +114,8 @@ The following hyperparameters were used during training:
 - num_epochs: 10
 - mixed_precision_training: Native AMP
 
+---
+
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | F1 |
@@ -63,10 +131,11 @@ The following hyperparameters were used during training:
 | 0.0393 | 9.0 | 675 | 0.4679 | 0.6432 |
 | 0.0392 | 10.0 | 750 | 0.5330 | 0.4931 |
 
-### Framework versions
+---
 
-- Transformers 5.0.0
-- Pytorch 2.10.0+cu128
-- Datasets 4.0.0
-- Tokenizers 0.22.2
+## Framework versions
+
+- Transformers 5.0.0
+- PyTorch 2.10.0+cu128
+- Datasets 4.0.0
+- Tokenizers 0.22.2
+ - Tokenizers 0.22.2