TomasFAV committed
Commit 0c074b5 · verified · 1 Parent(s): 0dc3950

Update README.md

Files changed (1):
  1. README.md +84 -20

README.md CHANGED
@@ -1,41 +1,102 @@
  ---
  library_name: transformers
  tags:
  - generated_from_trainer
  metrics:
  - precision
  - recall
  - f1
  - accuracy
  model-index:
- - name: BERTInvoiceCzechRV1
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # BERTInvoiceCzechRV1

- This model was trained from scratch on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.2295
- - Precision: 0.6594
- - Recall: 0.7309
- - F1: 0.6933
- - Accuracy: 0.9534

  ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

  ## Training procedure

@@ -52,6 +113,8 @@ The following hyperparameters were used during training:
  - num_epochs: 10
  - mixed_precision_training: Native AMP

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
@@ -67,10 +130,11 @@ The following hyperparameters were used during training:
  | 0.0306 | 9.0 | 585 | 0.2853 | 0.6059 | 0.7208 | 0.6584 | 0.9455 |
  | 0.0306 | 10.0 | 650 | 0.2859 | 0.6054 | 0.7239 | 0.6594 | 0.9452 |

- ### Framework versions

- - Transformers 5.0.0
- - Pytorch 2.10.0+cu128
- - Datasets 4.0.0
- - Tokenizers 0.22.2
  ---
  library_name: transformers
+ license: apache-2.0
+ base_model: google-bert/bert-base-multilingual-cased
  tags:
  - generated_from_trainer
+ - invoice-processing
+ - information-extraction
+ - czech-language
+ - synthetic-data
+ - layout-augmentation
  metrics:
  - precision
  - recall
  - f1
  - accuracy
  model-index:
+ - name: BERTInvoiceCzechR-V1
  results: []
  ---

+ # BERTInvoiceCzechR (V1 Synthetic + Random Layout)

+ This model is a fine-tuned version of [google-bert/bert-base-multilingual-cased](https://huggingface.co/google-bert/bert-base-multilingual-cased) for structured information extraction from Czech invoices.

  It achieves the following results on the evaluation set:
+ - Loss: 0.2295
+ - Precision: 0.6594
+ - Recall: 0.7309
+ - F1: 0.6933
+ - Accuracy: 0.9534
+
+ ---

  ## Model description

+ BERTInvoiceCzechR (V1) extends the baseline model (V0) by introducing layout variability into the training data.
+
+ The model performs token-level classification to extract structured invoice fields such as:
+ - supplier
+ - customer
+ - invoice number
+ - bank details
+ - totals
+ - dates
+
+ Compared to V0, this version is trained on synthetically generated invoices with **randomized layouts**, improving robustness to positional and structural variations.
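The token-level extraction described above could be used roughly as follows. This is a minimal sketch, not code from the repository: the repo id `TomasFAV/BERTInvoiceCzechRV1`, the BIO label names (`B-supplier`, `I-supplier`, …), and the `group_fields` helper are all assumptions; the card only states that the model does token-level classification.

```python
def group_fields(tagged_tokens):
    """Merge (token, BIO-label) pairs into (field, text) spans.

    The BIO scheme and field names here are assumptions for illustration.
    """
    fields = []
    current_label, current_tokens = None, []
    for token, label in tagged_tokens:
        if label.startswith("B-"):
            if current_label:
                fields.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_label == label[2:]:
            current_tokens.append(token)
        else:
            if current_label:
                fields.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label:
        fields.append((current_label, " ".join(current_tokens)))
    return fields


if __name__ == "__main__":
    # Repo id is a guess based on the commit author and model name.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model="TomasFAV/BERTInvoiceCzechRV1",
        aggregation_strategy="simple",
    )
    print(ner("Faktura c. 2024001, dodavatel: ACME s.r.o."))
```

For plain-BERT token classifiers, a grouping step like `group_fields` (or the pipeline's built-in `aggregation_strategy`) is what turns per-token labels back into whole invoice fields.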
+
+ ---
+
+ ## Training data
+
+ The dataset consists of:
+
+ - synthetically generated invoices based on templates
+ - additional variants with randomized layout structures
+
+ Key properties:
+ - variable positioning of fields
+ - layout perturbations (shifts, spacing, ordering)
+ - preserved semantic correctness of labels
+ - still fully synthetic (no real invoices)
+
+ This dataset introduces **layout diversity**, which is critical for generalization in document understanding tasks.
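The kind of layout randomization listed above (shifts, spacing, reordering, with labels left intact) can be sketched as a small augmentation function. The actual generator is not published; the field names and the specific perturbation ranges below are illustrative assumptions.

```python
import random


def randomize_layout(fields, rng):
    """Render (label, value) pairs as invoice lines with a randomized layout.

    Shuffles field ordering and injects random indentation and spacing,
    while the label/value content itself stays unchanged.
    """
    fields = list(fields)
    rng.shuffle(fields)                   # randomized field ordering
    lines = []
    for label, value in fields:
        indent = " " * rng.randint(0, 8)  # positional shift
        gap = " " * rng.randint(1, 4)     # spacing perturbation
        lines.append(f"{indent}{label}:{gap}{value}")
    return lines


rng = random.Random(0)
invoice = [
    ("Dodavatel", "ACME s.r.o."),
    ("Cislo faktury", "2024001"),
    ("Celkem", "12 100 Kc"),
]
print("\n".join(randomize_layout(invoice, rng)))
```

The key property, matching the card's description, is that only the presentation varies between variants; the token labels derived from the (label, value) pairs remain semantically correct.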
+
+ ---
+
+ ## Role in the pipeline
+
+ This model corresponds to:

+ **V1 Synthetic templates + randomized layouts**

+ It is used to:
+ - evaluate the impact of layout variability
+ - compare against:
+   - V0 (fixed templates)
+   - later stages with real data (V2, V3)
+ - measure improvements in generalization
+
+ ---

+ ## Intended uses
+
+ - Research in layout-aware NLP without explicit layout models
+ - Benchmarking robustness to structural variation
+ - Intermediate baseline for synthetic data pipelines
+ - Czech invoice information extraction
+
+ ---

+ ## Limitations
+
+ - Still trained only on synthetic data
+ - No exposure to real-world noise (OCR errors, distortions)
+ - Layout variation is artificial and may not fully reflect real documents
+ - Does not leverage explicit spatial features (pure BERT)
+
+ ---
  ## Training procedure

  - num_epochs: 10
  - mixed_precision_training: Native AMP

+ ---
+
  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
  | 0.0306 | 9.0 | 585 | 0.2853 | 0.6059 | 0.7208 | 0.6584 | 0.9455 |
  | 0.0306 | 10.0 | 650 | 0.2859 | 0.6054 | 0.7239 | 0.6594 | 0.9452 |

+ ---

+ ## Framework versions

+ - Transformers 5.0.0
+ - PyTorch 2.10.0+cu128
+ - Datasets 4.0.0
+ - Tokenizers 0.22.2