---
library_name: transformers
license: mit
base_model: SCUT-DLVCLab/lilt-roberta-en-base
tags:
- generated_from_trainer
- invoice-processing
- information-extraction
- czech-language
- document-ai
- layout-aware-model
- synthetic-data
- layout-augmentation
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: LiLTInvoiceCzech-V1
  results: []
---

# LiLTInvoiceCzech (V1 – Synthetic + Random Layout)

This model is a fine-tuned version of [SCUT-DLVCLab/lilt-roberta-en-base](https://huggingface.co/SCUT-DLVCLab/lilt-roberta-en-base) for structured information extraction from Czech invoices.

It achieves the following results on the evaluation set:
- Loss: 0.1907
- Precision: 0.6326
- Recall: 0.7491
- F1: 0.6859
- Accuracy: 0.9660

---

## Model description

LiLTInvoiceCzech (V1) extends the baseline layout-aware model by introducing layout variability into the training data.

The model performs token-level classification using both textual and spatial (bounding box) information to extract structured invoice fields:
- supplier
- customer
- invoice number
- bank details
- totals
- dates

Compared to V0, this version is trained on synthetically generated invoices with **randomized layouts**, improving robustness to spatial variations.
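
A minimal inference sketch, assuming the checkpoint loads through the standard `transformers` auto classes; the repo path, example words, and box coordinates below are placeholders, not values from this card:

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "path/to/LiLTInvoiceCzech-V1"  # hypothetical local path or repo id

# The base tokenizer is RoBERTa-style, so pre-tokenized input needs add_prefix_space=True.
tokenizer = AutoTokenizer.from_pretrained(model_id, add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# OCR output: words plus bounding boxes on the 0-1000 grid that LiLT
# inherits from LayoutLM-style models (coordinates here are made up).
words = ["Faktura", "2024001", "Celkem:", "1250", "Kč"]
boxes = [[60, 40, 150, 60], [160, 40, 260, 60],
         [60, 700, 140, 720], [150, 700, 200, 720], [210, 700, 240, 720]]

encoding = tokenizer(words, is_split_into_words=True,
                     truncation=True, return_tensors="pt")

# Expand word-level boxes to token level: subwords reuse their word's box,
# special tokens get an all-zero box.
word_ids = encoding.word_ids(0)
encoding["bbox"] = torch.tensor(
    [[[0, 0, 0, 0] if i is None else boxes[i] for i in word_ids]]
)

with torch.no_grad():
    logits = model(**encoding).logits

predicted = logits.argmax(-1).squeeze(0).tolist()
labels = [model.config.id2label[p] for p in predicted]
```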

---

## Training data

The dataset consists of:

- synthetically generated invoices based on templates
- augmented variants with randomized layout structures
- corresponding bounding box annotations

Key properties:
- variable positioning of fields
- layout perturbations (shifts, spacing, ordering)
- preserved label consistency
- fully synthetic data

This dataset introduces **layout diversity**, which is especially important for layout-aware models.
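
The augmentation code itself is not part of this card; the sketch below, with invented shift and jitter magnitudes, only illustrates the kind of box-level perturbation described above:

```python
import random

def perturb_layout(boxes, max_shift=40, jitter=5, grid=1000):
    """Illustrative layout augmentation: one random page-level shift plus
    small per-box jitter, clamped to the 0-1000 coordinate grid."""
    dx = random.randint(-max_shift, max_shift)
    dy = random.randint(-max_shift, max_shift)
    perturbed = []
    for x0, y0, x1, y1 in boxes:
        jx = random.randint(-jitter, jitter)
        jy = random.randint(-jitter, jitter)
        perturbed.append([
            max(0, min(grid, x0 + dx + jx)),
            max(0, min(grid, y0 + dy + jy)),
            max(0, min(grid, x1 + dx + jx)),
            max(0, min(grid, y1 + dy + jy)),
        ])
    return perturbed
```

Because boxes move while tokens and their labels stay paired, this kind of perturbation preserves label consistency.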

---

## Role in the pipeline

This model corresponds to:

**V1 – Synthetic templates + randomized layouts**

It is used to:
- evaluate the effect of layout variability on LiLT
- compare against:
  - V0 (fixed layouts)
  - later stages with hybrid and real data (V2, V3)
- analyze how layout-aware models benefit from synthetic augmentation

---

## Intended uses

- Research in layout-aware document understanding
- Evaluation of spatial robustness in NLP models
- Benchmarking LiLT against text-only models (BERT)
- Czech invoice information extraction

---

## Limitations

- Still trained only on synthetic data
- Layout variability is artificial
- No real-world noise (OCR errors, distortions)
- May not fully generalize to real invoice distributions

---

## Training procedure

The following hyperparameters were used during training:
- num_epochs: 10
- mixed_precision_training: Native AMP
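
Only the epoch count and Native AMP are visible in this card; as a sketch, they map onto `transformers` `TrainingArguments` like this (the output directory is a placeholder and every other argument is left at its default):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="lilt-invoice-czech-v1",  # hypothetical output path
    num_train_epochs=10,                 # num_epochs: 10
    fp16=True,                           # Native AMP mixed-precision training
)
```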

---

### Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:--:|:--------:|
| No log | 9.0 | 342 | 0.1939 | 0.6700 | 0.6962 | 0.6828 | 0.9701 |
| No log | 10.0 | 380 | 0.1931 | 0.6645 | 0.6928 | 0.6784 | 0.9696 |
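
The evaluation code is not included in the card; the gap between token accuracy (0.966) and entity-level F1 (0.686) is what the usual seqeval setup for token classification produces, sketched here under the assumption that labels use the standard -100 ignore index:

```python
import numpy as np
import evaluate

seqeval = evaluate.load("seqeval")

def compute_metrics(eval_pred, id2label):
    """Entity-level precision/recall/F1 plus token accuracy, as reported above."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Drop positions labeled -100 (special tokens / non-first subwords).
    refs = [[id2label[l] for l in row if l != -100] for row in labels]
    hyps = [[id2label[p] for p, l in zip(prow, lrow) if l != -100]
            for prow, lrow in zip(preds, labels)]
    scores = seqeval.compute(predictions=hyps, references=refs)
    return {
        "precision": scores["overall_precision"],
        "recall": scores["overall_recall"],
        "f1": scores["overall_f1"],
        "accuracy": scores["overall_accuracy"],
    }
```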

---

## Framework versions

- Transformers 5.0.0
- PyTorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2