LayoutLMv3InvoiceCzech (V3 – Full Pipeline with Real Data Fine-Tuning)
This model is a fine-tuned version of microsoft/layoutlmv3-base for structured information extraction from Czech invoices.
It achieves the following results on the evaluation set:
- Loss: 0.0390
- Precision: 0.9118
- Recall: 0.9272
- F1: 0.9195
- Accuracy: 0.9924
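As a quick consistency check on the numbers above, the F1 score can be recomputed from the reported precision and recall. This is a minimal sketch that assumes F1 is the usual harmonic mean of the two; the card does not say how the metrics were computed, and the variable names are illustrative.

```python
# Reported evaluation metrics, copied from the list above.
precision = 0.9118
recall = 0.9272
reported_f1 = 0.9195

# Assumption: F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Precision and recall are rounded to four decimals, so the recomputed
# value can differ from the reported one in the last digit.
f1_matches = abs(f1 - reported_f1) < 1e-3
print(f"recomputed F1 = {f1:.4f}, reported F1 = {reported_f1:.4f}, consistent = {f1_matches}")
```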
Model description
LayoutLMv3InvoiceCzech (V3) is the final and best-performing multimodal model in the experimental pipeline.
The model combines:
- textual features
- spatial layout (bounding boxes)
- visual features (image embeddings)
to perform token-level classification and extract structured invoice fields:
- supplier
- customer
- invoice number
- bank details
- totals
- dates
By combining multimodal inputs with realistic training data, this version achieves the best results of any configuration within the experimental setup.
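Token-level predictions still have to be merged into field values before they are useful downstream. The sketch below shows one way this could be done, assuming the model emits BIO-style tags per word. The label names (`B-INVOICE_NUMBER`, `I-TOTAL`, ...) and the helper `group_entities` are hypothetical and are not taken from this model's actual label set.

```python
from typing import List, Optional, Tuple


def group_entities(words: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge word-level BIO tags into (field, text) pairs.

    Assumption: labels look like "B-FIELD", "I-FIELD", or "O".
    """
    entities: List[Tuple[str, str]] = []
    current_field: Optional[str] = None
    current_words: List[str] = []

    def flush() -> None:
        # Store the entity collected so far, if there is one.
        if current_field is not None:
            entities.append((current_field, " ".join(current_words)))

    for word, label in zip(words, labels):
        prefix, _, field = label.partition("-")
        if prefix == "B" or (prefix == "I" and field != current_field):
            # A new entity starts (an "I-" tag without a matching
            # predecessor is treated leniently as a start).
            flush()
            current_field, current_words = field, [word]
        elif prefix == "I":
            current_words.append(word)
        else:
            # "O" (outside) closes any open entity.
            flush()
            current_field, current_words = None, []
    flush()
    return entities


# Illustrative input; the words and labels are made up for this example.
words = ["Faktura", "č.", "2024001", "Celkem", "1", "210,00", "Kč"]
labels = ["O", "O", "B-INVOICE_NUMBER", "O", "B-TOTAL", "I-TOTAL", "I-TOTAL"]
print(group_entities(words, labels))
```

In practice the labels would come from the model's `id2label` mapping after mapping subword predictions back to words, which is omitted here.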
Training data
The dataset used in this stage combines:
- Synthetic template-based invoices (V0)
- Synthetic invoices with randomized layouts (V1)
- Hybrid invoices with real layouts and synthetic content (V2)
- Real annotated invoices
Real data fine-tuning
The final stage introduces:
- real invoice documents
- annotated entity labels
- natural linguistic variability
- real formatting inconsistencies and visual noise
This enables the model to:
- fully exploit visual and spatial features
- adapt to real-world distributions
- achieve strong generalization
Role in the pipeline
This model corresponds to:
V3 – Full pipeline (synthetic + hybrid + real data fine-tuning)
It represents:
- the final multimodal model
- the best-performing configuration across all architectures
- a production-ready solution for document understanding
Intended uses
- Real-world invoice information extraction
- Multimodal document AI systems
- OCR post-processing with visual and layout awareness
- Benchmarking advanced document understanding models
Limitations
- Depends on OCR and bounding box quality
- Computationally more expensive than text-only models
- May struggle with:
  - extremely degraded scans
  - rare document formats
- Domain-specific (Czech invoices)
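Because the model depends on bounding box quality, it helps to sanitize OCR boxes before inference. LayoutLM-family models expect word boxes as `[x0, y0, x1, y1]` integers on a 0 to 1000 scale relative to the page size. The helper below is a minimal sketch of that normalization. The function name and the clamping and corner-ordering behavior are assumptions, not part of this model's published preprocessing.

```python
from typing import List, Sequence


def normalize_box(box: Sequence[float], page_width: float, page_height: float) -> List[int]:
    """Scale a pixel-space box (x0, y0, x1, y1) to the 0-1000 range."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")

    x0, y0, x1, y1 = box
    # Order the corners in case the OCR engine reports them swapped.
    x0, x1 = sorted((x0, x1))
    y0, y1 = sorted((y0, y1))

    def scale(value: float, size: float) -> int:
        # Clamp so boxes that spill past the page edge stay valid.
        return max(0, min(1000, int(1000 * value / size)))

    return [
        scale(x0, page_width),
        scale(y0, page_height),
        scale(x1, page_width),
        scale(y1, page_height),
    ]


# Example with an A4-sized page of 595 x 842 units (illustrative values).
print(normalize_box((119, 421, 595, 842), 595, 842))
```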
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 1
- seed: 42
- optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 0.1
- num_epochs: 10
- mixed_precision_training: Native AMP
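To make the schedule concrete, the sketch below computes the learning rate at a given optimizer step for a linear schedule with warmup. It rests on two assumptions that the card does not state outright: the warmup value of 0.1 is read as a fraction of the total steps, and the total is taken to be 230 steps, the last step listed in the training results. The function `linear_lr` is illustrative and is not the library's own scheduler.

```python
def linear_lr(
    step: int,
    base_lr: float = 1e-5,
    total_steps: int = 230,        # assumption: last step in the results table
    warmup_fraction: float = 0.1,  # assumption: 0.1 is a fraction of total steps
) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 23, 115, 230):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at 1e-05 after the first epoch (step 23) and decays to zero at step 230.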
Training results
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|---|
| No log | 1.0 | 23 | 0.0528 | 0.8268 | 0.8883 | 0.8564 | 0.9867 |
| No log | 2.0 | 46 | 0.0445 | 0.8855 | 0.8765 | 0.8810 | 0.9893 |
| No log | 3.0 | 69 | 0.0395 | 0.8933 | 0.9069 | 0.9001 | 0.9906 |
| No log | 4.0 | 92 | 0.0354 | 0.9003 | 0.9323 | 0.9160 | 0.9924 |
| No log | 5.0 | 115 | 0.0398 | 0.9161 | 0.9052 | 0.9106 | 0.9920 |
| No log | 6.0 | 138 | 0.0400 | 0.9034 | 0.9019 | 0.9026 | 0.9916 |
| No log | 7.0 | 161 | 0.0381 | 0.9062 | 0.9323 | 0.9191 | 0.9925 |
| No log | 8.0 | 184 | 0.0404 | 0.9005 | 0.9188 | 0.9095 | 0.9922 |
| No log | 9.0 | 207 | 0.0390 | 0.9056 | 0.9255 | 0.9155 | 0.9924 |
| No log | 10.0 | 230 | 0.0390 | 0.9118 | 0.9272 | 0.9195 | 0.9924 |
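The table can be read in more than one way depending on the selection criterion. The short sketch below copies the rows from the table and picks the epoch with the highest F1 and the epoch with the lowest validation loss. It only illustrates checkpoint selection and makes no claim about which criterion was used for the released weights.

```python
# (epoch, validation_loss, precision, recall, f1, accuracy), copied from the table.
rows = [
    (1, 0.0528, 0.8268, 0.8883, 0.8564, 0.9867),
    (2, 0.0445, 0.8855, 0.8765, 0.8810, 0.9893),
    (3, 0.0395, 0.8933, 0.9069, 0.9001, 0.9906),
    (4, 0.0354, 0.9003, 0.9323, 0.9160, 0.9924),
    (5, 0.0398, 0.9161, 0.9052, 0.9106, 0.9920),
    (6, 0.0400, 0.9034, 0.9019, 0.9026, 0.9916),
    (7, 0.0381, 0.9062, 0.9323, 0.9191, 0.9925),
    (8, 0.0404, 0.9005, 0.9188, 0.9095, 0.9922),
    (9, 0.0390, 0.9056, 0.9255, 0.9155, 0.9924),
    (10, 0.0390, 0.9118, 0.9272, 0.9195, 0.9924),
]

best_f1_row = max(rows, key=lambda row: row[4])       # highest F1
lowest_loss_row = min(rows, key=lambda row: row[1])   # lowest validation loss

print(f"best F1: epoch {best_f1_row[0]} (F1 = {best_f1_row[4]})")
print(f"lowest validation loss: epoch {lowest_loss_row[0]} (loss = {lowest_loss_row[1]})")
```

The two criteria disagree here: validation loss is lowest at epoch 4, while F1 is highest at epoch 10, which matches the headline metrics reported for the evaluation set.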
Framework versions
- Transformers 5.0.0
- PyTorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2