LayoutLMv3InvoiceCzech (V3 – Full Pipeline with Real Data Fine-Tuning)

This model is a fine-tuned version of microsoft/layoutlmv3-base for structured information extraction from Czech invoices.

It achieves the following results on the evaluation set:

  • Loss: 0.0390
  • Precision: 0.9118
  • Recall: 0.9272
  • F1: 0.9195
  • Accuracy: 0.9924
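
As a quick cross-check, F1 is the harmonic mean of precision and recall, so the figures above can be recomputed in a few lines. This is only an illustrative sketch. The recomputed value may differ from the reported one in the fourth decimal, presumably because precision and recall were rounded before being listed.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the evaluation set above.
reported_precision = 0.9118
reported_recall = 0.9272
reported_f1 = 0.9195

recomputed = f1_score(reported_precision, reported_recall)
# Agreement to about three decimals is expected given rounded inputs.
print(f"recomputed F1 = {recomputed:.4f}, reported F1 = {reported_f1:.4f}")
```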

Model description

LayoutLMv3InvoiceCzech (V3) is the final and best-performing multimodal model in the experimental pipeline.

The model combines:

  • textual features
  • spatial layout (bounding boxes)
  • visual features (image embeddings)

to perform token-level classification and extract structured invoice fields:

  • supplier
  • customer
  • invoice number
  • bank details
  • totals
  • dates
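
Token-level predictions still have to be merged into field values. The tagging scheme is not stated here, so the following is a minimal sketch that assumes BIO-style labels. The label names (`SUPPLIER`, `INVOICE_NUMBER`) and the sample words are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def group_entities(words: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge word-level BIO labels into (field, text) pairs."""
    entities: List[Tuple[str, str]] = []
    current_field: Optional[str] = None
    current_words: List[str] = []

    def flush() -> None:
        # Store the entity collected so far, if any.
        if current_field is not None:
            entities.append((current_field, " ".join(current_words)))

    for word, label in zip(words, labels):
        prefix, _, field = label.partition("-")
        if prefix == "B" or (prefix == "I" and field != current_field):
            # A B- tag (or a stray I- tag for a different field) starts a new entity.
            flush()
            current_field, current_words = field, [word]
        elif prefix == "I":
            current_words.append(word)
        else:
            # "O" or an unrecognized tag closes the current entity.
            flush()
            current_field, current_words = None, []
    flush()
    return entities


# Hypothetical words and labels (ASSUMPTION: BIO scheme, made-up label names).
words = ["Dodavatel:", "ACME", "s.r.o.", "Faktura", "č.", "2024001"]
labels = ["O", "B-SUPPLIER", "I-SUPPLIER", "O", "O", "B-INVOICE_NUMBER"]
fields = group_entities(words, labels)
print(fields)
```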

By combining multimodal inputs with realistic training data, this version achieves the best results within the experimental setup.


Training data

The dataset used in this stage combines:

  1. Synthetic template-based invoices (V0)
  2. Synthetic invoices with randomized layouts (V1)
  3. Hybrid invoices with real layouts and synthetic content (V2)
  4. Real annotated invoices

Real data fine-tuning

The final stage introduces:

  • real invoice documents
  • annotated entity labels
  • natural linguistic variability
  • real formatting inconsistencies and visual noise

This enables the model to:

  • fully exploit visual and spatial features
  • adapt to real-world distributions
  • achieve strong generalization

Role in the pipeline

This model corresponds to:

V3 – Full pipeline (synthetic + hybrid + real data fine-tuning)

It represents:

  • the final multimodal model
  • the best-performing configuration across all architectures
  • a production-ready solution for document understanding

Intended uses

  • Real-world invoice information extraction
  • Multimodal document AI systems
  • OCR post-processing with visual and layout awareness
  • Benchmarking advanced document understanding models
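
For the extraction use case, inference could look roughly like the sketch below. It assumes that words and 0-1000 normalized boxes come from an external OCR step, that the processor can be loaded from microsoft/layoutlmv3-base, and that `model_id` is this model's Hub repository id. The function names `predict` and `word_level_labels` are hypothetical helpers, not part of any library.

```python
from typing import Dict, List, Optional


def word_level_labels(word_ids: List[Optional[int]], pred_ids: List[int],
                      id2label: Dict[int, str]) -> List[str]:
    """Keep the prediction of the first sub-token of every word."""
    labels: List[str] = []
    seen = set()
    for word_id, pred in zip(word_ids, pred_ids):
        if word_id is None or word_id in seen:
            continue  # special token or continuation sub-token
        seen.add(word_id)
        labels.append(id2label[pred])
    return labels


def predict(model_id: str, image_path: str, words: List[str],
            boxes: List[List[int]]) -> List[str]:
    """Run the model on one page image and return one label per input word."""
    # Heavy dependencies are imported lazily so the helper above stays standalone.
    import torch
    from PIL import Image
    from transformers import LayoutLMv3ForTokenClassification, LayoutLMv3Processor

    # apply_ocr=False: words and boxes are supplied by the caller (ASSUMPTION).
    processor = LayoutLMv3Processor.from_pretrained(
        "microsoft/layoutlmv3-base", apply_ocr=False
    )
    model = LayoutLMv3ForTokenClassification.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    encoding = processor(image, words, boxes=boxes,
                         truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits
    pred_ids = logits.argmax(-1).squeeze(0).tolist()
    return word_level_labels(encoding.word_ids(0), pred_ids, model.config.id2label)
```

Taking the first sub-token's prediction per word is one common convention; majority voting over sub-tokens would be an equally reasonable choice.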

Limitations

  • Depends on OCR and bounding box quality
  • Computationally more expensive than text-only models
  • May struggle with:
    • extremely degraded scans
    • rare document formats
  • Domain-specific (Czech invoices)
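
Part of the dependence on bounding box quality comes from the input format: LayoutLM-family models expect each word box as integers scaled to a 0-1000 range relative to the page size. The preprocessing used for this model is not described, so the following is only a sketch of that normalization. The function name and the sample page dimensions are assumptions.

```python
from typing import List, Sequence


def normalize_box(box: Sequence[float], page_width: float, page_height: float) -> List[int]:
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 integer range."""
    x0, y0, x1, y1 = box
    scaled = [
        1000 * x0 / page_width,
        1000 * y0 / page_height,
        1000 * x1 / page_width,
        1000 * y1 / page_height,
    ]
    # Clamp so OCR boxes that spill slightly off the page stay in range.
    return [min(1000, max(0, int(value))) for value in scaled]


# Hypothetical OCR box on a 1240 x 1754 pixel page scan.
example = normalize_box((150, 200, 450, 240), page_width=1240, page_height=1754)
print(example)
```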

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 1
  • seed: 42
  • optimizer: fused AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
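
To make the scheduler settings concrete, the sketch below reproduces the shape of a linear schedule with warmup in plain Python. It assumes the warmup value 0.1 is a fraction of the total number of steps, and it takes the total of 230 steps from the training results (10 epochs of 23 steps each). Both the interpretation and the function name are assumptions, not confirmed details of the training run.

```python
def linear_schedule_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 1e-05
TOTAL_STEPS = 230                         # 10 epochs x 23 steps, from the training results
WARMUP_STEPS = round(0.1 * TOTAL_STEPS)   # ASSUMPTION: 0.1 is a fraction of total steps

for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_schedule_lr(step, BASE_LR, TOTAL_STEPS, WARMUP_STEPS))
```

Under this reading, warmup would cover roughly the first epoch, after which the learning rate decays linearly to zero at the final step.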

Training results

Training Loss   Epoch   Step   Validation Loss   Precision   Recall   F1       Accuracy
No log          1.0     23     0.0528            0.8268      0.8883   0.8564   0.9867
No log          2.0     46     0.0445            0.8855      0.8765   0.8810   0.9893
No log          3.0     69     0.0395            0.8933      0.9069   0.9001   0.9906
No log          4.0     92     0.0354            0.9003      0.9323   0.9160   0.9924
No log          5.0     115    0.0398            0.9161      0.9052   0.9106   0.9920
No log          6.0     138    0.0400            0.9034      0.9019   0.9026   0.9916
No log          7.0     161    0.0381            0.9062      0.9323   0.9191   0.9925
No log          8.0     184    0.0404            0.9005      0.9188   0.9095   0.9922
No log          9.0     207    0.0390            0.9056      0.9255   0.9155   0.9924
No log          10.0    230    0.0390            0.9118      0.9272   0.9195   0.9924
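
Validation loss and F1 do not peak at the same epoch, which matters when choosing a checkpoint. The short sketch below, using the rows of the table above, finds the best epoch under each criterion. The reported evaluation metrics match the epoch 10 row. How the published checkpoint was selected is not stated, so the selection logic here is purely illustrative.

```python
# (epoch, validation_loss, precision, recall, f1, accuracy), copied from the table above
RESULTS = [
    (1, 0.0528, 0.8268, 0.8883, 0.8564, 0.9867),
    (2, 0.0445, 0.8855, 0.8765, 0.8810, 0.9893),
    (3, 0.0395, 0.8933, 0.9069, 0.9001, 0.9906),
    (4, 0.0354, 0.9003, 0.9323, 0.9160, 0.9924),
    (5, 0.0398, 0.9161, 0.9052, 0.9106, 0.9920),
    (6, 0.0400, 0.9034, 0.9019, 0.9026, 0.9916),
    (7, 0.0381, 0.9062, 0.9323, 0.9191, 0.9925),
    (8, 0.0404, 0.9005, 0.9188, 0.9095, 0.9922),
    (9, 0.0390, 0.9056, 0.9255, 0.9155, 0.9924),
    (10, 0.0390, 0.9118, 0.9272, 0.9195, 0.9924),
]

best_f1 = max(RESULTS, key=lambda row: row[4])       # highest F1
lowest_loss = min(RESULTS, key=lambda row: row[1])   # lowest validation loss

print(f"best F1: epoch {best_f1[0]} (F1 = {best_f1[4]:.4f})")
print(f"lowest validation loss: epoch {lowest_loss[0]} (loss = {lowest_loss[1]:.4f})")
```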

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2