DonutInvoiceCzech (V2 – Synthetic + Real Layout Injection, skipping V1)

This model is a fine-tuned version of naver-clova-ix/donut-base-finetuned-cord-v2 for structured information extraction from Czech invoices.

It achieves the following results on the evaluation set after the final epoch (epoch 10, step 2300):

  • Loss: 0.5444
  • Mean Accuracy: 0.9187
  • F1: 0.8156

Model description

DonutInvoiceCzech (V2) is an OCR-free generative model for document understanding.

The model:

  • processes full document images
  • directly generates structured outputs
  • does not require OCR preprocessing
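
A minimal inference sketch for the image-to-structure flow described above. It assumes the checkpoint is published as `TomasFAV/DonutInvoiceCzechV012` and that it keeps the base checkpoint's `<s_cord-v2>` task prompt; this card confirms neither, so adjust both as needed. The function names `strip_special_tokens` and `extract_invoice` are illustrative, and the calls follow the general Donut usage pattern in Transformers rather than the author's own script.

```python
import re


def strip_special_tokens(sequence: str, eos_token: str, pad_token: str) -> str:
    """Remove EOS/PAD tokens and the leading task-prompt tag from decoded text."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # Drop only the first tag, which is the task prompt that started decoding.
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def extract_invoice(image_path: str,
                    model_id: str = "TomasFAV/DonutInvoiceCzechV012",  # assumed repo ID
                    task_prompt: str = "<s_cord-v2>"):                 # assumed prompt
    # Heavy imports stay local so the helper above works without them installed.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    # The full page image is the only input; no OCR text is passed in.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    with torch.no_grad():
        outputs = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
            return_dict_in_generate=True,
        )

    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = strip_special_tokens(
        sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    # token2json turns the tagged sequence into a nested Python structure.
    return processor.token2json(sequence)
```

Calling `extract_invoice("invoice.png")` would download the weights on first use and return the parsed fields. The file name is a placeholder.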

It extracts key invoice fields:

  • supplier
  • customer
  • invoice number
  • bank details
  • totals
  • dates

This version introduces real layout injection, significantly improving performance compared to purely synthetic training.


Training data

The dataset consists of:

  1. Synthetic template-based invoices (V0)
  2. Hybrid invoices with real layouts and synthetic content (V2)

⚠️ Important:
The intermediate step with randomized layouts (V1) was intentionally excluded, as it led to worse performance than the baseline (V0).

Real layout injection

In the hybrid dataset:

  • real invoice layouts are used
  • original content is replaced with synthetic data
  • new structured content is rendered into authentic layouts

This preserves:

  • real-world visual structure
  • formatting patterns
  • spatial relationships

while maintaining:

  • annotation control
  • consistent output format
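
The idea can be sketched in a few lines, assuming each real layout is available as a list of field regions (field name plus bounding box). This representation, and the names `Region`, `inject`, and `synthetic_record`, are hypothetical; the card does not describe the actual generation tooling. The sketch only pairs regions with synthetic text and builds the matching label; drawing the text onto the page image is left out.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Region:
    field: str    # which invoice field the real document printed here
    bbox: tuple   # (left, top, right, bottom) in pixels


def synthetic_record(rng):
    """Generate toy synthetic content; a real generator would be far richer."""
    return {
        "invoice_number": f"{rng.randint(2020, 2025)}{rng.randint(0, 9999):04d}",
        "total": f"{rng.randint(100, 99999)},00 Kč",
    }


def inject(layout, record):
    """Pair each real-layout region with synthetic text for that field.

    Returns (render_ops, ground_truth): what to draw where, and the label
    the model should learn to generate for the resulting image.
    """
    render_ops = []
    ground_truth = {}
    for region in layout:
        if region.field not in record:
            continue  # regions without synthetic content stay blank
        text = record[region.field]
        render_ops.append((region.bbox, text))
        ground_truth[region.field] = text
    return render_ops, ground_truth


layout = [
    Region("invoice_number", (400, 40, 580, 60)),
    Region("total", (400, 700, 580, 720)),
    Region("supplier", (40, 40, 300, 120)),
]
record = synthetic_record(random.Random(0))
render_ops, ground_truth = inject(layout, record)
```

Because the label is built from the same record that is rendered, the annotation stays exact even though the page geometry comes from a real document.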

Role in the pipeline

This model corresponds to:

V2 – Synthetic + real layout injection (without V1)

It demonstrates:

  • that not all augmentation strategies are beneficial
  • the importance of architecture-aware data design
  • that realistic layouts are more valuable than randomized ones for Donut

Intended uses

  • OCR-free invoice information extraction
  • End-to-end document understanding
  • Evaluation of hybrid data strategies
  • Research in generative document models

Limitations

  • Still relies on synthetic textual content
  • Sensitive to output formatting and decoding
  • Requires structured post-processing
  • Performance depends on visual similarity to training data
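
As an illustration of the post-processing mentioned above, the sketch below normalizes two common Czech formats: amounts with a decimal comma and dates written as day.month.year. It assumes the model returns such strings verbatim, which this card does not specify. `parse_czech_amount` and `parse_czech_date` are illustrative helper names.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation
import re


def parse_czech_amount(text: str):
    """Parse amounts such as '1 234,50 Kč' into Decimal; return None on failure."""
    # Keep only digits, separators and a minus sign (drops currency and spaces).
    cleaned = re.sub(r"[^\d,.\-]", "", text)
    if "," in cleaned:
        # Czech style: comma is the decimal separator, dots group thousands.
        cleaned = cleaned.replace(".", "").replace(",", ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def parse_czech_date(text: str):
    """Parse dates such as '31.12.2024' or '1. 2. 2024'; return None on failure."""
    compact = re.sub(r"\s+", "", text)
    try:
        return datetime.strptime(compact, "%d.%m.%Y").date()
    except ValueError:
        return None
```

Returning `None` instead of raising lets a caller flag fields for manual review when decoding produced malformed text. A value such as `1.234` with no comma is ambiguous and is read here as a decimal number.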

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 9e-05
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 10
  • mixed_precision_training: Native AMP
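
For reproduction, the list above maps roughly onto keyword arguments of the Transformers `TrainingArguments` class, as sketched below. Two points are assumptions: `fp16=True` stands in for "Native AMP" (it could equally have been bf16), and no warmup is assumed because none is listed. `linear_lr` is an illustrative helper showing what a linear schedule implies for the learning rate over the 2300 steps reported in the training results.

```python
# Keyword arguments mirroring the hyperparameter list, using
# TrainingArguments parameter names.
training_kwargs = {
    "learning_rate": 9e-5,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 1,
    "seed": 42,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
    "fp16": True,  # assumption: "Native AMP" could also mean bf16
}

TOTAL_STEPS = 2300  # 230 steps per epoch x 10 epochs, from the training results


def linear_lr(step, base_lr=9e-5, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


halfway = linear_lr(1150)  # learning rate after epoch 5 under these assumptions
```

With a batch size of 4 and 230 steps per epoch, the training split would hold roughly 920 images, assuming a single device and no gradient accumulation.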

Training results

Training Loss   Epoch   Step   Validation Loss   Mean Accuracy   F1
0.1269          1.0     230    0.4448            0.8752          0.7441
0.0665          2.0     460    0.4200            0.9051          0.7892
0.0255          3.0     690    0.4321            0.8992          0.7726
0.0369          4.0     920    0.5258            0.8571          0.7485
0.0168          5.0     1150   0.5758            0.8438          0.7530
0.0146          6.0     1380   0.5634            0.8843          0.7848
0.0035          7.0     1610   0.5542            0.8915          0.7947
0.0020          8.0     1840   0.5156            0.9160          0.8090
0.0009          9.0     2070   0.5367            0.9084          0.8000
0.0014          10.0    2300   0.5444            0.9187          0.8156
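
The choice of checkpoint depends on the selection metric, as the short sketch below shows using the values copied from the table. `EpochResult` and the variable names are illustrative.

```python
from collections import namedtuple

EpochResult = namedtuple(
    "EpochResult", "train_loss epoch step val_loss mean_accuracy f1"
)

# Values copied from the training results table.
results = [
    EpochResult(0.1269, 1, 230, 0.4448, 0.8752, 0.7441),
    EpochResult(0.0665, 2, 460, 0.4200, 0.9051, 0.7892),
    EpochResult(0.0255, 3, 690, 0.4321, 0.8992, 0.7726),
    EpochResult(0.0369, 4, 920, 0.5258, 0.8571, 0.7485),
    EpochResult(0.0168, 5, 1150, 0.5758, 0.8438, 0.7530),
    EpochResult(0.0146, 6, 1380, 0.5634, 0.8843, 0.7848),
    EpochResult(0.0035, 7, 1610, 0.5542, 0.8915, 0.7947),
    EpochResult(0.0020, 8, 1840, 0.5156, 0.9160, 0.8090),
    EpochResult(0.0009, 9, 2070, 0.5367, 0.9084, 0.8000),
    EpochResult(0.0014, 10, 2300, 0.5444, 0.9187, 0.8156),
]

best_by_f1 = max(results, key=lambda r: r.f1)
best_by_val_loss = min(results, key=lambda r: r.val_loss)

print(f"best F1:       epoch {best_by_f1.epoch} (F1={best_by_f1.f1})")
print(f"best val loss: epoch {best_by_val_loss.epoch} (loss={best_by_val_loss.val_loss})")
```

Validation loss is lowest at epoch 2, while F1 and mean accuracy peak at epoch 10. The headline figures on this card correspond to the final epoch.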

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2