DonutInvoiceCzech (V1 – Synthetic + Random Layout)

This model is a fine-tuned version of naver-clova-ix/donut-base-finetuned-cord-v2 for structured information extraction from Czech invoices.

It achieves the following results on the evaluation set (the values match the epoch 4 checkpoint, step 600, in the training results table):

  • Loss: 0.9105
  • Mean Accuracy: 0.6762
  • F1: 0.6343

Model description

DonutInvoiceCzech (V1) extends the baseline OCR-free generative model by introducing layout variability into the training data.

The model:

  • processes raw document images
  • generates structured outputs directly
  • does not rely on OCR

It extracts key invoice fields:

  • supplier
  • customer
  • invoice number
  • bank details
  • totals
  • dates
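
The flow described above (image in, structured output out, no OCR step) can be sketched with the standard Donut classes from Transformers. This is a minimal sketch, not the author's inference script: the repository id `TomasFAV/DonutInvoiceCzechV01` and the task prompt `<s_cord-v2>` (inherited from the base model) are assumptions, and the fine-tuned checkpoint may expect a different prompt token.

```python
import re


def clean_generated_sequence(sequence: str, eos_token: str = "</s>", pad_token: str = "<pad>") -> str:
    """Drop EOS/PAD tokens and the leading task-prompt tag from decoded Donut output."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # Remove only the first tag, which is the task prompt (e.g. "<s_cord-v2>").
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def extract_invoice(
    image_path: str,
    model_id: str = "TomasFAV/DonutInvoiceCzechV01",  # assumed repository id
    task_prompt: str = "<s_cord-v2>",  # assumed: prompt of the base model
):
    """Run OCR-free extraction on one invoice image and return a nested dict."""
    # Heavy imports stay local so the helper above works without them installed.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    with torch.no_grad():
        output_ids = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
        )

    decoded = processor.batch_decode(output_ids)[0]
    sequence = clean_generated_sequence(
        decoded, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    # token2json converts the tagged sequence into a Python dict.
    return processor.token2json(sequence)
```

Calling `extract_invoice("invoice.png")` (a hypothetical file name) downloads the checkpoint on first use. The exact keys in the returned dict depend on the annotation schema used for training, which this card does not list.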

Training data

The dataset consists of:

  • synthetically generated invoice images
  • augmented variants with randomized layouts
  • corresponding structured output sequences
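
Donut-style models are trained on target sequences in which each field is wrapped in opening and closing tags. The sketch below shows one way a ground-truth annotation could be serialized into such a sequence. It is a simplified illustration of the general technique, not the author's preprocessing code, and the field names (`invoice_number`, `supplier`, `total`) are hypothetical.

```python
def json_to_token_sequence(obj) -> str:
    """Serialize a nested dict/list into a Donut-style tagged target sequence."""
    if isinstance(obj, dict):
        # Each key becomes a pair of <s_key> ... </s_key> tags around its value.
        return "".join(
            f"<s_{key}>{json_to_token_sequence(value)}</s_{key}>"
            for key, value in obj.items()
        )
    if isinstance(obj, list):
        # List items are joined with a separator token.
        return "<sep/>".join(json_to_token_sequence(item) for item in obj)
    return str(obj)


# Hypothetical annotation for one synthetic invoice.
annotation = {
    "invoice_number": "2024001",
    "supplier": {"name": "Example s.r.o."},
    "total": "1210.00",
}
target_sequence = json_to_token_sequence(annotation)
print(target_sequence)
```

In a real training setup, the tags would also be registered as special tokens in the tokenizer so that each tag maps to a single token.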

Key properties:

  • increased visual variability
  • layout perturbations (positions, spacing, formatting)
  • consistent annotation schema
  • fully synthetic data
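
The card does not describe the generator itself. As a rough illustration of what "layout perturbations" can mean in practice, the sketch below jitters the position and font size of each field on a template before rendering. All names here (`FieldBox`, `perturb_layout`, the shift and jitter ranges) are hypothetical and not taken from the original pipeline.

```python
import random
from dataclasses import dataclass


@dataclass
class FieldBox:
    name: str       # field identifier, e.g. "invoice_number" (hypothetical)
    x: int          # left position in pixels
    y: int          # top position in pixels
    font_size: int  # font size in points


def perturb_layout(boxes, page_w, page_h, max_shift=40, font_jitter=2, seed=None):
    """Return a copy of the layout with randomly shifted positions and font sizes."""
    rng = random.Random(seed)
    perturbed = []
    for box in boxes:
        # Shift each field independently and clamp it to the page area.
        x = min(max(box.x + rng.randint(-max_shift, max_shift), 0), page_w)
        y = min(max(box.y + rng.randint(-max_shift, max_shift), 0), page_h)
        size = max(6, box.font_size + rng.randint(-font_jitter, font_jitter))
        perturbed.append(FieldBox(box.name, x, y, size))
    return perturbed


template = [
    FieldBox("supplier", 50, 80, 11),
    FieldBox("invoice_number", 400, 80, 12),
    FieldBox("total", 400, 700, 14),
]
variant = perturb_layout(template, page_w=595, page_h=842, seed=42)
```

The annotation for each variant stays identical to the template's annotation, which is how the schema can remain consistent while the visual layout changes.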

Role in the pipeline

This model corresponds to:

V1 – Synthetic templates + randomized layouts

It is used to:

  • evaluate the effect of layout variability on OCR-free models
  • compare against:
    • V0 (fixed templates)
    • later hybrid and real-data stages (V2, V3)
  • analyze robustness of generative extraction

Intended uses

  • OCR-free invoice extraction
  • End-to-end document understanding
  • Research in generative document models
  • Comparison with Pix2Struct and encoder-based approaches

Limitations

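  • Performance degradation compared to V0 (fixed templates)
  • High sensitivity to layout perturbations
  • Training instability and overfitting: validation loss rises from 0.7222 at epoch 1 to 1.1872 at epoch 10 while training loss falls from 0.0797 to 0.0040, and F1 peaks at epoch 4 (0.6343) before dropping to 0.4748
  • Output formatting inconsistencies in the generated sequences
  • No exposure to real-world data (training data is fully synthetic)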

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 9e-05
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 10
  • mixed_precision_training: Native AMP
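
To make the `linear` scheduler setting concrete, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps (the card lists none) and 1500 total steps, which is the final step count in the training results table. The function name `linear_lr` is illustrative.

```python
def linear_lr(step: int, base_lr: float = 9e-05, total_steps: int = 1500,
              warmup_steps: int = 0) -> float:
    """Learning rate under linear decay to zero, with optional linear warmup."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 600, 750, 1500):
    print(step, linear_lr(step))
```

Under these assumptions, the learning rate at step 600 (the epoch 4 checkpoint) is still 60% of its initial value.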

Training results

Training Loss   Epoch   Step   Validation Loss   Mean Accuracy   F1
0.0797          1.0     150    0.7222            0.6492          0.5823
0.0463          2.0     300    0.8908            0.6174          0.5368
0.0348          3.0     450    0.8566            0.6378          0.5441
0.0270          4.0     600    0.9105            0.6762          0.6343
0.0110          5.0     750    0.9557            0.6509          0.5779
0.0075          6.0     900    0.9636            0.6478          0.5471
0.0046          7.0     1050   1.1133            0.6161          0.5089
0.0081          8.0     1200   1.1268            0.5945          0.4944
0.0027          9.0     1350   1.1608            0.5705          0.4736
0.0040          10.0    1500   1.1872            0.5610          0.4748
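
The choice of checkpoint matters here, because the criteria disagree. The short sketch below uses the values from the table above to pick the best epoch by F1 and by validation loss. The `EpochResult` type is just an illustrative container.

```python
from typing import NamedTuple


class EpochResult(NamedTuple):
    epoch: int
    step: int
    train_loss: float
    val_loss: float
    mean_accuracy: float
    f1: float


# Values copied from the training results table.
results = [
    EpochResult(1, 150, 0.0797, 0.7222, 0.6492, 0.5823),
    EpochResult(2, 300, 0.0463, 0.8908, 0.6174, 0.5368),
    EpochResult(3, 450, 0.0348, 0.8566, 0.6378, 0.5441),
    EpochResult(4, 600, 0.0270, 0.9105, 0.6762, 0.6343),
    EpochResult(5, 750, 0.0110, 0.9557, 0.6509, 0.5779),
    EpochResult(6, 900, 0.0075, 0.9636, 0.6478, 0.5471),
    EpochResult(7, 1050, 0.0046, 1.1133, 0.6161, 0.5089),
    EpochResult(8, 1200, 0.0081, 1.1268, 0.5945, 0.4944),
    EpochResult(9, 1350, 0.0027, 1.1608, 0.5705, 0.4736),
    EpochResult(10, 1500, 0.0040, 1.1872, 0.5610, 0.4748),
]

best_by_f1 = max(results, key=lambda r: r.f1)
best_by_val_loss = min(results, key=lambda r: r.val_loss)

print("Best F1:", best_by_f1.epoch, best_by_f1.f1)
print("Lowest validation loss:", best_by_val_loss.epoch, best_by_val_loss.val_loss)
```

Epoch 4 has the highest F1 and mean accuracy, while epoch 1 has the lowest validation loss. The headline metrics at the top of this card match the epoch 4 row.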

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2