# DonutInvoiceCzech (V1 – Synthetic + Random Layout)
This model is a fine-tuned version of naver-clova-ix/donut-base-finetuned-cord-v2 for structured information extraction from Czech invoices.
It achieves the following results on the evaluation set:
- Loss: 0.9105
- Mean Accuracy: 0.6762
- F1: 0.6343
## Model description
DonutInvoiceCzech (V1) extends the baseline OCR-free generative model by introducing layout variability into the training data.
The model:
- processes raw document images
- generates structured outputs directly
- does not rely on OCR
It extracts key invoice fields:
- supplier
- customer
- invoice number
- bank details
- totals
- dates
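Donut emits the extracted fields as a single generated token sequence with XML-like field tags rather than a JSON string. The tag names below (`supplier`, `invoice_number`, `total`) mirror the field list above, but the exact tag vocabulary of this checkpoint is an assumption; a minimal sketch of converting such a sequence into a dict:

```python
import re

def donut_sequence_to_dict(sequence: str) -> dict:
    """Parse a Donut-style output like '<s_supplier>...</s_supplier>' into a
    flat dict. Assumes non-nested <s_field>...</s_field> tags."""
    fields = {}
    for match in re.finditer(r"<s_([^>]+)>(.*?)</s_\1>", sequence, re.DOTALL):
        fields[match.group(1)] = match.group(2).strip()
    return fields

seq = ("<s_supplier>ACME s.r.o.</s_supplier>"
       "<s_invoice_number>2024-0042</s_invoice_number>"
       "<s_total>1 210,00 CZK</s_total>")
print(donut_sequence_to_dict(seq))
```

In practice, `DonutProcessor.token2json` in the Transformers library performs this conversion for the model's real tag vocabulary, including nested fields.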
## Training data
The dataset consists of:
- synthetically generated invoice images
- augmented variants with randomized layouts
- corresponding structured output sequences
Key properties:
- increased visual variability
- layout perturbations (positions, spacing, formatting)
- consistent annotation schema
- fully synthetic data
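The layout perturbations above can be sketched as random jitter applied to each field's anchor position when rendering a synthetic invoice. The field boxes, page size, and jitter range here are illustrative assumptions, not the actual generator used for this dataset:

```python
import random

# Base layout: field -> (x, y) anchor in pixels on a 1240x1754 page
# (roughly A4 at 150 dpi). These coordinates are illustrative assumptions.
BASE_LAYOUT = {
    "supplier": (80, 120),
    "customer": (700, 120),
    "invoice_number": (80, 320),
    "total": (900, 1500),
}

def perturb_layout(base: dict, max_shift: int = 40, seed=None) -> dict:
    """Randomly shift each field anchor by up to max_shift px in x and y."""
    rng = random.Random(seed)
    return {
        field: (x + rng.randint(-max_shift, max_shift),
                y + rng.randint(-max_shift, max_shift))
        for field, (x, y) in base.items()
    }

layout = perturb_layout(BASE_LAYOUT, seed=42)
```

Because the structured target sequence stays the same while positions vary, each perturbed render is a new training image under a consistent annotation schema.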
## Role in the pipeline
This model corresponds to:
V1 – Synthetic templates + randomized layouts
It is used to:
- evaluate the effect of layout variability on OCR-free models
- compare against:
  - V0 (fixed templates)
  - later hybrid and real-data stages (V2, V3)
- analyze robustness of generative extraction
## Intended uses
- OCR-free invoice extraction
- End-to-end document understanding
- Research in generative document models
- Comparison with Pix2Struct and encoder-based approaches
## Limitations
- Performance degrades relative to V0 (fixed templates)
- High sensitivity to layout perturbations
- Training instability and overfitting (validation loss rises after epoch 4)
- Inconsistent output formatting
- No exposure to real-world data
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 9e-05
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- optimizer: fused AdamW (`adamw_torch_fused`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP
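With `lr_scheduler_type: linear` and no warmup steps listed, the learning rate decays linearly from 9e-05 to zero over the 1500 training steps. A sketch of that schedule (assuming zero warmup, which is an assumption since no warmup value appears above):

```python
def linear_lr(step: int, base_lr: float = 9e-05,
              total_steps: int = 1500, warmup_steps: int = 0) -> float:
    """Linear schedule: ramp up over warmup_steps, then decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Learning rate at the start, midpoint, and end of training.
print(linear_lr(0), linear_lr(750), linear_lr(1500))
```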
### Training results

The headline evaluation metrics correspond to the epoch-4 checkpoint; validation loss, mean accuracy, and F1 all degrade in later epochs, consistent with the overfitting noted under Limitations.
| Training Loss | Epoch | Step | Validation Loss | Mean Accuracy | F1 |
|---|---|---|---|---|---|
| 0.0797 | 1.0 | 150 | 0.7222 | 0.6492 | 0.5823 |
| 0.0463 | 2.0 | 300 | 0.8908 | 0.6174 | 0.5368 |
| 0.0348 | 3.0 | 450 | 0.8566 | 0.6378 | 0.5441 |
| 0.0270 | 4.0 | 600 | 0.9105 | 0.6762 | 0.6343 |
| 0.0110 | 5.0 | 750 | 0.9557 | 0.6509 | 0.5779 |
| 0.0075 | 6.0 | 900 | 0.9636 | 0.6478 | 0.5471 |
| 0.0046 | 7.0 | 1050 | 1.1133 | 0.6161 | 0.5089 |
| 0.0081 | 8.0 | 1200 | 1.1268 | 0.5945 | 0.4944 |
| 0.0027 | 9.0 | 1350 | 1.1608 | 0.5705 | 0.4736 |
| 0.0040 | 10.0 | 1500 | 1.1872 | 0.5610 | 0.4748 |
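Mean Accuracy and F1 in the table are field-level extraction metrics. The card does not spell out their exact definitions, so the variant below is an assumption: each predicted (field, value) pair counts as a true positive only on an exact match against the gold value.

```python
def field_f1(pred: dict, gold: dict):
    """Field-level precision/recall/F1 between predicted and gold field dicts.
    A (field, value) pair is a true positive only if the value matches exactly."""
    tp = sum(1 for k, v in pred.items() if gold.get(k) == v)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {"supplier": "ACME s.r.o.", "invoice_number": "2024-0042", "total": "1210"}
pred = {"supplier": "ACME s.r.o.", "invoice_number": "2024-0041", "total": "1210"}
print(field_f1(pred, gold))  # two of three fields match exactly
```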
## Framework versions
- Transformers 5.0.0
- PyTorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2
## Model tree

- Model: TomasFAV/DonutInvoiceCzechV01
- Base model: naver-clova-ix/donut-base-finetuned-cord-v2