---
library_name: transformers
license: cc-by-nc-sa-4.0
base_model: microsoft/layoutlmv3-base
tags:
- generated_from_trainer
- invoice-processing
- information-extraction
- czech-language
- document-ai
- layout-aware-model
- multimodal-model
- synthetic-data
- layout-augmentation
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: LayoutLMv3InvoiceCzech-V1
  results: []
---

# LayoutLMv3InvoiceCzech (V1 – Synthetic + Random Layout)

This model is a fine-tuned version of [microsoft/layoutlmv3-base](https://huggingface.co/microsoft/layoutlmv3-base) for structured information extraction from Czech invoices.

It achieves the following results on the evaluation set (these values match the epoch 3 row, step 225, of the training results table):
- Loss: 0.1750  
- Precision: 0.6800  
- Recall: 0.6904  
- F1: 0.6851  
- Accuracy: 0.9714  

---

## Model description

LayoutLMv3InvoiceCzech (V1) extends the V0 baseline, which was trained on fixed layouts, by introducing layout variability into the training data.

The model leverages:
- textual features  
- spatial layout (bounding boxes)  
- visual features (image embeddings)  
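
LayoutLMv3 expects each bounding box as integer coordinates on a 0-1000 grid, independent of the rendered page size. The snippet below is a minimal sketch of that normalization; the helper name `normalize_box` and the page dimensions are illustrative assumptions, not taken from this model's preprocessing code.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 integer grid."""
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]
    # Clamp so that boxes touching the page border stay inside the valid range
    return [min(1000, max(0, value)) for value in scaled]


# Hypothetical word box on a page rendered at 1240 x 1754 pixels
example_box = normalize_box((124, 175, 620, 210), 1240, 1754)
print(example_box)
```

When the `LayoutLMv3Processor` runs its own OCR, it produces normalized boxes itself; a helper like this is only needed when words and boxes come from an external source.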

It performs token-level classification to extract structured invoice fields:
- supplier  
- customer  
- invoice number  
- bank details  
- totals  
- dates  
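
Token-level predictions still have to be merged into field values. The sketch below assumes a BIO tagging scheme and uses made-up label names such as `SUPPLIER` and `INVOICE_NUMBER`; the actual label set of this model is defined in its configuration and may differ.

```python
def group_fields(words, labels):
    """Merge word-level BIO tags into (field, text) pairs."""
    fields = []
    current_field, current_words = None, []

    def flush():
        # Store the entity collected so far, if any
        if current_field is not None:
            fields.append((current_field, " ".join(current_words)))

    for word, label in zip(words, labels):
        prefix, _, name = label.partition("-")
        if prefix == "B" or (prefix == "I" and name != current_field):
            # A B- tag (or a stray I- tag of another field) starts a new entity
            flush()
            current_field, current_words = name, [word]
        elif prefix == "I":
            current_words.append(word)
        else:
            # "O" closes any open entity
            flush()
            current_field, current_words = None, []
    flush()
    return fields


# Hypothetical predictions for one invoice line
words = ["Faktura", "č.", "2024001", "Dodavatel:", "ACME", "s.r.o."]
labels = ["O", "O", "B-INVOICE_NUMBER", "O", "B-SUPPLIER", "I-SUPPLIER"]
extracted = group_fields(words, labels)
print(extracted)
```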

Compared to V0, this version is trained on synthetically generated invoices with **randomized layouts**, improving robustness to structural variations.

---

## Training data

The dataset consists of:

- synthetically generated invoices based on templates  
- augmented variants with randomized layouts  
- corresponding bounding boxes  
- rendered document images  

Key properties:
- variable positioning of fields  
- layout perturbations (shifts, spacing, ordering)  
- preserved label consistency  
- fully synthetic data  
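
As an illustration of a label-preserving layout perturbation, the sketch below shifts one field block by a random offset while leaving text and labels untouched. It is not the generator used to build this dataset: the function name `shift_block`, the word dictionary layout, and the default offset range are assumptions, and only shifts (not spacing or ordering changes) are covered.

```python
import random


def shift_block(words, max_shift=50, page_size=1000, rng=random):
    """Translate all boxes of one block by a common random offset."""
    min_x = min(w["box"][0] for w in words)
    min_y = min(w["box"][1] for w in words)
    max_x = max(w["box"][2] for w in words)
    max_y = max(w["box"][3] for w in words)
    # Restrict the offset so that every shifted box stays on the page
    dx = rng.randint(max(-max_shift, -min_x), min(max_shift, page_size - max_x))
    dy = rng.randint(max(-max_shift, -min_y), min(max_shift, page_size - max_y))
    shifted = []
    for w in words:
        x0, y0, x1, y1 = w["box"]
        # Copy the word, replacing only its box; text and label are preserved
        shifted.append({**w, "box": [x0 + dx, y0 + dy, x1 + dx, y1 + dy]})
    return shifted


# Hypothetical supplier block with boxes on the 0-1000 grid
block = [
    {"text": "Dodavatel:", "box": [100, 100, 220, 120], "label": "B-SUPPLIER"},
    {"text": "ACME", "box": [230, 100, 300, 120], "label": "I-SUPPLIER"},
]
shifted = shift_block(block, rng=random.Random(0))
print(shifted)
```

Shifting the whole block by one offset keeps the relative geometry of its words intact, which is one simple way to vary field positions without invalidating the annotations.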

This dataset introduces **layout diversity** and tests how multimodal models respond to structural variability.

---

## Role in the pipeline

This model corresponds to:

**V1 – Synthetic templates + randomized layouts**

It is used to:
- evaluate the impact of layout variability on multimodal models  
- compare against:
  - V0 (fixed layouts)  
  - later hybrid and real-data stages (V2, V3)  
- analyze interaction between visual and spatial features  

---

## Intended uses

- Research in multimodal document understanding  
- Benchmarking LayoutLMv3 under layout variability  
- Comparison with BERT and LiLT  
- Czech invoice information extraction  

---

## Limitations

- Still trained only on synthetic data  
- Layout variability is artificial  
- Visual features are derived from clean renderings  
- No real-world noise (OCR errors, scanning artifacts)  

---

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 1
- seed: 42
- optimizer: AdamW, fused PyTorch implementation (`OptimizerNames.ADAMW_TORCH_FUSED`), with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 0.1 (a fractional value, presumably a warmup ratio, i.e. about 10% of the training steps)
- num_epochs: 10
- mixed_precision_training: Native AMP
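
The learning-rate curve implied by these settings can be sketched in plain Python. The total of 750 steps comes from the training results table (10 epochs of 75 steps); the 75 warmup steps are an assumption that reads the warmup value 0.1 as a fraction of the total steps. The function `linear_lr` is illustrative and mirrors the usual linear warmup followed by linear decay.

```python
def linear_lr(step, peak_lr=1e-5, total_steps=750, warmup_steps=75):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to the peak learning rate
        return peak_lr * step / warmup_steps
    # Decay: fall linearly from the peak to 0 at the last step
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


# Learning rate at the end of each epoch (every 75 steps)
for epoch in range(1, 11):
    print(epoch, linear_lr(epoch * 75))
```

Under this assumption the peak learning rate is reached at the end of the first epoch and then decays to zero by step 750.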

---

### Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| No log        | 1.0   | 75   | 0.1545          | 0.6769    | 0.6701 | 0.6735 | 0.9711   |
| No log        | 2.0   | 150  | 0.1658          | 0.6732    | 0.6937 | 0.6833 | 0.9695   |
| No log        | 3.0   | 225  | 0.1750          | 0.6800    | 0.6904 | 0.6851 | 0.9714   |
| No log        | 4.0   | 300  | 0.1946          | 0.6881    | 0.6159 | 0.6500 | 0.9707   |
| No log        | 5.0   | 375  | 0.1896          | 0.6941    | 0.6717 | 0.6827 | 0.9717   |
| No log        | 6.0   | 450  | 0.1979          | 0.6609    | 0.6430 | 0.6518 | 0.9704   |
| 0.0193        | 7.0   | 525  | 0.1991          | 0.6702    | 0.6396 | 0.6545 | 0.9706   |
| 0.0193        | 8.0   | 600  | 0.2014          | 0.6503    | 0.6261 | 0.6379 | 0.9698   |
| 0.0193        | 9.0   | 675  | 0.1955          | 0.6523    | 0.6413 | 0.6468 | 0.9702   |
| 0.0193        | 10.0  | 750  | 0.1956          | 0.6535    | 0.6447 | 0.6491 | 0.9704   |
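
The headline metrics reported at the top of this card coincide with the epoch 3 row. A short check over the table values shows that this row has the highest F1, while the lowest validation loss occurs at epoch 1. How the published checkpoint was actually selected is not stated in this card, so the snippet only documents what the numbers show.

```python
# (epoch, validation_loss, f1) copied from the training results table
results = [
    (1, 0.1545, 0.6735),
    (2, 0.1658, 0.6833),
    (3, 0.1750, 0.6851),
    (4, 0.1946, 0.6500),
    (5, 0.1896, 0.6827),
    (6, 0.1979, 0.6518),
    (7, 0.1991, 0.6545),
    (8, 0.2014, 0.6379),
    (9, 0.1955, 0.6468),
    (10, 0.1956, 0.6491),
]

best_by_f1 = max(results, key=lambda row: row[2])
best_by_loss = min(results, key=lambda row: row[1])

print("highest F1:", best_by_f1)
print("lowest validation loss:", best_by_loss)
```

Validation loss rises after the first epoch while F1 stays roughly flat through epoch 5 and then declines, which is consistent with overfitting to the synthetic training set in later epochs.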

---

## Framework versions

- Transformers 5.0.0  
- PyTorch 2.10.0+cu128  
- Datasets 4.0.0  
- Tokenizers 0.22.2