---
library_name: transformers
license: cc-by-nc-sa-4.0
base_model: microsoft/layoutlmv3-base
tags:
- generated_from_trainer
- invoice-processing
- information-extraction
- czech-language
- document-ai
- layout-aware-model
- multimodal-model
- synthetic-data
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: LayoutLMv3InvoiceCzech-V0
  results: []
---

# LayoutLMv3InvoiceCzech (V0 – Synthetic Templates Only)

This model is a fine-tuned version of [microsoft/layoutlmv3-base](https://huggingface.co/microsoft/layoutlmv3-base) for structured information extraction from Czech invoices.

It achieves the following results on the evaluation set:
- Loss: 0.2146  
- Precision: 0.5354  
- Recall: 0.7428  
- F1: 0.6223  
- Accuracy: 0.9583  
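
As a quick consistency check, the reported F1 can be recomputed from the reported precision and recall. This is a minimal sketch that assumes F1 is the usual harmonic mean of the two. The helper name `f1_score` is illustrative and not part of the training code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the evaluation results listed above.
reported_precision = 0.5354
reported_recall = 0.7428

print(round(f1_score(reported_precision, reported_recall), 4))  # 0.6223
```

Recall is noticeably higher than precision, so under this reading the model tends to over-predict fields rather than miss them.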

---

## Model description

LayoutLMv3InvoiceCzech (V0) is a multimodal document understanding model that leverages:

- textual information  
- spatial layout (bounding boxes)  
- visual features (image embeddings)  
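
For the layout input, LayoutLM-family models expect word bounding boxes scaled to a 0-1000 coordinate range, independent of the page size in pixels. The sketch below shows one way to do that scaling. The function name `normalize_box` and the example page size are assumptions for illustration, and the model card does not describe the preprocessing code actually used.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range."""
    x0, y0, x1, y1 = box

    def scale(value, size):
        # Clamp so that slightly out-of-page boxes stay within [0, 1000].
        return max(0, min(1000, int(1000 * value / size)))

    return [
        scale(x0, page_width),
        scale(y0, page_height),
        scale(x1, page_width),
        scale(y1, page_height),
    ]


# Hypothetical word box on an 800 x 1000 pixel page.
print(normalize_box((80, 100, 400, 150), page_width=800, page_height=1000))
# [100, 100, 500, 150]
```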

The model performs token-level classification to extract structured invoice fields:
- supplier  
- customer  
- invoice number  
- bank details  
- totals  
- dates  

This version is trained exclusively on synthetically generated invoice templates.
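
Token-level predictions still have to be merged into field values before they are useful. The following sketch shows one common way to do this, assuming the labels follow a BIO scheme (`B-FIELD`, `I-FIELD`, `O`). The label names and the helper `group_entities` are hypothetical. The actual label set is defined in the model's configuration and is not listed in this card.

```python
def group_entities(words, labels):
    """Merge word-level BIO labels into a list of (field, text) pairs."""
    entities = []
    current_field, current_words = None, []

    def flush():
        # Store the entity collected so far, if any.
        if current_field is not None:
            entities.append((current_field, " ".join(current_words)))

    for word, label in zip(words, labels):
        prefix, _, field = label.partition("-")
        if prefix == "B" or (prefix == "I" and field != current_field):
            # A new entity starts (an orphan "I-" tag is treated as a start).
            flush()
            current_field, current_words = field, [word]
        elif prefix == "I":
            current_words.append(word)
        else:
            # "O" or any unknown tag closes the current entity.
            flush()
            current_field, current_words = None, []
    flush()
    return entities


# Hypothetical words and labels for illustration only.
words = ["Faktura", "č.", "2024001", "Dodavatel:", "ACME", "s.r.o."]
labels = ["O", "O", "B-INVOICE_NUMBER", "O", "B-SUPPLIER", "I-SUPPLIER"]
print(group_entities(words, labels))
# [('INVOICE_NUMBER', '2024001'), ('SUPPLIER', 'ACME s.r.o.')]
```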

---

## Training data

The dataset consists of:

- synthetically generated invoices  
- fixed template layouts  
- corresponding bounding boxes  
- rendered document images  

Key properties:
- consistent structure across samples  
- clean and noise-free data  
- perfect alignment between text, layout, and image  
- no real-world documents  

This represents the **baseline dataset** for multimodal document models.

---

## Role in the pipeline

This model corresponds to:

**V0 – Synthetic template-based dataset only**

It is used to:
- establish a baseline for multimodal models  
- compare against:
  - text-only models (BERT)  
  - layout-aware models without vision (LiLT)  
- evaluate the contribution of visual features in a controlled setting  

---

## Intended uses

- Research in multimodal document understanding  
- Benchmarking LayoutLMv3 on structured documents  
- Comparison with other architectures (BERT, LiLT, etc.)  
- Czech invoice information extraction  

---

## Limitations

- Trained only on synthetic data with fixed layouts  
- Limited generalization to real-world invoices  
- Visual features are learned from clean synthetic renderings  
- No exposure to:
  - OCR errors  
  - scanning artifacts  
  - real-world noise  

---

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 1
- seed: 42
- optimizer: AdamW, fused PyTorch implementation (`OptimizerNames.ADAMW_TORCH_FUSED`), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 0.1
- num_epochs: 10
- mixed_precision_training: Native AMP
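
To make the schedule concrete, the sketch below computes the learning rate at a given step for a linear schedule with warmup. It rests on two assumptions that this card does not confirm: the warmup value of 0.1 is read as a fraction of the total steps, and the total is 1500 steps, taken from the last row of the training results. The function `linear_lr` is illustrative and is not the Trainer's own implementation.

```python
def linear_lr(step, base_lr=1e-5, total_steps=1500, warmup=0.1):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    # Assumption: a warmup value below 1 is a fraction of total_steps.
    if warmup < 1:
        warmup_steps = int(round(warmup * total_steps))
    else:
        warmup_steps = int(warmup)

    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)

    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 75, 150, 825, 1500):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at 1e-05 after 150 steps, which is the end of the first epoch, and decays to zero at step 1500.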

---

### Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| No log        | 1.0   | 150  | 0.2817          | 0.1429    | 0.0829 | 0.1049 | 0.9470   |
| No log        | 2.0   | 300  | 0.2222          | 0.3480    | 0.4822 | 0.4043 | 0.9480   |
| No log        | 3.0   | 450  | 0.2170          | 0.3852    | 0.5736 | 0.4609 | 0.9480   |
| 0.5287        | 4.0   | 600  | 0.1919          | 0.4625    | 0.6261 | 0.5320 | 0.9558   |
| 0.5287        | 5.0   | 750  | 0.1701          | 0.5254    | 0.7174 | 0.6066 | 0.9627   |
| 0.5287        | 6.0   | 900  | 0.2060          | 0.5173    | 0.7327 | 0.6064 | 0.9565   |
| 0.0360        | 7.0   | 1050 | 0.2161          | 0.5370    | 0.7124 | 0.6124 | 0.9594   |
| 0.0360        | 8.0   | 1200 | 0.2146          | 0.5359    | 0.7445 | 0.6232 | 0.9584   |
| 0.0360        | 9.0   | 1350 | 0.2141          | 0.5268    | 0.7327 | 0.6129 | 0.9578   |
| 0.0147        | 10.0  | 1500 | 0.2131          | 0.5393    | 0.7310 | 0.6207 | 0.9597   |
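
The table can also be read programmatically, for example to see which epoch a given selection criterion would pick. The sketch below copies the epoch, validation loss, and F1 columns from the table. The variable names are illustrative, and the card does not state which criterion, if any, was used to select the published checkpoint.

```python
# (epoch, validation_loss, f1) copied from the training results table.
results = [
    (1, 0.2817, 0.1049),
    (2, 0.2222, 0.4043),
    (3, 0.2170, 0.4609),
    (4, 0.1919, 0.5320),
    (5, 0.1701, 0.6066),
    (6, 0.2060, 0.6064),
    (7, 0.2161, 0.6124),
    (8, 0.2146, 0.6232),
    (9, 0.2141, 0.6129),
    (10, 0.2131, 0.6207),
]

best_by_f1 = max(results, key=lambda row: row[2])
best_by_loss = min(results, key=lambda row: row[1])

print("best F1:", best_by_f1)                 # (8, 0.2146, 0.6232)
print("lowest validation loss:", best_by_loss)  # (5, 0.1701, 0.6066)
```

The two criteria disagree: F1 peaks at epoch 8, while validation loss is lowest at epoch 5 and rises slightly afterwards.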

---

## Framework versions

- Transformers 5.0.0  
- PyTorch 2.10.0+cu128  
- Datasets 4.0.0  
- Tokenizers 0.22.2