---
base_model: PaddlePaddle/PaddleOCR-VL
library_name: peft
license: apache-2.0
pipeline_tag: image-text-to-text
language:
- pl
tags:
- ocr
- lora
- transformers
- polish
- document-ai
- vision-language
datasets:
- synthetic-polish-ocr
---

# RysOCR - Polish OCR LoRA for PaddleOCR-VL

A LoRA adapter for PaddleOCR-VL, fine-tuned specifically for **Polish text recognition**, with emphasis on correct handling of Polish diacritics (ą, ć, ę, ł, ń, ó, ś, ź, ż).

## Motivation

Polish is underrepresented in OCR training data. Many vision-language OCR models struggle with Polish diacritics, often substituting the plain base letter:

- `ą` → `a`
- `ę` → `e`
- `ł` → `l` or `t`
- `ó` → `o`
- etc.

This adapter addresses that gap by fine-tuning on synthetic Polish document images covering addresses, invoices, receipts, names, and common phrases.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | [PaddlePaddle/PaddleOCR-VL](https://huggingface.co/PaddlePaddle/PaddleOCR-VL) |
| Method | LoRA (Low-Rank Adaptation) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Training Framework | PEFT 0.18.0 + Transformers |
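
As a rough illustration of what these settings mean: LoRA freezes each targeted weight W (shape d_out × d_in) and trains two small matrices, A (r × d_in) and B (d_out × r), so that the effective weight is W + (alpha / r)·B·A. With rank 16 and alpha 32 the update is scaled by 2. The sketch below counts trainable parameters per layer for the four target modules. `HIDDEN_SIZE` is a hypothetical placeholder, not PaddleOCR-VL's real dimension, and all four projections are assumed square (models using grouped-query attention have smaller `k_proj`/`v_proj` outputs).

```python
# Back-of-envelope LoRA arithmetic for the settings in the table above.
# HIDDEN_SIZE is a HYPOTHETICAL placeholder; projections are assumed square.

RANK = 16
ALPHA = 32
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]

HIDDEN_SIZE = 1024  # hypothetical, for illustration only


def lora_scaling(alpha: int, rank: int) -> float:
    """Factor applied to the low-rank update B @ A (PEFT default: alpha / r)."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * d_in + d_out * rank


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight W (d_out x d_in)."""
    return d_in * d_out


# Sum over the four targeted projections in one layer
per_layer_lora = sum(lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK) for _ in TARGET_MODULES)
per_layer_full = sum(full_params(HIDDEN_SIZE, HIDDEN_SIZE) for _ in TARGET_MODULES)

print(f"scaling = {lora_scaling(ALPHA, RANK)}")
print(f"trainable LoRA params per layer: {per_layer_lora:,}")
print(f"frozen params in targeted projections per layer: {per_layer_full:,}")
print(f"trainable fraction: {per_layer_lora / per_layer_full:.3%}")
```

For square d × d projections the trainable fraction is 2r/d, so with r = 16 and the placeholder d = 1024 the adapter trains about 3.1% of the parameters in the targeted projections.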

## Usage

```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoProcessor

BASE_MODEL = "PaddlePaddle/PaddleOCR-VL"
ADAPTER = "anon13370/RysOCR"

# Load the base model; trust_remote_code lets its custom modeling code run
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# Attach the LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

# LoRA does not change preprocessing; if the adapter repo has no processor
# files, load the processor from BASE_MODEL instead
processor = AutoProcessor.from_pretrained(ADAPTER, trust_remote_code=True)

# Run inference
image = Image.open("your_document.png").convert("RGB")
prompt = "OCR: "

inputs = processor(images=image, text=prompt, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=256)

# generate() returns prompt + completion; decode only the new tokens
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
text = processor.decode(new_tokens, skip_special_tokens=True)
print(text)
```

## Training Details

- **Training Data**: 10,000 synthetic Polish document images
- **Categories**: Addresses, invoice lines, receipt lines, dates, names, prices, phrases
- **Hardware**: LoRA keeps memory requirements low enough to fine-tune on consumer hardware (4-6 GB VRAM)
- **Epochs**: 1 (a single pass over the full dataset)
- **Optimizer**: AdamW with a linear learning-rate schedule
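
The card states only that AdamW was used with a linear learning-rate schedule over one epoch. As a sketch, the function below computes the per-step learning rate for a linear warmup followed by linear decay to zero, the same shape as `get_linear_schedule_with_warmup` in Transformers. The batch size, peak learning rate, and warmup length are hypothetical values chosen for illustration; the actual training hyperparameters are not published here.

```python
import math

NUM_IMAGES = 10_000  # from the card
EPOCHS = 1           # from the card
BATCH_SIZE = 16      # hypothetical
PEAK_LR = 2e-4       # hypothetical
WARMUP_STEPS = 0     # hypothetical; the card does not mention warmup


def total_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps for the run (a final partial batch counts as a step)."""
    return math.ceil(num_examples / batch_size) * epochs


def linear_lr(step: int, peak_lr: float, warmup: int, total: int) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 at `total` steps."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


TOTAL = total_steps(NUM_IMAGES, BATCH_SIZE, EPOCHS)  # 625 with these values
for s in (0, TOTAL // 4, TOTAL // 2, TOTAL - 1, TOTAL):
    print(s, f"{linear_lr(s, PEAK_LR, WARMUP_STEPS, TOTAL):.2e}")
```

With no warmup the rate starts at the peak and falls to zero on the last step; a non-zero `WARMUP_STEPS` ramps it up first, which is often used to stabilize the early AdamW updates.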

## Baseline Performance (Pre-Fine-Tuning)

Baseline PaddleOCR-VL performance on the Polish test set:

| Metric | Value |
|--------|-------|
| Character Error Rate (CER) | 5.58% |
| Word Error Rate (WER) | 13.37% |
| Exact Match | 74.00% |
| Diacritic Accuracy | 74.14% |

Key diacritic confusions in the baseline:

- `ł` frequently confused with `l` or `t`
- `ę` sometimes rendered as `e`
- `ś` confused with `š`

## Fine-Tuned Performance

Baseline vs. fine-tuned adapter on the same Polish test set (lower is better for CER and WER, higher is better for Exact Match):

| Metric | Baseline | Fine-tuned |
|--------|----------|------------|
| CER | 5.58% | 1.60% |
| WER | 13.37% | 7.21% |
| Exact Match | 74% | 76% |
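
The card does not spell out how these metrics are computed. The sketch below uses common definitions: CER and WER as Levenshtein edit distance divided by the reference length (in characters or whitespace-separated words), and exact match as string equality. All comparisons apply Unicode NFC normalization first, so a precomposed `ę` and `e` plus a combining ogonek count as the same character. `diacritic_accuracy` is a hypothetical, simplified definition (per-letter count overlap, without alignment) and is not necessarily the one behind the 74.14% figure above.

```python
import unicodedata

POLISH_DIACRITICS = set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")


def nfc(text: str) -> str:
    """Normalize so precomposed and combining-mark forms compare equal."""
    return unicodedata.normalize("NFC", text)


def levenshtein(ref, hyp) -> int:
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    ref, hyp = nfc(ref), nfc(hyp)
    return levenshtein(ref, hyp) / max(1, len(ref))


def wer(ref: str, hyp: str) -> float:
    ref_words, hyp_words = nfc(ref).split(), nfc(hyp).split()
    return levenshtein(ref_words, hyp_words) / max(1, len(ref_words))


def exact_match(ref: str, hyp: str) -> bool:
    return nfc(ref).strip() == nfc(hyp).strip()


def diacritic_accuracy(ref: str, hyp: str) -> float:
    """HYPOTHETICAL metric: share of ref diacritic letters also found in hyp (by count)."""
    ref, hyp = nfc(ref), nfc(hyp)
    total = sum(1 for c in ref if c in POLISH_DIACRITICS)
    if total == 0:
        return 1.0
    matched = sum(min(ref.count(c), hyp.count(c)) for c in POLISH_DIACRITICS)
    return matched / total


ref = "Zażółć gęślą jaźń"
hyp = "Zazolc gęślą jaźń"  # illustrative error: diacritics dropped in the first word
print(f"CER={cer(ref, hyp):.2%}  WER={wer(ref, hyp):.2%}  "
      f"exact={exact_match(ref, hyp)}  diacritics={diacritic_accuracy(ref, hyp):.2%}")
```

In this example four dropped diacritics give a CER of 4/17 but a WER of 1/3, which shows why a single mangled letter per word inflates WER much more than CER.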

## Limitations

- Optimized for printed Polish text; accuracy on handwriting may vary
- Performs best on clean document scans; heavily degraded images may still produce errors
- Inference requires loading both the base model and the LoRA weights
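
Regarding the last point: because the LoRA update is linear, it can be folded into the base weights ahead of time (PEFT provides `merge_and_unload()` for this), after which inference needs only a single set of weights. The pure-Python sketch below uses toy 3 × 3 weights and a rank-1 update, chosen purely for illustration, to check that the merged weight W + (alpha / r)·B·A produces the same output as running the base layer and the adapter path separately.

```python
# Toy check that merging a LoRA update into W leaves outputs unchanged.
# Dimensions and values are illustrative only.


def matmul(a, b):
    """(n x k) @ (k x m) for nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    """(n x k) @ (k,) for nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def adapter_forward(W, A, B, scaling, x):
    """Base output plus the scaled low-rank path, computed separately."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scaling * d for b, d in zip(base, delta)]


def merge(W, A, B, scaling):
    """Fold the update into one dense weight: W + scaling * (B @ A)."""
    BA = matmul(B, A)
    return [[w + scaling * u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(W, BA)]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]  # frozen base weight (3 x 3)
A = [[0.5, -1.0, 0.25]]                                    # toy rank 1: (1 x 3)
B = [[2.0], [0.0], [-1.0]]                                 # (3 x 1)
scaling = 32 / 16  # same factor as this card's alpha / r; toy rank kept at 1 for brevity

x = [1.0, 2.0, 3.0]
separate = adapter_forward(W, A, B, scaling, x)
merged = matvec(merge(W, A, B, scaling), x)
print(separate, merged)
```

The trade-off is that a merged model can no longer switch the adapter off, so keeping base and adapter separate remains useful when comparing against the baseline.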

## License

Apache 2.0 (same as the base model)

## Citation

If you use this model, please cite:

```bibtex
@misc{rysocr2024,
  title={RysOCR: Polish OCR LoRA for PaddleOCR-VL},
  author={Kacper Wikieł},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/anon13370/RysOCR}
}
```
|