---
base_model: PaddlePaddle/PaddleOCR-VL
library_name: peft
license: apache-2.0
pipeline_tag: image-text-to-text
language:
- pl
tags:
- ocr
- lora
- transformers
- polish
- document-ai
- vision-language
datasets:
- synthetic-polish-ocr
---
# RysOCR - Polish OCR LoRA for PaddleOCR-VL
A LoRA adapter for PaddleOCR-VL, fine-tuned specifically for **Polish text recognition**, with emphasis on correct handling of Polish diacritics (ą, ć, ę, ł, ń, ó, ś, ź, ż).
## Motivation
Polish is underrepresented in OCR training data. Most vision-language OCR models struggle with Polish diacritics, often dropping or swapping them, for example:
- `ą` → `a`
- `ę` → `e`
- `ł` → `l` or `t`
- `ó` → `o`
This model addresses that gap by fine-tuning on synthetic Polish document images covering addresses, invoices, receipts, names, and common phrases.
## Model Details
| Property | Value |
|----------|-------|
| Base Model | [PaddlePaddle/PaddleOCR-VL](https://huggingface.co/PaddlePaddle/PaddleOCR-VL) |
| Method | LoRA (Low-Rank Adaptation) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Training Framework | PEFT 0.18.0 + Transformers |
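The hyperparameters in the table map onto the arguments of PEFT's `LoraConfig`. The sketch below collects them in a plain dictionary and computes the standard LoRA scaling factor (`lora_alpha / r`). The helper name `build_lora_config` is hypothetical, and leaving every setting not listed in the table (dropout, bias, task type) at the PEFT default is an assumption, not something this card states.

```python
# Hyperparameters copied from the Model Details table.
LORA_HPARAMS = {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}


def lora_scaling(hparams):
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return hparams["lora_alpha"] / hparams["r"]


def build_lora_config(hparams=LORA_HPARAMS):
    """Hypothetical helper: build a peft.LoraConfig from the table values.

    Unlisted settings stay at PEFT defaults (an assumption).
    """
    from peft import LoraConfig  # imported lazily so the rest runs without peft

    return LoraConfig(**hparams)


print(lora_scaling(LORA_HPARAMS))  # 2.0
```

With rank 16 and alpha 32, the adapter update is scaled by a factor of 2 before being added to the frozen attention projections.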
## Usage
```python
from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel
from PIL import Image

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "PaddlePaddle/PaddleOCR-VL",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# Load LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "anon13370/RysOCR")
processor = AutoProcessor.from_pretrained(
    "anon13370/RysOCR",
    trust_remote_code=True,
)

# Run inference
image = Image.open("your_document.png")
prompt = "OCR: "
inputs = processor(images=image, text=prompt, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

outputs = model.generate(**inputs, max_new_tokens=256)
text = processor.decode(outputs[0], skip_special_tokens=True)
print(text)
```
## Training Details
- **Training Data**: 10,000 synthetic Polish document images
- **Categories**: Addresses, invoice lines, receipt lines, dates, names, prices, phrases
- **Hardware**: Consumer hardware; LoRA keeps memory use low enough to fine-tune with 4-6 GB of VRAM
- **Epochs**: 1 epoch over full dataset
- **Optimizer**: AdamW with linear learning rate schedule
## Baseline Performance (Pre-Fine-Tuning)
Baseline PaddleOCR-VL performance on the Polish test set:
| Metric | Value |
|--------|-------|
| Character Error Rate (CER) | 5.58% |
| Word Error Rate (WER) | 13.37% |
| Exact Match | 74.00% |
| Diacritic Accuracy | 74.14% |
Baseline versus the fine-tuned adapter:
| Metric | Baseline | Fine-tuned |
|--------|----------|------------|
| CER | 5.58% | 1.60% |
| WER | 13.37% | 7.21% |
| Exact Match | 74% | 76% |
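For readers who want to compute comparable numbers on their own data, the sketch below shows the conventional definitions of CER and WER: Levenshtein edit distance divided by the reference length, over characters and whitespace-separated words respectively. This card does not describe the evaluation script, so the function names and the tokenization are assumptions rather than the exact procedure behind the table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits / reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    """Word error rate: word edits / number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


def exact_match(ref, hyp):
    """True only when the prediction equals the reference exactly."""
    return ref == hyp


reference = "Łódź, ul. Żółta 5"
prediction = "Lódź, ul. Zółta 5"  # two diacritics lost
print(f"CER={cer(reference, prediction):.2%}  WER={wer(reference, prediction):.2%}")
```

A single lost diacritic costs one character edit but invalidates the whole word, which is consistent with WER sitting well above CER in the table.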
Key diacritic confusions in baseline:
- `ł` frequently confused with `l` or `t`
- `ę` sometimes rendered as `e`
- `ś` confused with `š`
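Confusions like these can be tallied by aligning each prediction with its reference and counting the character pairs where a Polish diacritic was replaced. The sketch below uses `difflib.SequenceMatcher` from the standard library for the alignment. The function name `diacritic_confusions` is hypothetical, and this is an illustrative approach, not necessarily how the list above was produced.

```python
import unicodedata
from collections import Counter
from difflib import SequenceMatcher

# Normalize to NFC so each diacritic letter is a single code point.
POLISH_DIACRITICS = set(unicodedata.normalize("NFC", "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ"))


def diacritic_confusions(ref, hyp):
    """Count (reference_char, predicted_char) pairs where a diacritic was replaced."""
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    confusions = Counter()
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only equal-length 'replace' spans give an unambiguous 1:1 pairing.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for r, h in zip(ref[i1:i2], hyp[j1:j2]):
                if r in POLISH_DIACRITICS and r != h:
                    confusions[(r, h)] += 1
    return confusions


counts = diacritic_confusions("Łukasz miał śliwkę", "Lukasz miat šliwke")
for (expected, predicted), n in counts.most_common():
    print(f"{expected} -> {predicted}: {n}")
```

Summing these counters over a test set gives a ranked list of the most frequent substitutions, which helps decide which characters need more synthetic training examples.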
## Limitations
- Optimized for printed Polish text; handwritten recognition may vary
- Best results on clean document scans; heavily degraded images may still have errors
- Inference requires loading both base model and LoRA weights
## License
Apache 2.0 (same as base model)
## Citation
If you use this model, please cite:
```bibtex
@misc{rysocr2024,
title={RysOCR: Polish OCR LoRA for PaddleOCR-VL},
author={Kacper Wikieł},
year={2024},
publisher={Hugging Face},
url={https://huggingface.co/anon13370/RysOCR}
}
```