---
base_model: PaddlePaddle/PaddleOCR-VL
library_name: peft
license: apache-2.0
pipeline_tag: image-text-to-text
language:
- pl
tags:
- ocr
- lora
- transformers
- polish
- document-ai
- vision-language
datasets:
- synthetic-polish-ocr
---
# RysOCR - Polish OCR LoRA for PaddleOCR-VL
A LoRA adapter fine-tuned on PaddleOCR-VL specifically for **Polish text recognition**, with emphasis on correct handling of Polish diacritics (ą, ć, ę, ł, ń, ó, ś, ź, ż).
## Motivation
Polish is underrepresented in OCR training data. Most vision-language OCR models struggle with Polish diacritics, often substituting:
- `ą` → `a`
- `ę` → `e`
- `ł` → `l` or `t`
- `ó` → `o`
- etc.
This model addresses that gap by fine-tuning on synthetic Polish document images covering addresses, invoices, receipts, names, and common phrases.
## Model Details
| Property | Value |
|----------|-------|
| Base Model | [PaddlePaddle/PaddleOCR-VL](https://huggingface.co/PaddlePaddle/PaddleOCR-VL) |
| Method | LoRA (Low-Rank Adaptation) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Training Framework | PEFT 0.18.0 + Transformers |
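
To make the rank and alpha values above concrete, here is a minimal sketch of the arithmetic behind them. It assumes the standard LoRA formulation, where the low-rank update is scaled by `alpha / rank` and each adapted linear layer gains two small matrices. The hidden size of 1024 and the square projection shapes are hypothetical placeholders for illustration, not properties of PaddleOCR-VL taken from this card.

```python
# Values from the Model Details table
LORA_RANK = 16
LORA_ALPHA = 32
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]


def lora_scaling(alpha: int, rank: int) -> float:
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one (d_out x d_in) linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# ASSUMPTION: hidden size and square projections are illustrative only.
HYPOTHETICAL_HIDDEN = 1024

params_per_block = sum(
    lora_params(HYPOTHETICAL_HIDDEN, HYPOTHETICAL_HIDDEN, LORA_RANK)
    for _ in TARGET_MODULES
)

print("scaling:", lora_scaling(LORA_ALPHA, LORA_RANK))
print("trainable params per attention block:", params_per_block)
```

Under these assumptions the adapter adds `rank * (d_in + d_out)` parameters per targeted projection, which is why adapting only the four attention projections keeps the trainable parameter count small.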
## Usage
```python
from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel
from PIL import Image

# Load the base model (the repository ships custom modeling code)
base_model = AutoModelForCausalLM.from_pretrained(
    "PaddlePaddle/PaddleOCR-VL",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# Attach the LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(base_model, "anon13370/RysOCR")
processor = AutoProcessor.from_pretrained(
    "anon13370/RysOCR",
    trust_remote_code=True,
)

# Run inference
image = Image.open("your_document.png")
prompt = "OCR: "
inputs = processor(images=image, text=prompt, return_tensors="pt")

# Move every input tensor to the device the model was placed on
inputs = {k: v.to(model.device) for k, v in inputs.items()}

outputs = model.generate(**inputs, max_new_tokens=256)
text = processor.decode(outputs[0], skip_special_tokens=True)
print(text)
```
## Training Details
- **Training Data**: 10,000 synthetic Polish document images
- **Categories**: Addresses, invoice lines, receipt lines, dates, names, prices, phrases
- **Hardware**: LoRA was used so that fine-tuning fits on consumer hardware (4-6 GB VRAM)
- **Epochs**: 1 epoch over the full dataset
- **Optimizer**: AdamW with linear learning rate schedule
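
The linear learning rate schedule mentioned above can be sketched as a plain function. This is an illustration under stated assumptions, not the training script used for this adapter: the batch size, base learning rate, and the optional warmup are hypothetical values, since the card does not state them. Only the dataset size (10,000 images) and the single epoch come from the list above.

```python
import math


def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup (optional) followed by linear decay to zero."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


DATASET_SIZE = 10_000            # from Training Details
HYPOTHETICAL_BATCH_SIZE = 8      # ASSUMPTION: not stated in this card
HYPOTHETICAL_BASE_LR = 1e-4      # ASSUMPTION: not stated in this card

# One epoch over the full dataset
total_steps = math.ceil(DATASET_SIZE / HYPOTHETICAL_BATCH_SIZE)

for step in (0, total_steps // 2, total_steps):
    print(step, linear_lr(step, total_steps, HYPOTHETICAL_BASE_LR))
```

With no warmup, the rate starts at the base value, reaches half of it midway through the epoch, and hits zero at the final step.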
## Baseline Performance (Pre-Fine-Tuning)
Baseline PaddleOCR-VL performance on the Polish test set:
| Metric | Value |
|--------|-------|
| Character Error Rate (CER) | 5.58% |
| Word Error Rate (WER) | 13.37% |
| Exact Match | 74.00% |
| Diacritic Accuracy | 74.14% |
Results after fine-tuning, compared with the baseline:
| Metric | Baseline | Fine-tuned |
|--------|----------|------------|
| CER | 5.58% | 1.60% |
| WER | 13.37% | 7.21% |
| Exact Match | 74% | 76% |
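
For readers who want to reproduce metrics of this kind, the sketch below shows one common way to compute CER, WER, and exact match using Levenshtein edit distance. It is an assumption that the numbers in the tables were computed this way; in particular, the card does not say whether scores are averaged per sample or aggregated over the whole test set. The sample strings are made up for illustration.

```python
import unicodedata


def levenshtein(ref, hyp) -> int:
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _nfc(text: str) -> str:
    # Normalize so that a precomposed "ą" and "a" + combining ogonek compare equal
    return unicodedata.normalize("NFC", text)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits divided by reference length."""
    ref, hyp = _nfc(reference), _nfc(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return levenshtein(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits divided by reference word count."""
    ref_words, hyp_words = _nfc(reference).split(), _nfc(hypothesis).split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return levenshtein(ref_words, hyp_words) / len(ref_words)


def exact_match(reference: str, hypothesis: str) -> bool:
    return _nfc(reference) == _nfc(hypothesis)


# Hypothetical sample: three diacritics lost by the recognizer
reference = "Łódź, ul. Piękna 5"
hypothesis = "Lódz, ul. Piekna 5"
print(f"CER={cer(reference, hypothesis):.4f}",
      f"WER={wer(reference, hypothesis):.2f}",
      f"exact={exact_match(reference, hypothesis)}")
```

Note how a few dropped diacritics cost little in CER (3 of 18 characters) but much more in WER (2 of 4 words), which is consistent with WER being well above CER in the tables.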
Key diacritic confusions in baseline:
- `ł` frequently confused with `l` or `t`
- `ę` sometimes rendered as `e`
- `ś` confused with `š`
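
Confusions like the ones listed above can be tallied by aligning the reference and the recognized text. The sketch below is one possible way to do this with Python's `difflib`; the definition of "diacritic accuracy" used here (share of reference diacritics that land in a matching span) is an assumption, since this card does not define the metric. The sample strings are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
import unicodedata

POLISH_DIACRITICS = set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")


def diacritic_report(reference: str, hypothesis: str):
    """Return (accuracy, confusions) for Polish diacritics in the reference.

    accuracy is None when the reference contains no diacritics.
    confusions maps (reference_char, recognized_char) to a count.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)

    total = sum(ch in POLISH_DIACRITICS for ch in ref)
    correct = 0
    confusions = Counter()

    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Diacritics inside a matching span were recognized correctly
            correct += sum(ch in POLISH_DIACRITICS for ch in ref[i1:i2])
        elif tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same-length replacement: pair characters position by position
            for r, h in zip(ref[i1:i2], hyp[j1:j2]):
                if r in POLISH_DIACRITICS:
                    confusions[(r, h)] += 1

    accuracy = correct / total if total else None
    return accuracy, confusions


accuracy, confusions = diacritic_report("Łódź, ul. Piękna 5", "Lódz, ul. Piekna 5")
print("diacritic accuracy:", accuracy)
for (expected, got), count in confusions.most_common():
    print(f"{expected} -> {got}: {count}")
```

Summing the `confusions` counters over a whole test set gives a ranked list of the most frequent substitutions, which is the kind of breakdown the bullet list above reports.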
## Limitations
- Optimized for printed Polish text; recognition quality on handwriting may vary
- Best results on clean document scans; heavily degraded images may still have errors
- Inference requires loading both base model and LoRA weights
## License
Apache 2.0 (same as base model)
## Citation
If you use this model, please cite:
```bibtex
@misc{rysocr2024,
title={RysOCR: Polish OCR LoRA for PaddleOCR-VL},
author={Kacper Wikieł},
year={2024},
publisher={Hugging Face},
url={https://huggingface.co/anon13370/RysOCR}
}
```