---
base_model: PaddlePaddle/PaddleOCR-VL
library_name: peft
license: apache-2.0
pipeline_tag: image-text-to-text
language:
- pl
tags:
- ocr
- lora
- transformers
- polish
- document-ai
- vision-language
datasets:
- synthetic-polish-ocr
---

# RysOCR - Polish OCR LoRA for PaddleOCR-VL

A LoRA adapter for PaddleOCR-VL, fine-tuned specifically for **Polish text recognition**, with an emphasis on correct handling of Polish diacritics (ą, ć, ę, ł, ń, ó, ś, ź, ż).

## Motivation

Polish is underrepresented in OCR training data. Most vision-language OCR models struggle with Polish diacritics and often replace them with visually similar characters, for example:
- `ą` → `a`
- `ę` → `e`
- `ł` → `l` or `t`
- `ó` → `o`

This model addresses that gap by fine-tuning on synthetic Polish document images covering addresses, invoices, receipts, names, and common phrases.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | [PaddlePaddle/PaddleOCR-VL](https://huggingface.co/PaddlePaddle/PaddleOCR-VL) |
| Method | LoRA (Low-Rank Adaptation) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Training Framework | PEFT 0.18.0 + Transformers |
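
As a rough sketch, the hyperparameters in the table map onto the arguments of PEFT's `LoraConfig` as shown below. Dropout, bias handling, and task type are not stated in this card, so they are left out; the commented-out lines assume PEFT is installed.

```python
# Hyperparameters taken from the Model Details table.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

# Standard LoRA scales the low-rank update by lora_alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(f"LoRA scaling factor: {scaling}")  # 2.0

# With PEFT installed, the same values can be passed to LoraConfig:
# from peft import LoraConfig
# config = LoraConfig(**lora_kwargs)
```

Only the attention projections are adapted, so the rest of the base model's weights stay frozen.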

## Usage

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel
from PIL import Image

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "PaddlePaddle/PaddleOCR-VL",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "anon13370/RysOCR")

processor = AutoProcessor.from_pretrained(
    "anon13370/RysOCR",
    trust_remote_code=True
)

# Run inference
image = Image.open("your_document.png")
prompt = "OCR: "

inputs = processor(images=image, text=prompt, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

outputs = model.generate(**inputs, max_new_tokens=256)
text = processor.decode(outputs[0], skip_special_tokens=True)
print(text)
```

## Training Details

- **Training Data**: 10,000 synthetic Polish document images
- **Categories**: Addresses, invoice lines, receipt lines, dates, names, prices, phrases
- **Hardware**: LoRA keeps memory use low enough to fine-tune on consumer hardware (4-6 GB VRAM)
- **Epochs**: 1 epoch over the full dataset
- **Optimizer**: AdamW with linear learning rate schedule
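
To illustrate what a linear learning rate schedule does over a single epoch, here is a minimal sketch. The card does not state the peak learning rate, warmup length, or batch size, so `base_lr`, `warmup_steps`, and `total_steps` below are hypothetical values chosen only for illustration.

```python
def linear_lr(step, total_steps, base_lr, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(warmup_steps, 1)
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / max(total_steps - warmup_steps, 1)


# Hypothetical setup: 10,000 samples, batch size 1, one epoch, no warmup.
total_steps = 10_000
base_lr = 1e-4  # assumed value, not taken from the card

for step in (0, 2_500, 5_000, 10_000):
    print(step, linear_lr(step, total_steps, base_lr))
```

The rate starts at its peak and falls in a straight line to zero at the final step, which is the usual shape when no warmup is configured.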

## Performance

Baseline PaddleOCR-VL performance (before fine-tuning) on the Polish test set:

| Metric | Value |
|--------|-------|
| Character Error Rate (CER) | 5.58% |
| Word Error Rate (WER) | 13.37% |
| Exact Match | 74.00% |
| Diacritic Accuracy | 74.14% |

Fine-tuned model compared with the baseline:

| Metric | Baseline | Fine-tuned |
|--------|----------|------------|
| CER | 5.58% | 1.60% |
| WER | 13.37% | 7.21% |
| Exact Match | 74% | 76% |
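
For a sense of scale, the relative error reduction implied by the table above can be worked out directly. This is plain arithmetic on the reported figures, not an additional measurement.

```python
def relative_reduction(baseline, fine_tuned):
    """Percentage by which the error rate dropped relative to the baseline."""
    return (baseline - fine_tuned) / baseline * 100


# Error rates in percent, copied from the comparison table.
cer_drop = relative_reduction(5.58, 1.60)
wer_drop = relative_reduction(13.37, 7.21)

print(f"CER reduced by {cer_drop:.1f}%")  # 71.3%
print(f"WER reduced by {wer_drop:.1f}%")  # 46.1%
```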

Key diacritic confusions in the baseline:
- `ł` frequently confused with `l` or `t`
- `ę` sometimes rendered as `e`
- `ś` confused with `š`
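
The card does not say which tool produced these numbers. As a sketch of the conventional definitions, CER and WER are the Levenshtein distance between prediction and reference divided by the reference length, counted in characters and in whitespace-separated words respectively. The helper names and the NFC normalization step below are assumptions for illustration; Diacritic Accuracy is not defined in the card, so it is not sketched here.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def _nfc(text):
    # Normalize so that each diacritic counts as a single character.
    return unicodedata.normalize("NFC", text)


def cer(ref, hyp):
    ref, hyp = _nfc(ref), _nfc(hyp)
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    ref_words, hyp_words = _nfc(ref).split(), _nfc(hyp).split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


def exact_match(ref, hyp):
    return _nfc(ref) == _nfc(hyp)


reference = "ul. Łąkowa 5, Łódź"
prediction = "ul. Lakowa 5, Łódź"  # two dropped diacritics in one word

print(f"CER: {cer(reference, prediction):.3f}")
print(f"WER: {wer(reference, prediction):.2f}")  # 0.25
print(f"Exact match: {exact_match(reference, prediction)}")  # False
```

Two lost diacritics cost two character errors but make the whole word wrong, which is one reason WER reacts more strongly than CER to diacritic mistakes.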

## Limitations

- Optimized for printed Polish text; accuracy on handwritten text may vary
- Best results on clean document scans; heavily degraded images may still have errors
- Inference requires loading both base model and LoRA weights

## License

Apache 2.0 (same as base model)

## Citation

If you use this model, please cite:

```bibtex
@misc{rysocr2024,
  title={RysOCR: Polish OCR LoRA for PaddleOCR-VL},
  author={Kacper Wikieł},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/anon13370/RysOCR}
}
```