---
base_model: PaddlePaddle/PaddleOCR-VL
library_name: peft
license: apache-2.0
pipeline_tag: image-text-to-text
language:
- pl
tags:
- ocr
- lora
- transformers
- polish
- document-ai
- vision-language
datasets:
- synthetic-polish-ocr
---

# RysOCR - Polish OCR LoRA for PaddleOCR-VL

A LoRA adapter for PaddleOCR-VL, fine-tuned specifically for **Polish text recognition**, with emphasis on correct handling of Polish diacritics (ą, ć, ę, ł, ń, ó, ś, ź, ż).

## Motivation

Polish is underrepresented in OCR training data. Most vision-language OCR models struggle with Polish diacritics, often substituting:
- `ą` → `a`
- `ę` → `e`
- `ł` → `l` or `t`
- `ó` → `o`
- etc.

This model addresses that gap by fine-tuning on synthetic Polish document images covering addresses, invoices, receipts, names, and common phrases.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | [PaddlePaddle/PaddleOCR-VL](https://huggingface.co/PaddlePaddle/PaddleOCR-VL) |
| Method | LoRA (Low-Rank Adaptation) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Training Framework | PEFT 0.18.0 + Transformers |
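
To make the rank and alpha values above concrete, here is a rough sketch of the arithmetic behind them. It assumes PEFT's default scaling of `lora_alpha / r` and a hypothetical square projection of width 1024; the card does not state the base model's hidden size, and the key/value projections may well be narrower.

```python
def lora_scaling(alpha: int, rank: int) -> float:
    """Factor applied to the low-rank update B @ A (default LoRA scaling)."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one adapted matrix: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


RANK, ALPHA = 16, 32   # values from the Model Details table
HIDDEN = 1024          # hypothetical width, NOT taken from the model config

scaling = lora_scaling(ALPHA, RANK)
adapter_params = lora_params(HIDDEN, HIDDEN, RANK)
full_params = HIDDEN * HIDDEN

print(f"scaling factor: {scaling}")
print(f"adapter params per projection: {adapter_params}")
print(f"fraction of the full matrix: {adapter_params / full_params:.4f}")
```

Under these assumptions each adapted projection trains only a small fraction of the weights it modifies, which is what keeps the memory footprint low.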

## Usage

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel
from PIL import Image

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "PaddlePaddle/PaddleOCR-VL",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "anon13370/RysOCR")

processor = AutoProcessor.from_pretrained(
    "anon13370/RysOCR",
    trust_remote_code=True
)

# Run inference
image = Image.open("your_document.png")
prompt = "OCR: "

inputs = processor(images=image, text=prompt, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

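outputs = model.generate(**inputs, max_new_tokens=256)

# generate() returns the prompt tokens followed by the new ones; decode only the new part
prompt_len = inputs["input_ids"].shape[1]
text = processor.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(text)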
```

## Training Details

- **Training Data**: 10,000 synthetic Polish document images
- **Categories**: Addresses, invoice lines, receipt lines, dates, names, prices, phrases
- **Hardware**: Trained with LoRA to enable fine-tuning on consumer hardware (4-6GB VRAM)
- **Epochs**: 1 epoch over full dataset
- **Optimizer**: AdamW with linear learning rate schedule
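
For readers unfamiliar with the linear schedule mentioned above, the following is a minimal sketch of how such a schedule is commonly defined. The base learning rate, batch size, and warmup setting are hypothetical placeholders; the card does not report the values actually used.

```python
def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup (optional) followed by linear decay to zero."""
    if warmup_steps and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / max(total_steps - warmup_steps, 1)


# Hypothetical setup: 10,000 images, one epoch, batch size 8 -> 1,250 steps.
TOTAL_STEPS = 10_000 // 8
BASE_LR = 2e-4  # assumed value, not from the card

for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_lr(step, TOTAL_STEPS, BASE_LR))
```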

## Baseline Performance (Pre-Fine-Tuning)

Baseline PaddleOCR-VL performance on the Polish test set:

| Metric | Value |
|--------|-------|
| Character Error Rate (CER) | 5.58% |
| Word Error Rate (WER) | 13.37% |
| Exact Match | 74.00% |
| Diacritic Accuracy | 74.14% |

Comparison of the baseline with the fine-tuned adapter:

| Metric | Baseline | Fine-tuned |
|--------|----------|------------|
| Character Error Rate (CER) | 5.58% | 1.60% |
| Word Error Rate (WER) | 13.37% | 7.21% |
| Exact Match | 74% | 76% |
Key diacritic confusions in baseline:
- `ł` frequently confused with `l` or `t`
- `ę` sometimes rendered as `e`
- `ś` confused with `š`
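
Confusions like the ones listed above can be tallied by aligning each reference string with the model output and counting substitutions that hit a diacritic. This is an illustrative sketch using the standard library's `difflib`, not the evaluation script behind the card; the function name and the choice to count only equal-length substitutions are assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher

POLISH_DIACRITICS = set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")


def diacritic_confusions(reference: str, hypothesis: str) -> Counter:
    """Count (expected, got) pairs where a Polish diacritic was substituted."""
    confusions = Counter()
    matcher = SequenceMatcher(None, reference, hypothesis, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only equal-length substitutions give an unambiguous character pairing.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for ref_ch, hyp_ch in zip(reference[i1:i2], hypothesis[j1:j2]):
                if ref_ch in POLISH_DIACRITICS and ref_ch != hyp_ch:
                    confusions[(ref_ch, hyp_ch)] += 1
    return confusions


totals = Counter()
for ref, hyp in [("łódź", "lodz"), ("gęś", "geš")]:
    totals += diacritic_confusions(ref, hyp)

for (expected, got), count in totals.most_common():
    print(f"{expected} -> {got}: {count}")
```

Summing the counters over a test set gives a confusion table that can be compared before and after fine-tuning.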

## Limitations

- Optimized for printed Polish text; handwritten recognition may vary
- Best results on clean document scans; heavily degraded images may still have errors
- Inference requires loading both base model and LoRA weights

## License

Apache 2.0 (same as base model)

## Citation

If you use this model, please cite:

```bibtex
@misc{rysocr2024,
  title={RysOCR: Polish OCR LoRA for PaddleOCR-VL},
  author={Kacper Wikieł},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/anon13370/RysOCR}
}
```