---
base_model: PaddlePaddle/PaddleOCR-VL
library_name: peft
license: apache-2.0
pipeline_tag: image-text-to-text
language:
- pl
tags:
- ocr
- lora
- transformers
- polish
- document-ai
- vision-language
datasets:
- synthetic-polish-ocr
---

# RysOCR - Polish OCR LoRA for PaddleOCR-VL

A LoRA adapter for PaddleOCR-VL, fine-tuned specifically for **Polish text recognition**, with emphasis on correct handling of Polish diacritics (ą, ć, ę, ł, ń, ó, ś, ź, ż).

## Motivation

Polish is underrepresented in OCR training data. Most vision-language OCR models struggle with Polish diacritics, often making substitutions such as:

- `ą` → `a`
- `ę` → `e`
- `ł` → `l` or `t`
- `ó` → `o`

This model addresses that gap by fine-tuning on synthetic Polish document images covering addresses, invoices, receipts, names, and common phrases.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | [PaddlePaddle/PaddleOCR-VL](https://huggingface.co/PaddlePaddle/PaddleOCR-VL) |
| Method | LoRA (Low-Rank Adaptation) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Training Framework | PEFT 0.18.0 + Transformers |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel
from PIL import Image

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "PaddlePaddle/PaddleOCR-VL",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "anon13370/RysOCR")
processor = AutoProcessor.from_pretrained(
    "anon13370/RysOCR",
    trust_remote_code=True
)

# Run inference
image = Image.open("your_document.png")
prompt = "OCR: "

inputs = processor(images=image, text=prompt, return_tensors="pt")
# Move every input tensor to the device the model was placed on
inputs = {k: v.to(model.device) for k, v in inputs.items()}

outputs = model.generate(**inputs, max_new_tokens=256)
text = processor.decode(outputs[0], skip_special_tokens=True)
print(text)
```

## Training Details

- **Training Data**: 10,000 synthetic Polish document images
- **Categories**: Addresses, invoice lines, receipt lines, dates, names, prices, phrases
- **Hardware**: Trained with LoRA to enable fine-tuning on consumer hardware (4-6 GB VRAM)
- **Epochs**: 1 epoch over the full dataset
- **Optimizer**: AdamW with a linear learning rate schedule

## Baseline Performance (Pre-Fine-Tuning)

Baseline PaddleOCR-VL performance on the Polish test set:

| Metric | Value |
|--------|-------|
| Character Error Rate (CER) | 5.58% |
| Word Error Rate (WER) | 13.37% |
| Exact Match | 74.00% |
| Diacritic Accuracy | 74.14% |

Summary of baseline versus fine-tuned results:

| Metric | Baseline | Fine-tuned |
|--------|----------|------------|
| CER | 5.58% | 1.60% |
| WER | 13.37% | 7.21% |
| Exact | 74% | 76% |

Key diacritic confusions in the baseline:

- `ł` frequently confused with `l` or `t`
- `ę` sometimes rendered as `e`
- `ś` confused with `š`

## Limitations

- Optimized for printed Polish text; results on handwritten text may vary
- Best results on clean document scans; heavily degraded images may still produce errors
- Inference requires loading both the base model and the LoRA weights

## License

Apache 2.0 (same as base model)

## Citation

If you use this model, please cite:

```bibtex
@misc{rysocr2024,
  title={RysOCR: Polish OCR LoRA for PaddleOCR-VL},
  author={Kacper Wikieł},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/anon13370/RysOCR}
}
```
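
## Appendix: Computing CER and WER

This card reports CER and WER but does not say how they were computed. The sketch below shows the common definition (Levenshtein edit distance divided by reference length) so that readers can score their own outputs in a comparable way. The function names `edit_distance`, `cer`, and `wer` are illustrative, and the NFC normalization step is an assumption added here so that precomposed and decomposed diacritics compare equal; neither is taken from the original evaluation script.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _nfc(text):
    # Assumption: normalize to NFC so "ó" as one code point and
    # "o" + combining acute accent are treated as the same character.
    return unicodedata.normalize("NFC", text)


def cer(ref, hyp):
    """Character error rate for one sample: edits / reference characters."""
    ref, hyp = _nfc(ref), _nfc(hyp)
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    """Word error rate for one sample: edits / reference words."""
    ref_words, hyp_words = _nfc(ref).split(), _nfc(hyp).split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


# Hypothetical example: the model drops the stroke on "ł".
reference = "ul. Długa 5, Łódź"
prediction = "ul. Dluga 5, Łódź"
print(f"CER: {cer(reference, prediction):.4f}")
print(f"WER: {wer(reference, prediction):.4f}")
```

A single wrong diacritic costs one character edit but invalidates the whole word, which is one reason WER sits well above CER in the tables above. Whether the reported figures are averaged per sample or pooled over the whole test set is not stated in this card, so results from this sketch may differ slightly from the published numbers.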