	---
	base_model: PaddlePaddle/PaddleOCR-VL
	library_name: peft
	license: apache-2.0
	pipeline_tag: image-text-to-text
	language:
	- pl
	tags:
	- ocr
	- lora
	- transformers
	- polish
	- document-ai
	- vision-language
	datasets:
	- synthetic-polish-ocr
	---

	# RysOCR - Polish OCR LoRA for PaddleOCR-VL

	A LoRA adapter for PaddleOCR-VL, fine-tuned specifically for Polish text recognition, with emphasis on correct handling of Polish diacritics (ą, ć, ę, ł, ń, ó, ś, ź, ż).

	## Motivation

	Polish is underrepresented in OCR training data. Most vision-language OCR models struggle with Polish diacritics, often substituting:
	- `ą` → `a`
	- `ę` → `e`
	- `ł` → `l` or `t`
	- `ó` → `o`
	- etc.

	This model addresses that gap by fine-tuning on synthetic Polish document images covering addresses, invoices, receipts, names, and common phrases.

	## Model Details

	| Property | Value |
	|----------|-------|
	| Base Model | [PaddlePaddle/PaddleOCR-VL](https://huggingface.co/PaddlePaddle/PaddleOCR-VL) |
	| Method | LoRA (Low-Rank Adaptation) |
	| LoRA Rank | 16 |
	| LoRA Alpha | 32 |
	| Target Modules | q_proj, k_proj, v_proj, o_proj |
	| Training Framework | PEFT 0.18.0 + Transformers |
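
To give a sense of what rank 16 and alpha 32 mean in practice, the sketch below computes the standard LoRA scaling factor (alpha / r) and the number of trainable parameters the adapter would add to one attention layer. The hidden size of 1024 is a placeholder assumption, not a value taken from PaddleOCR-VL, and the helper names are illustrative. It also assumes all four projections are square, which may not hold if the base model uses grouped-query attention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Values from the table above; field names mirror peft.LoraConfig."""
    r: int = 16
    lora_alpha: int = 32
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")


def lora_scaling(cfg):
    # Standard LoRA scales the low-rank update B @ A by alpha / r.
    return cfg.lora_alpha / cfg.r


def lora_params_per_layer(cfg, d_in, d_out):
    # Each adapted linear layer gains A (r x d_in) and B (d_out x r).
    per_module = cfg.r * (d_in + d_out)
    return per_module * len(cfg.target_modules)


cfg = LoraSettings()
HIDDEN = 1024  # assumption for illustration only, not the real hidden size

print("scaling:", lora_scaling(cfg))
print("trainable params per layer:", lora_params_per_layer(cfg, HIDDEN, HIDDEN))
```

With these settings the low-rank update is applied at twice its raw magnitude, and the parameter count grows linearly with the rank.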

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoProcessor
	from peft import PeftModel
	from PIL import Image

	# Load base model
	base_model = AutoModelForCausalLM.from_pretrained(
	    "PaddlePaddle/PaddleOCR-VL",
	    trust_remote_code=True,
	    torch_dtype="auto",
	    device_map="auto",
	)

	# Load LoRA adapter
	model = PeftModel.from_pretrained(base_model, "anon13370/RysOCR")

	processor = AutoProcessor.from_pretrained(
	    "anon13370/RysOCR",
	    trust_remote_code=True,
	)

	# Run inference
	image = Image.open("your_document.png")
	prompt = "OCR: "

	inputs = processor(images=image, text=prompt, return_tensors="pt")
	inputs = {k: v.to(model.device) for k, v in inputs.items()}

	outputs = model.generate(**inputs, max_new_tokens=256)
	text = processor.decode(outputs[0], skip_special_tokens=True)
	print(text)
	```

	## Training Details

	- Training Data: 10,000 synthetic Polish document images
	- Categories: Addresses, invoice lines, receipt lines, dates, names, prices, phrases
	- Hardware: Trained with LoRA to enable fine-tuning on consumer hardware (4-6 GB VRAM)
	- Epochs: 1 epoch over full dataset
	- Optimizer: AdamW with linear learning rate schedule
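
The generator used to build the dataset is not described in this card. As a rough illustration only, text labels for categories like these could be produced from templates before being rendered to images. Every word list, template, and function name below is hypothetical, and image rendering is left out.

```python
import random

# Hypothetical word lists, chosen to be rich in Polish diacritics.
FIRST_NAMES = ["Łukasz", "Małgorzata", "Żaneta", "Stanisław", "Grażyna"]
SURNAMES = ["Wiśniewski", "Dąbrowska", "Wójcik", "Kamiński", "Zielińska"]
STREETS = ["Świętokrzyska", "Żurawia", "Królewska", "Mickiewicza"]
CITIES = ["Łódź", "Kraków", "Gdańsk", "Wrocław", "Poznań"]
ITEMS = ["Chleb żytni", "Masło", "Jabłka", "Ser żółty", "Śmietana"]
CATEGORIES = ["name", "address", "price", "date", "receipt"]


def make_sample(rng):
    """Return (category, text) for one synthetic line of Polish text."""
    category = rng.choice(CATEGORIES)
    if category == "name":
        text = f"{rng.choice(FIRST_NAMES)} {rng.choice(SURNAMES)}"
    elif category == "address":
        postcode = f"{rng.randint(0, 99):02d}-{rng.randint(0, 999):03d}"
        text = (f"ul. {rng.choice(STREETS)} {rng.randint(1, 120)}, "
                f"{postcode} {rng.choice(CITIES)}")
    elif category == "price":
        text = f"{rng.randint(0, 999)},{rng.randint(0, 99):02d} zł"
    elif category == "date":
        text = (f"{rng.randint(1, 28):02d}.{rng.randint(1, 12):02d}."
                f"{rng.randint(2000, 2025)}")
    else:  # receipt line: item, quantity, price
        text = (f"{rng.choice(ITEMS)} x{rng.randint(1, 5)} "
                f"{rng.randint(1, 99)},{rng.randint(0, 99):02d} zł")
    return category, text


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [make_sample(rng) for _ in range(5)]
for category, text in samples:
    print(category, "|", text)
```

Each label would then be drawn onto an image with varied fonts and backgrounds, giving exact ground truth for every training example.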

	## Baseline Performance (Pre-Fine-Tuning)

	Baseline PaddleOCR-VL performance on the Polish test set:

	| Metric | Value |
	|--------|-------|
	| Character Error Rate (CER) | 5.58% |
	| Word Error Rate (WER) | 13.37% |
	| Exact Match | 74.00% |
	| Diacritic Accuracy | 74.14% |

	Comparison of the baseline and the fine-tuned model:

	| Metric | Baseline | Fine-tuned |
	|--------|----------|------------|
	| CER | 5.58% | 1.60% |
	| WER | 13.37% | 7.21% |
	| Exact Match | 74% | 76% |
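
For readers who want to reproduce metrics of this kind, here is a minimal sketch of CER, WER, and exact match built on Levenshtein edit distance. It assumes corpus-level aggregation (total edits divided by total reference length); the card does not say whether the reported numbers were aggregated this way or averaged per sample, and the two example pairs are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def evaluate(pairs):
    """Compute CER, WER and exact match over (reference, hypothesis) pairs."""
    char_err = char_total = word_err = word_total = exact = count = 0
    for ref, hyp in pairs:
        char_err += edit_distance(ref, hyp)
        char_total += len(ref)
        ref_words, hyp_words = ref.split(), hyp.split()
        word_err += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        exact += int(ref == hyp)
        count += 1
    return {
        "cer": char_err / char_total,
        "wer": word_err / word_total,
        "exact_match": exact / count,
    }


# Made-up example: one dropped diacritic, one perfect transcription.
pairs = [("Łódź", "Lódź"), ("zażółć gęślą jaźń", "zażółć gęślą jaźń")]
metrics = evaluate(pairs)
print(metrics)
```

A single wrong diacritic counts as one character error but invalidates the whole word, so WER and exact match can move far less smoothly than CER.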

	Key diacritic confusions in baseline:
	- `ł` frequently confused with `l` or `t`
	- `ę` sometimes rendered as `e`
	- `ś` confused with `š`
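
Confusions like these can be tallied by aligning each reference with the model output and checking what every Polish diacritic turned into. The sketch below uses `difflib.SequenceMatcher` for the alignment. The definition of diacritic accuracy used here (share of reference diacritics that land in a matching block) is an assumption, since the card does not define the metric, and the example pairs are made up.

```python
from collections import Counter
from difflib import SequenceMatcher

POLISH_DIACRITICS = set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")


def diacritic_report(pairs):
    """Return (accuracy, confusions) over (reference, hypothesis) pairs."""
    correct = total = 0
    confusions = Counter()
    for ref, hyp in pairs:
        matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            ref_span, hyp_span = ref[i1:i2], hyp[j1:j2]
            for k, ch in enumerate(ref_span):
                if ch not in POLISH_DIACRITICS:
                    continue
                total += 1
                if tag == "equal":
                    correct += 1
                elif tag == "replace" and len(ref_span) == len(hyp_span):
                    # Same-length substitution: pair characters by position.
                    confusions[(ch, hyp_span[k])] += 1
                else:
                    # Deleted, or replaced by a span of different length.
                    confusions[(ch, None)] += 1
    accuracy = correct / total if total else 1.0
    return accuracy, confusions


# Made-up outputs showing the confusion types listed above.
pairs = [("łódź", "lódź"), ("gęś", "geš")]
accuracy, confusions = diacritic_report(pairs)
print(f"diacritic accuracy: {accuracy:.2f}")
for (expected, got), n in confusions.most_common():
    print(f"{expected} -> {got}: {n}")
```

Sorting the counter by frequency gives a quick view of which characters would benefit most from additional training examples.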

	## Limitations

	- Optimized for printed Polish text; handwritten recognition may vary
	- Best results on clean document scans; heavily degraded images may still have errors
	- Inference requires loading both base model and LoRA weights

	## License

	Apache 2.0 (same as base model)

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{rysocr2024,
	  title={RysOCR: Polish OCR LoRA for PaddleOCR-VL},
	  author={Kacper Wikieł},
	  year={2024},
	  publisher={Hugging Face},
	  url={https://huggingface.co/anon13370/RysOCR}
	}
	```