kacperwikiel/polish-ocr-lora-broken

	---
	base_model: PaddlePaddle/PaddleOCR-VL
	library_name: peft
	pipeline_tag: image-text-to-text
	language:
	- pl
	tags:
	- base_model:adapter:PaddlePaddle/PaddleOCR-VL
	- lora
	- transformers
	- ocr
	- polish
	- experimental
	- broken
	license: apache-2.0
	---

	# Polish OCR LoRA (BROKEN - DO NOT USE IN PRODUCTION)

	WARNING: This model is broken and produces garbage output. It is uploaded for archival/experimental purposes only.

	## Status: FAILED EXPERIMENT

	This is a LoRA adapter for PaddleOCR-VL, fine-tuned for Polish OCR. Training did not converge properly, and the model's outputs are unreliable.

	### Known Issues

	- Model produces hallucinated text
	- Poor accuracy on Polish characters (especially: ą, ę, ó, ś, ź, ż, ń, ć, ł)
	- Inconsistent output quality
	- May output repetitive or nonsensical text
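
One way to quantify the issues listed above is sketched below, using only the Python standard library. It is an illustration and not the evaluation used for this model: the function names (`cer`, `diacritic_recall`, `repeated_ngram_ratio`) are hypothetical, the sample strings are made up, and the diacritic check is a simple bag-of-characters comparison that ignores position.

```python
from collections import Counter

# Polish letters with diacritics, lower and upper case.
POLISH_DIACRITICS = set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (or match at cost 0)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def diacritic_recall(reference: str, hypothesis: str) -> float:
    """Share of reference diacritics that also occur in the hypothesis."""
    ref = Counter(c for c in reference if c in POLISH_DIACRITICS)
    hyp = Counter(c for c in hypothesis if c in POLISH_DIACRITICS)
    total = sum(ref.values())
    if total == 0:
        return 1.0  # nothing to recover
    matched = sum(min(n, hyp[c]) for c, n in ref.items())
    return matched / total


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate another n-gram in the text."""
    words = text.split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    return 1.0 - len(set(grams)) / len(grams)


if __name__ == "__main__":
    reference = "Zażółć gęślą jaźń"   # well-known Polish pangram
    hypothesis = "Zazolc gesla jazn"  # made-up output with diacritics stripped
    print("CER:", cer(reference, hypothesis))
    print("Diacritic recall:", diacritic_recall(reference, hypothesis))
    print("Repetition:", repeated_ngram_ratio("to jest to jest to jest to jest"))
```

A high CER combined with low diacritic recall would point at the Polish-character problem specifically, while a high repeated n-gram ratio flags the looping outputs. Any thresholds for "broken" are left to the reader.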

	## Model Details

	- Base Model: PaddlePaddle/PaddleOCR-VL
	- Adapter Type: LoRA (r=16, alpha=32)
	- Target Modules: q_proj, v_proj, o_proj, k_proj
	- Task: Polish document OCR (attempted)
	- Language: Polish
	- Training Data: Synthetic Polish OCR dataset (10k samples)
	- Framework: PEFT 0.18.0
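
To make the adapter settings above concrete, the sketch below computes the standard LoRA scaling factor (alpha divided by rank) and a rough count of trainable parameters per transformer layer. The `LoraSettings` class and `lora_params_per_layer` helper are hypothetical names for illustration, and `HIDDEN_SIZE = 1024` is a placeholder, not the base model's actual dimension. It also assumes all four projections are square, which does not hold for models that use smaller key/value projections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Adapter hyperparameters as stated in the model card."""
    r: int = 16
    alpha: int = 32
    target_modules: tuple = ("q_proj", "v_proj", "o_proj", "k_proj")

    @property
    def scaling(self) -> float:
        """Standard LoRA scaling applied to the low-rank update: alpha / r."""
        return self.alpha / self.r


def lora_params_per_layer(settings: LoraSettings, d_in: int, d_out: int) -> int:
    """Trainable parameters added to one layer.

    Each target module gets A with shape (r, d_in) and B with shape (d_out, r),
    so r * (d_in + d_out) parameters per module.
    """
    per_module = settings.r * (d_in + d_out)
    return per_module * len(settings.target_modules)


if __name__ == "__main__":
    HIDDEN_SIZE = 1024  # ASSUMPTION: placeholder, not taken from PaddleOCR-VL
    settings = LoraSettings()
    print("scaling:", settings.scaling)
    print("params per layer:", lora_params_per_layer(settings, HIDDEN_SIZE, HIDDEN_SIZE))
```

In PEFT, these values would correspond to the `r`, `lora_alpha`, and `target_modules` arguments of `LoraConfig`.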

	## Why Upload a Broken Model?

	1. Transparency: To document what doesn't work
	2. Reproducibility: Others can learn from this failure
	3. Baseline: Can be used as a negative example for benchmarking

	## Training Configuration

	```
	LoRA rank: 16
	LoRA alpha: 32
	Dropout: 0.05
	Checkpoints: 846 steps
	```

	## Do Not Use For

	- Production OCR systems
	- Any task requiring accurate text extraction
	- Anything where correctness matters

	## License

	Apache 2.0

	## Framework Versions

	- PEFT 0.18.0
	- Transformers (latest at time of training)