---
license: apache-2.0
language:
- he
- bn
- gu
- kn
- ml
tags:
- ocr
- text-recognition
- paddleocr
- mnn
pipeline_tag: image-to-text
---

# PP-OCRv6 fine-tuned recognizers for Hebrew + Indic

This repository contains two fine-tunes of the PP-OCRv6 "small" text recognizer: one for Hebrew and one for Bengali, Gujarati, Kannada and Malayalam. Both models also recognize Latin script. The Hebrew model does not recognize niqqud (vowel points).

Both models were trained exclusively on synthetic data. Evaluation was limited to 3 images, on which they did better than Tesseract.

- Input strip height is **48** pixels. The output is already softmax-normalized, so the per-character confidence is the maximum probability.
- Glyphs are emitted in visual (left-to-right) order, so Hebrew output needs reversal logic to recover logical order.

## Training code

The training code is in `scripts/rec_model/` of [translator-rs](https://github.com/DavidVentura/translator-rs).

## License

These models are fine-tunes of PP-OCRv6 (Apache-2.0). The synthetic training data was rendered from Leipzig corpora text using fonts under mixed licenses (Culmus, Google Fonts OFL, SIL).
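The two output notes (softmax output, visual glyph order) can be illustrated with a minimal Python post-processing sketch. It rests on assumptions not stated in this card: that the recognizer has a CTC head like earlier PaddleOCR recognizers, with the blank at class 0, and that `labels` is a hypothetical class-index-to-character list (check the dictionary actually used by the training code). The Hebrew reordering is a simplified heuristic, not the full Unicode Bidi Algorithm (UAX #9), and it is not necessarily how translator-rs does it.

```python
import unicodedata

BLANK = 0  # assumption: CTC blank is class 0, as in PaddleOCR-style CTC decoding


def ctc_greedy_decode(probs, labels):
    """Greedy CTC decode of one text strip.

    probs:  T rows (timesteps), each a sequence of C class probabilities.
            The model output is already softmax, so no normalization is applied.
    labels: hypothetical class-index -> character list; labels[BLANK] is unused.
    Returns (chars, confs): characters in emission (visual) order and, for each,
    the max probability of the first frame that emitted it.
    """
    chars, confs = [], []
    prev = None
    for row in probs:
        k = max(range(len(row)), key=lambda i: row[i])  # argmax over classes
        if k != BLANK and k != prev:  # drop blanks and collapse repeats
            chars.append(labels[k])
            confs.append(float(row[k]))
        prev = k
    return chars, confs


def _is_rtl(ch):
    return unicodedata.bidirectional(ch) in ("R", "AL")


def _is_ltr(ch):
    return unicodedata.bidirectional(ch) in ("L", "EN")


def visual_to_logical(chars):
    """Index order that turns visual (left-to-right) chars into logical order.

    Simplified heuristic, not UAX #9: a line with no RTL characters is left
    as-is; otherwise the line is reversed, then each run of LTR text (Latin
    letters, digits and the neutrals between them) is flipped back so it
    reads left to right again. Bracket mirroring is not handled.
    """
    n = len(chars)
    if not any(_is_rtl(c) for c in chars):
        return list(range(n))
    order = list(range(n - 1, -1, -1))  # reverse the whole line first
    i = 0
    while i < n:
        if not _is_ltr(chars[order[i]]):
            i += 1
            continue
        last = i  # last Latin letter/digit position in this LTR run
        k = i
        while k < n and not _is_rtl(chars[order[k]]):
            if _is_ltr(chars[order[k]]):
                last = k
            k += 1
        order[i:last + 1] = order[i:last + 1][::-1]  # restore LTR run order
        i = last + 1
    return order


def decode_line(probs, labels):
    """Decode one strip to (text in logical order, per-char confidences)."""
    chars, confs = ctc_greedy_decode(probs, labels)
    order = visual_to_logical(chars)
    return "".join(chars[i] for i in order), [confs[i] for i in order]
```

For the Indic model the reordering step is a no-op, because Bengali, Gujarati, Kannada and Malayalam are left-to-right scripts and `visual_to_logical` returns the identity order when a line has no RTL characters. Confidences are permuted together with the characters so each one stays attached to its glyph.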