gigapdf-ocr-hebrew โ€” Hebrew text-line OCR (CRNN + CTC)

A compact Hebrew text-line recognizer for the gigapdf-lib OCR engine. PaddleOCR / EasyOCR / MMOCR ship no Hebrew model, so this one was trained from scratch to fill that gap. It plugs into the same pipeline as the PaddleOCR PP-OCR recognizers (a shared DBNet detector + per-language SVTR/CRNN recognizers) running on RTen, a pure-Rust ONNX runtime (no C++, no Tesseract).

Architecture

  • CRNN + CTC: conv backbone (W/4 downsample) โ†’ height collapse โ†’ 2-layer BiLSTM (384 hidden) โ†’ CTC.
  • Input: RGB line crop, height 48, normalized (px/255 โˆ’ 0.5)/0.5, tensor [1, 3, 48, W] (dynamic width).
  • Output: [1, T, 109] CTC logits. Charlist convention: class 0 = CTC blank, classes 1..107 = the 107 dict characters (dict.txt, one per line: Hebrew block + digits + punctuation + Latin), class 108 = space.
  • RTL: a CTC model emits glyphs in visual (leftโ†’right) order. The model is trained on visual-order labels (logical text โ†’ python-bidi get_display), so at inference you reverse the decoded token sequence back to logical reading order.

Training data

  • ~10 Hebrew typefaces (Noto Serif/Rashi Hebrew, David Libre, Frank Ruhl Libre, Heebo, Rubik, Assistant, โ€ฆ).
  • Procedurally generated lines: 45% random Hebrew letter/digit sequences (letter-level coverage so the model learns letters, not memorized words โ€” this generalizes to unseen words), plus a broad Hebrew word list, digits and occasional Latin runs. Light scan-like augmentation (small rotation, gaussian noise).
  • No niqqud (printed Hebrew rarely carries it).

Files

File Use
model.onnx ONNX graph (dynamic batch/width), opset 17
model.rten Converted for the RTen runtime (rten-convert)
dict.txt 107-char alphabet, one char per line

Usage (RTen, via gigapdf-ocr-rten)

use gigapdf_ocr_rten::OcrEngine;
// Drop model.rten + dict.txt into <models_dir>/hebrew/, then:
let eng = OcrEngine::load_models_dir("models")?;
for line in eng.recognize_page(&rgb_image)? { /* line.text is logical-order Hebrew */ }

The engine flags Hebrew rtl: true and reverses the visual-order CTC output back to logical automatically.

License

PolyForm Noncommercial 1.0.0. Copyright 2025 Rony Licha / QR Communication. Commercial use requires a separate license.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support