gigapdf-ocr-hebrew: Hebrew text-line OCR (CRNN + CTC)
A compact Hebrew text-line recognizer for the gigapdf-lib
OCR engine. PaddleOCR / EasyOCR / MMOCR ship no Hebrew model, so this one was trained from scratch to
fill that gap. It plugs into the same pipeline as the PaddleOCR PP-OCR recognizers (a shared DBNet
detector + per-language SVTR/CRNN recognizers) running on RTen,
a pure-Rust ONNX runtime (no C++, no Tesseract).
Architecture
- CRNN + CTC: conv backbone (W/4 downsample) → height collapse → 2-layer BiLSTM (384 hidden units) → CTC.
- Input: RGB line crop, height 48, normalized as `(px/255 - 0.5)/0.5`, tensor `[1, 3, 48, W]` (dynamic width).
- Output: `[1, T, 109]` CTC logits. Charlist convention: class `0` = CTC blank, classes `1..107` = the 107 dict characters (`dict.txt`, one per line: Hebrew block + digits + punctuation + Latin), class `108` = space.
- RTL: a CTC model emits glyphs in visual (left-to-right) order. The model is trained on visual-order labels (logical text → `get_display` from `python-bidi`), so at inference you reverse the decoded token sequence back to logical reading order.
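The input preprocessing above can be sketched in plain Rust. This is a minimal sketch, assuming the crop arrives as an interleaved 8-bit RGB buffer and that nearest-neighbour resizing to height 48 is acceptable; `preprocess_line` is a hypothetical name, and the resize method used during training is not stated in this card. It produces the flat CHW buffer behind a `[1, 3, 48, W]` tensor, normalized to [-1, 1].

```rust
/// Target line height expected by the recognizer.
const LINE_HEIGHT: usize = 48;

/// Hypothetical helper: converts an interleaved RGB crop (row-major, 3 bytes
/// per pixel) into a flat CHW f32 buffer for a [1, 3, 48, W] tensor.
/// Pixels are normalized as (px/255 - 0.5)/0.5, i.e. into [-1, 1].
/// Resizing uses nearest-neighbour sampling to stay dependency-free.
/// Returns (data, resized_width).
fn preprocess_line(rgb: &[u8], width: usize, height: usize) -> (Vec<f32>, usize) {
    assert!(width > 0 && height > 0, "crop must be non-empty");
    assert_eq!(rgb.len(), width * height * 3, "buffer must be width*height*3 bytes");
    // Keep the aspect ratio: scale the width by the same factor as the height (rounded).
    let new_w = ((width * LINE_HEIGHT + height / 2) / height).max(1);
    let plane = LINE_HEIGHT * new_w;
    let mut out = vec![0.0f32; 3 * plane];
    for y in 0..LINE_HEIGHT {
        // Nearest source row for this destination row.
        let sy = (y * height / LINE_HEIGHT).min(height - 1);
        for x in 0..new_w {
            // Nearest source column for this destination column.
            let sx = (x * width / new_w).min(width - 1);
            let src = (sy * width + sx) * 3;
            for c in 0..3 {
                let px = rgb[src + c] as f32;
                // Channel-planar layout: all R values, then all G, then all B.
                out[c * plane + y * new_w + x] = (px / 255.0 - 0.5) / 0.5;
            }
        }
    }
    (out, new_w)
}
```

With the backbone's W/4 downsampling, a line resized to W = 96 yields roughly 24 timesteps (the exact T depends on the backbone's padding), and greedy CTC decoding can emit at most T characters, so very narrow crops limit how much text a line can decode to.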
Training data
- ~10 Hebrew typefaces (Noto Serif/Rashi Hebrew, David Libre, Frank Ruhl Libre, Heebo, Rubik, Assistant, …).
- Procedurally generated lines: 45% are random Hebrew letter/digit sequences, which give letter-level coverage so the model learns individual letters rather than memorizing words and generalizes to unseen words; the rest are built from a broad Hebrew word list, digits, and occasional Latin runs. Light scan-like augmentation is applied (small rotation, Gaussian noise).
- No niqqud (vowel points): printed Hebrew rarely carries it.
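To make the 45% mix concrete, here is a hypothetical sampler in the same spirit. It is not the generator used for training: `sample_line`, the `XorShift` PRNG, the 1 to 4 words per line, and the alphabet (the 27 Hebrew letters U+05D0 to U+05EA plus ASCII digits) are all assumptions, and font rendering, Latin runs, and augmentation are left out.

```rust
/// Tiny xorshift64 PRNG so the sketch needs no external crates (seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Float in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
    /// Index in 0..n (slightly biased modulo reduction; fine for a sketch).
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Hypothetical line sampler: with probability 0.45 a random letter/digit
/// sequence, otherwise a few words drawn from `words`.
fn sample_line(rng: &mut XorShift, words: &[&str], max_len: usize) -> String {
    assert!(max_len > 0 && !words.is_empty());
    // 22 base letters + 5 final forms (U+05D0..=U+05EA), then ASCII digits.
    let letters: Vec<char> = ('\u{05D0}'..='\u{05EA}').chain('0'..='9').collect();
    if rng.next_f64() < 0.45 {
        // Random sequence: no word context, so every glyph must be read on its own.
        let len = 1 + rng.below(max_len);
        (0..len).map(|_| letters[rng.below(letters.len())]).collect()
    } else {
        // Word-based line: 1 to 4 words joined by spaces.
        let n = 1 + rng.below(4);
        (0..n)
            .map(|_| words[rng.below(words.len())])
            .collect::<Vec<_>>()
            .join(" ")
    }
}
```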
Files
| File | Use |
|---|---|
| `model.onnx` | ONNX graph (dynamic batch/width), opset 17 |
| `model.rten` | Converted for the RTen runtime (`rten-convert`) |
| `dict.txt` | 107-char alphabet, one char per line |
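The charlist convention from the Architecture section maps onto `dict.txt` as in the sketch below. It assumes `dict.txt` is UTF-8 with one character per line; `build_charlist` is a hypothetical helper, not part of gigapdf-ocr-rten. Blank lines are skipped, and the CTC blank is stored as `None` so it can never be emitted as text.

```rust
/// Hypothetical helper: builds the class table from the contents of dict.txt.
/// Index 0 = CTC blank (None), 1..=N = dict characters in file order,
/// last index = space.
fn build_charlist(dict_txt: &str) -> Vec<Option<char>> {
    let mut classes: Vec<Option<char>> = vec![None]; // class 0: CTC blank
    // `lines()` splits on both "\n" and "\r\n".
    for line in dict_txt.lines() {
        // One character per line; empty lines are ignored.
        if let Some(c) = line.chars().next() {
            classes.push(Some(c));
        }
    }
    classes.push(Some(' ')); // final class: space
    classes
}
```

With the shipped 107-line `dict.txt` this yields 109 entries (space at index 108), matching the model's output width.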
Usage (RTen, via gigapdf-ocr-rten)
```rust
use gigapdf_ocr_rten::OcrEngine;

// Drop model.rten + dict.txt into <models_dir>/hebrew/, then
// (inside a function returning a Result, since `?` is used;
// `rgb_image` is the page image in RGB):
let eng = OcrEngine::load_models_dir("models")?;
for line in eng.recognize_page(&rgb_image)? { /* line.text is logical-order Hebrew */ }
```
The engine flags Hebrew as `rtl: true` and automatically reverses the visual-order CTC output back to logical order.
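For callers running `model.onnx` or `model.rten` through some other runtime, the decoding step can be sketched as greedy (best-path) CTC decoding followed by reversal. This is a minimal sketch under assumptions: `logits` is the row-major `[T, 109]` slice of the `[1, T, 109]` output, the class table has `None` at index 0 for the blank, and the function names are hypothetical rather than gigapdf-ocr-rten's API. Plain reversal is exact for pure right-to-left lines; embedded digit or Latin runs, which `get_display` keeps left-to-right, come out reversed and would need extra handling.

```rust
/// Greedy (best-path) CTC decoding of one line.
/// `logits`: flattened [T, C] matrix; `classes`: table with None for the blank.
/// Returns text in visual (left-to-right) order.
fn ctc_greedy_decode(logits: &[f32], num_classes: usize, classes: &[Option<char>]) -> String {
    assert!(
        num_classes > 0 && logits.len() % num_classes == 0,
        "logits must be a whole [T, C] matrix"
    );
    let mut out = String::new();
    let mut prev: Option<usize> = None;
    for step in logits.chunks(num_classes) {
        // Argmax per timestep; softmax is monotonic, so raw logits suffice.
        let best = step
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i)
            .expect("each step has num_classes > 0 entries");
        // CTC collapse: merge repeats of the previous class, then drop blanks.
        if prev != Some(best) {
            if let Some(Some(c)) = classes.get(best) {
                out.push(*c);
            }
        }
        prev = Some(best);
    }
    out
}

/// Visual -> logical order by plain character reversal (exact for pure RTL lines).
fn visual_to_logical(visual: &str) -> String {
    visual.chars().rev().collect()
}
```

A blank between two identical classes keeps both characters, which is how CTC represents doubled letters.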
License
PolyForm Noncommercial 1.0.0. Copyright 2025 Rony Licha / QR Communication. Commercial use requires a separate license.