Ukrainian HTR Model (Puigcerver CRNN)

A Handwritten Text Recognition (HTR) model for 19th–20th century Ukrainian manuscripts and typewritten texts, based on the CNN + BiLSTM + CTC architecture introduced in Puigcerver (2017) and used as the backbone of PyLaia and Transkribus.

This is a clean-room PyTorch reimplementation of that published architecture (PyLaia-inspired). It does not use the PyLaia Python package and cannot be loaded by it; training and inference run in plain PyTorch (see Usage below).

Model Details

Architecture: CNN encoder [12, 24, 48, 48 filters] + 3-layer Bidirectional LSTM (256 units) + CTC decoder (Puigcerver 2017)
Input: Grayscale line images, normalized to 128 px height with aspect ratio preserved
Output: UTF-8 Ukrainian Cyrillic text
Vocabulary: 184 symbols (symbols.txt)
Framework: Pure PyTorch; the PyLaia package is not required
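
To illustrate the CTC decoding step listed above, here is a minimal sketch of greedy (best-path) decoding in plain Python. It assumes the per-frame argmax indices are already available and that index 0 is the blank. The symbol table is a toy one for illustration; the decoder, blank index, and symbols.txt layout actually used in this repository may differ.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is the CTC blank


def ctc_greedy_decode(frame_ids, id_to_symbol, blank=BLANK):
    """Collapse repeated frame labels, then drop blanks (best-path CTC decoding)."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(id_to_symbol[i] for i in collapsed if i != blank)


# Toy symbol table for illustration only (not the model's 184-symbol vocabulary).
toy_symbols = {1: "м", 2: "о", 3: "в", 4: "а"}

# Made-up per-frame argmax indices for one line image.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 3, 0, 0, 4, 4]
decoded = ctc_greedy_decode(frames, toy_symbols)
print(decoded)
```

A blank between two identical labels keeps them as separate characters, which is how CTC represents doubled letters.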

Performance

Metric	Value
Validation CER	4.76%
Training epochs	86
Training lines	24,706
Training pages	773
Validation lines	970
Validation pages	28
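
Character error rate (CER) is conventionally the Levenshtein edit distance between the recognized text and the reference, divided by the reference length. The sketch below shows one way to compute it in plain Python. The evaluation script behind the 4.76% figure is not shown here, so the corpus-level aggregation (total edits over total reference characters) is an assumption, as are the sample strings.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def corpus_cer(pairs):
    """Corpus-level CER: total edits divided by total reference characters."""
    edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return edits / chars if chars else 0.0


# Illustrative (reference, hypothesis) pairs, not taken from the validation set.
pairs = [("привіт", "привет"), ("світ", "світ")]
cer = corpus_cer(pairs)
print(f"{cer:.2%}")
```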

Training Data

Trained on Ukrainian handwriting and typewritten text images transcribed and exported from Transkribus (see the corresponding Transkribus model page). The dataset covers Ukrainian manuscripts and typewritten texts from the 19th and 20th centuries.

The training data was drawn from two sources:

Prozhito Project (with the participation of Misha Melnichenko)
Foundation of the International Memorial Association (with the participation of Aren Vanyan and Nikita Lomakin)

The Transkribus model was curated and trained by Aleksej Tikhonov (MultiHTR project, University of Freiburg) as an extension of the "Ukrainian generic handwriting 1" model. Model curated by Achim Rabus (Slavic Department, University of Freiburg). Funded by the Ministry of Science, Research and the Arts of Baden-Württemberg (funds from the state digitization strategy digital@bw).

Usage

Requirements

pip install torch torchvision pillow

Inference

Download best_model.pt, symbols.txt, and model_config.json from this repository, then use the inference script inference_pylaia_native.py from the polyscriptor repository:

from inference_pylaia_native import PyLaiaInference
from PIL import Image

# Load model
model = PyLaiaInference(
    checkpoint_path="best_model.pt",
    syms_path="symbols.txt"
)

# Transcribe a line image
image = Image.open("line_image.jpg")
text = model.transcribe(image)
print(text)

Note: Input should be a single text line image, not a full page. Preprocessing (grayscale conversion, height normalization, aspect ratio preservation) is handled automatically by inference_pylaia_native.py.
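
For readers who want to see what the height normalization amounts to, the following sketch computes the target size for a line image scaled to 128 px height with its aspect ratio preserved. The function name normalized_size is hypothetical, and the rounding rule is an assumption; inference_pylaia_native.py may round or resample differently.

```python
def normalized_size(width: int, height: int, target_height: int = 128) -> tuple:
    """Return (new_width, new_height) for a line image scaled to target_height."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale the width by the same factor as the height to keep the aspect ratio.
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


# A 1500x96 px line image becomes 2000x128 px.
size = normalized_size(1500, 96)
print(size)
```

With Pillow, the resulting size could be passed to Image.resize on an image first converted to grayscale with convert("L").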

For full-page inference with automatic line segmentation, use batch_processing.py:

python batch_processing.py \
    --engine crnn-ctc \
    --model-path best_model.pt \
    --input-folder images/ \
    --output-folder output/

GUI Usage

polyscriptor also ships graphical interfaces that handle full-page processing without requiring pre-segmented line images:

Interactive single-page GUI — loads raw page images, performs automatic line segmentation, and can export results as PAGE XML:

python transcription_gui_plugin.py

Batch processing GUI — processes entire folders; auto-detects existing PAGE XML files (e.g. from Transkribus) and uses them for segmentation when available:

python polyscriptor_batch_gui.py
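
As a rough illustration of how existing PAGE XML can supply line segmentation, the sketch below reads TextLine polygons with the Python standard library. It is not the polyscriptor implementation. It assumes the lines carry a Coords element with a points attribute (as in PAGE 2013 and later), and the sample document is made up.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def read_text_lines(page_xml: str):
    """Return [(line_id, [(x, y), ...]), ...] for every TextLine with Coords."""
    root = ET.fromstring(page_xml)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        coords = next((c for c in elem if local_name(c.tag) == "Coords"), None)
        if coords is None or not coords.get("points"):
            continue  # skip lines without a usable polygon
        polygon = [tuple(int(v) for v in point.split(","))
                   for point in coords.get("points").split()]
        lines.append((elem.get("id"), polygon))
    return lines


# Made-up, simplified PAGE XML document for illustration.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_0001.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <TextLine id="r1l1"><Coords points="100,200 1800,200 1800,280 100,280"/></TextLine>
      <TextLine id="r1l2"><Coords points="100,300 1750,300 1750,380 100,380"/></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

text_lines = read_text_lines(SAMPLE)
for line_id, polygon in text_lines:
    print(line_id, polygon)
```

Matching on the local tag name rather than a fixed namespace keeps the sketch usable across PAGE schema versions that share this layout.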

Intended Use

Transcription of 19th–20th century Ukrainian handwritten documents and typewritten texts
Ukrainian historical document digitization
Digital humanities research on modern Ukrainian texts

Limitations

Optimized for 19th–20th century Ukrainian script; may underperform on earlier periods or other script variants
Full-page segmentation quality depends on the segmentation method used upstream

Citation

If you use this model in your research, please cite the architecture paper, the publication describing the model and datasets, and this model:

@article{tikhonov2024ukrainian,
  title   = {Handwritten Text Recognition of Ukrainian Manuscripts in the 21st Century:
             Possibilities, Challenges, and the Future of the First Generic AI-based Model},
  author  = {Tikhonov, Aleksej and Rabus, Achim},
  journal = {Kyiv-Mohyla Humanities Journal},
  volume  = {11},
  pages   = {226--247},
  year    = {2024},
  doi     = {10.18523/2313-4895.11.2024.226-247},
  url     = {https://doi.org/10.18523/2313-4895.11.2024.226-247}
}

@inproceedings{puigcerver2017multidimensional,
  title     = {Are Multidimensional Recurrent Layers Really Necessary for Handwritten Text Recognition?},
  author    = {Puigcerver, Joan},
  booktitle = {Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)},
  year      = {2017},
  url       = {https://www.jpuigcerver.net/pubs/jpuigcerver_icdar2017.pdf}
}

% TODO: add publication for Prozhito dataset/model
@misc{prozhito,
  title  = {Prozhito: {PLACEHOLDER}},
  author = {},
  year   = {},
  url    = {}
}

% TODO: add publication for Memorial dataset/model
@misc{memorial,
  title  = {Memorial: {PLACEHOLDER}},
  author = {},
  year   = {},
  url    = {}
}

@misc{rabus2026polyscriptor,
  title  = {Polyscriptor: Multi-Engine HTR Training \& Comparison Tool},
  author = {Rabus, Achim},
  year   = {2026},
  url    = {https://github.com/achimrabus/polyscriptor}
}
