Valerii Sielikhov committed on
Commit a4a11b4
Parent(s): 561759b
Update CITATION.cff with correct repository link; enhance README files with architecture details, training data, recognition examples, and evaluation metrics for the Ukrainian OCR/ICR model.
- CITATION.cff +1 -1
- README.md +44 -4
- README.uk.md +24 -0
- images/example_1.png +0 -0
- images/example_2.png +0 -0
CITATION.cff
CHANGED
```diff
@@ -6,7 +6,7 @@ license: Apache-2.0
 authors:
   - family-names: "Sielikhov"
     given-names: "Volodymyr"
-repository-code: "https://huggingface.co/
+repository-code: "https://huggingface.co/Valerii02/ukr-htr-convtext"
 preferred-citation:
   type: article
   title: "HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text Recognition"
```
README.md
CHANGED
````diff
@@ -13,6 +13,10 @@ tags:
 - pytorch
 - onnx
 pipeline_tag: image-to-text
+base_model: DAIR-Group/HTR-ConvText
+metrics:
+- cer
+- wer
 ---
 
 # Ukrainian OCR / ICR (HTR-ConvText)
@@ -24,7 +28,16 @@ Ukrainian translation: [README.uk.md](README.uk.md).
 
 This repository packages a Ukrainian OCR/ICR model for handwritten and partially printed text with a Hugging Face-native API (`AutoModel` + `AutoProcessor`).
 
-
+**Architecture:** HTR-ConvText — hybrid CNN + Vision Transformer with ResNet-18 + MobileViT (MVP) backbone, hierarchical ConvText encoder (U-Net-like down/upsampling), CTC decoding. Input: 64×3072 px (height × max width), 151-character vocabulary (Ukrainian + symbols).
+
+**Training data:**
+- [ukrainian-handwriting-synth](https://github.com/ValeriiSielikhov/ukrainian-handwriting-synth) — synthetic handwritten lines
+- [Ukrainian Handwritten Text](https://www.kaggle.com/datasets/annyhnatiuk/ukrainian-handwritten-text) — ~37k segmented lines
+- **Total:** 1,696,499 samples (Train 90% / Val 5% / Test 5%)
+
+**Training:** 500k iterations, batch 16 + grad accum 4, SAM optimizer, EMA (decay 0.9999), TCM warmup 40k iters, scan simulation and detector-error augmentations. Hardware: NVIDIA B200 (180GB VRAM), Beyond.pl / Run:ai.
+
+The packaged release includes:
 - deterministic preprocessing in a custom processor,
 - CTC greedy decoding in the processor,
 - checkpoint conversion from original `.pth` format to HF artifacts,
@@ -66,6 +79,15 @@ text = processor.batch_decode(logits)[0]
 print(text)
 ```
 
+## Recognition Examples
+
+| Example | Image | GT | Prediction | CER | WER |
+|---------|-------|----|------------|-----|-----|
+| 1 |  | Департаменту патрульної поліції | Департаменту нагрульної поліції | 0.065 | 0.33 |
+| 2 |  | за порушення правил дорожнього руху | за порушення правил дорожнього Дуку | 0.057 | 0.20 |
+
+*Real-world inference on scanned Ukrainian documents. GT = ground truth, CER/WER per sample.*
+
 ## Repository Files
 
 - `configuration_htr.py`: HF config class.
@@ -119,11 +141,14 @@ Recommended acceptance thresholds:
 - CER delta abs <= `0.005`
 - WER delta abs <= `0.01`
 
-## Evaluation
+## Evaluation
 
 | Split | CER | WER | Notes |
 |---|---:|---:|---|
-| test |
+| test (84,826) | — | — | Held-out from training split; add metrics after eval |
+| real-world (124) | 0.176 | 0.440 | Scanned docs, handwritten + printed (`images/`) |
+
+*Real-world metrics use micro-averaging and `format_string_for_wer` normalization. The 124-image set is from production-like scans; the held-out test split (84,826) can be evaluated with your pipeline.*
 
 ## Limitations
 
@@ -149,4 +174,19 @@ Apache-2.0. See `LICENSE`.
 
 ## Citation
 
-If you use this model, cite both upstream HTR-ConvText and this packaged release
+If you use this model, cite both upstream HTR-ConvText and this packaged release.
+
+**Upstream (HTR-ConvText):**
+```bibtex
+@misc{truc2025htrconvtext,
+  title={HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text Recognition},
+  author={Pham Thach Thanh Truc and Dang Hoai Nam and Huynh Tong Dang Khoa and Vo Nguyen Le Duy},
+  year={2025},
+  eprint={2512.05021},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2512.05021},
+}
+```
+
+**This model:** See `CITATION.cff` for full attribution.
````
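The README.md changes report per-sample and aggregate CER/WER. The evaluation script is not part of this commit, so the following is only a minimal sketch of the standard definitions, assuming plain Levenshtein distance on raw strings, whitespace tokenization for WER, and none of the `format_string_for_wer` normalization the README mentions. `edit_distance`, `cer`, `wer`, and `micro_cer` are illustrative names, not functions from the repository.

```python
from typing import List, Sequence, Tuple

# The two rows of the Recognition Examples table: (GT, prediction).
EXAMPLES: List[Tuple[str, str]] = [
    ("Департаменту патрульної поліції", "Департаменту нагрульної поліції"),
    ("за порушення правил дорожнього руху", "за порушення правил дорожнього Дуку"),
]


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # row for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when equal)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits / reference length."""
    return edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate on whitespace-split words (no normalization)."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def micro_cer(pairs: Sequence[Tuple[str, str]]) -> float:
    """Micro-average: total edits over total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in pairs)
    return edits / sum(len(r) for r, _ in pairs)


if __name__ == "__main__":
    for gt, pred in EXAMPLES:
        print(f"CER={cer(gt, pred):.3f}  WER={wer(gt, pred):.2f}")
    print(f"micro CER={micro_cer(EXAMPLES):.3f}")
```

On the two table rows this gives 2 character edits over 31 characters (0.065) and 2 over 35 (0.057), with one wrong word out of 3 and 5 respectively, matching the listed values. Micro-averaging pools edits and reference lengths across all lines, so longer lines carry more weight than in a mean of per-line CERs.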
README.uk.md
CHANGED
````diff
@@ -6,6 +6,15 @@
 
 This repository packages a Ukrainian OCR/ICR model for handwritten and partially printed text in a Hugging Face-friendly format (`AutoModel` + `AutoProcessor`).
 
+**Architecture:** HTR-ConvText — a CNN + Vision Transformer hybrid (ResNet-18 + MobileViT), CTC decoding. Input: 64×3072 px, 151 characters.
+
+**Training data:**
+- [ukrainian-handwriting-synth](https://github.com/ValeriiSielikhov/ukrainian-handwriting-synth) — synthetic handwriting
+- [Ukrainian Handwritten Text](https://www.kaggle.com/datasets/annyhnatiuk/ukrainian-handwritten-text) — ~37k lines
+- **Total:** 1,696,499 samples (Train 90% / Val 5% / Test 5%)
+
+**Training:** 500k iterations, SAM, EMA, scan simulation, detector-error augmentations. NVIDIA B200 (180GB VRAM).
+
 The repository includes:
 - a custom HF `Processor` with PIL preprocessing,
 - CTC greedy decode,
@@ -36,6 +45,15 @@ text = processor.batch_decode(logits)[0]
 print(text)
 ```
 
+## Recognition examples
+
+| Example | Image | GT | Prediction | CER | WER |
+|---------|------------|----|------------|-----|-----|
+| 1 |  | Департаменту патрульної поліції | Департаменту нагрульної поліції | 0.065 | 0.33 |
+| 2 |  | за порушення правил дорожнього руху | за порушення правил дорожнього Дуку | 0.057 | 0.20 |
+
+*Inference on real scans of Ukrainian documents. GT = reference text.*
+
 ## Converting the checkpoint to HF format
 
 ```bash
@@ -71,6 +89,12 @@ python validate_parity.py \
 - CER delta abs <= `0.005`
 - WER delta abs <= `0.01`
 
+## Evaluation
+
+| Split | CER | WER | Notes |
+|---|---:|---:|---|
+| real-world (124) | 0.176 | 0.440 | Document scans (handwritten + printed) |
+
 ## Known limitations
 
 - Complex scan artifacts, heavy blur, and very low contrast degrade quality.
````
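Both READMEs state that the processor performs CTC greedy decoding (`processor.batch_decode(logits)`). The processor code is not part of this commit, so this is only a minimal best-path sketch, assuming the blank label sits at class index 0 and classes 1..N map to the characters of a vocabulary string. `ctc_greedy_decode`, `one_hot`, and the toy vocabulary are hypothetical, not the repository's API; the real model's blank index and 151-character ordering may differ.

```python
from typing import List, Sequence

BLANK = 0  # assumption: CTC blank at class index 0 (not confirmed by this commit)


def ctc_greedy_decode(logits: Sequence[Sequence[float]], vocab: str, blank: int = BLANK) -> str:
    """Best-path CTC decoding for one text line.

    logits: T x C scores, one row per time step. Under the layout assumed
    here, class k > 0 maps to vocab[k - 1].
    """
    # 1) Pick the highest-scoring class at every time step.
    best = [max(range(len(row)), key=lambda k: row[k]) for row in logits]

    chars: List[str] = []
    prev = None
    for k in best:
        # 2) Collapse consecutive repeats, 3) drop blanks.
        if k != prev and k != blank:
            chars.append(vocab[k - 1])
        prev = k
    return "".join(chars)


def one_hot(k: int, num_classes: int) -> List[float]:
    """Toy helper: a score row whose argmax is k."""
    return [1.0 if i == k else 0.0 for i in range(num_classes)]


if __name__ == "__main__":
    vocab = "ab"  # toy 2-character vocabulary
    frames = [one_hot(k, 3) for k in (1, 1, 0, 1, 2, 2, 0)]
    print(ctc_greedy_decode(frames, vocab))  # "aab": the blank separates the two a's
```

The blank is what lets CTC emit a genuinely doubled character: without the blank between them, the two runs of class 1 would collapse into a single "a". Greedy decoding keeps only the single most likely path per frame; beam search over CTC outputs is the usual, costlier alternative, but nothing in this commit suggests the packaged processor uses it.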
images/example_1.png
ADDED
images/example_2.png
ADDED
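The training summary added to README.md lists an EMA with decay 0.9999. Assuming this is the usual exponential moving average of model weights (the training code is not part of this commit), the update and its rough averaging horizon look like the sketch below. The dict-of-floats representation and the names `ema_step`, `ema_update`, and `averaging_horizon` are illustrative stand-ins for per-tensor updates in a real training loop.

```python
from typing import Dict

DECAY = 0.9999  # decay value from the README training summary


def ema_step(shadow: float, value: float, decay: float = DECAY) -> float:
    """One EMA step for a single weight: decay * shadow + (1 - decay) * value."""
    return decay * shadow + (1.0 - decay) * value


def ema_update(shadow: Dict[str, float], params: Dict[str, float], decay: float = DECAY) -> None:
    """Apply ema_step to every named weight, in place."""
    for name, value in params.items():
        shadow[name] = ema_step(shadow[name], value, decay)


def averaging_horizon(decay: float = DECAY) -> float:
    """Rule of thumb: an EMA averages over roughly 1 / (1 - decay) recent steps."""
    return 1.0 / (1.0 - decay)


if __name__ == "__main__":
    shadow = {"w": 0.0}  # hypothetical weight; shadow starts as a copy of its initial value
    for _ in range(10_000):
        ema_update(shadow, {"w": 1.0})  # live weight held constant at 1.0
    # After n steps the shadow equals 1 - decay**n, about 0.632 for n = 1 / (1 - decay)
    print(round(shadow["w"], 3), round(averaging_horizon()))
```

With decay 0.9999 the shadow weights trail the live weights over roughly the last 10,000 updates, so the averaged copy only becomes representative well into a long run such as the 500k iterations listed.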