manchu-ocr-crnn-final
CRNN baseline for Manchu script OCR (Manchu graph only — does not produce romanization).
Step-2 model: continues from mic7ch/manchu-ocr-crnn-step1-syn's real_val-peak checkpoint, fine-tuned on real data only.
Architecture: ResNet-style CNN backbone → adaptive pool to fixed height → 4-layer BiLSTM (hidden=256) → CTC head. Training data: two stages, pre-trained on 60k synthetic images, then fine-tuned on 20k real images.
This is the real_val-peak checkpoint by manchu_word_accuracy on a held-out 1000-sample real validation split — the same selection rule used for the VLM models in the paper.
Best checkpoint
- Step: `checkpoint-277500.pth` (uploaded as `best_model.pth`)
Evaluation metrics
Word accuracy and character error rate at the selected step:
| Split | manchu_word_accuracy | manchu_cer |
|---|---|---|
| synthetic-val (1000) | 99.80% | 0.015% |
| real-val (1000, held-out) | 80.20% | 5.966% |
| real-test (753) | 61.09% | 14.514% |
(Roman transliteration is N/A — CRNN was trained on Manchu graph only.)
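For readers who want to score their own predictions in a comparable way, the following is a minimal sketch of the two metrics. It assumes that word accuracy means exact string match per sample and that CER is the total edit distance divided by the total reference length (micro-averaged). The model card does not state the averaging used for `manchu_cer`, so treat both definitions and the function names as illustrative assumptions rather than the evaluation code behind the table.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def word_accuracy(refs, hyps):
    """Fraction of samples predicted exactly (assumed definition)."""
    if not refs or len(refs) != len(hyps):
        raise ValueError("refs and hyps must be non-empty and equally long")
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


def cer(refs, hyps):
    """Micro-averaged character error rate (assumed definition)."""
    if not refs or len(refs) != len(hyps):
        raise ValueError("refs and hyps must be non-empty and equally long")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars
```

Python strings are compared by Unicode code point, so the same functions apply unchanged to Manchu-script references and predictions.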
Training recipe
- Architecture: CRNN with ResNet-style backbone + 4-layer BiLSTM + CTC
- Hidden size: 256
- Input: 480×64 grayscale
- Optimizer: AdamW, lr=1e-3, weight_decay=0.05, betas=(0.9, 0.999)
- Scheduler: CosineAnnealingWarmRestarts (T_0=10, T_mult=2)
- Batch size: 16
- Epochs: 100
- Mixed precision: enabled
- Gradient clipping: max_norm=1.0
- Selection metric: `manchu_word_accuracy` on held-out `real_val`
Hyperparameters reproduce the prior baseline in mic7ch/manchu-ocr-crnn-base-3m (itself based on https://github.com/mic7ch1/ManchuAI-OCR), differing only in training data composition.
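To make the scheduler setting in the recipe concrete, here is a small pure-Python sketch of the cosine-annealing-with-warm-restarts formula for `T_0=10`, `T_mult=2`, and `lr=1e-3`. It assumes the scheduler is stepped once per epoch and that `eta_min` is 0 (the PyTorch default). Neither detail is stated in the recipe, and `warm_restart_lr` is a hypothetical helper, not part of the training code.

```python
import math


def warm_restart_lr(epoch, base_lr=1e-3, t_0=10, t_mult=2, eta_min=0.0):
    """Learning rate at an integer epoch under cosine annealing with warm restarts."""
    t_i, t_cur = t_0, epoch
    # Skip over completed cycles; each new cycle is t_mult times longer.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


# Epochs (after the first) at which the rate jumps back to base_lr.
restarts = [e for e in range(1, 100) if warm_restart_lr(e) == 1e-3]
print(restarts)  # [10, 30, 70]
```

Under these assumptions, a 100-epoch run sees cycles of length 10, 20, and 40, and ends partway through a fourth cycle of length 80.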
Usage
```python
import torch
from huggingface_hub import hf_hub_download

# Requires the CRNN code at https://github.com/<your-fork>/hongtaiji_parallel
# (or use the standalone bundle in `crnn_standalone/`).
from src.CRNN.inference import CRNNInference

ckpt_path = hf_hub_download(repo_id="mic7ch/manchu-ocr-crnn-final", filename="best_model.pth")
ocr = CRNNInference(ckpt_path)
ocr.load_model()
text = ocr.predict("path/to/image.png")
print(text)
```
The .pth file is self-contained: it stores the model state_dict alongside char2idx, idx2char, and architectural hyperparameters (hidden_size, etc.), so no separate config is required for inference.
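As an illustration of how an `idx2char` table of this kind is typically used with a CTC head, the sketch below performs greedy CTC decoding on a sequence of per-frame class indices: repeated labels are collapsed first, then blanks are dropped. The blank index of 0, the toy vocabulary, and the helper name `ctc_greedy_decode` are assumptions for the example. The decoding inside `CRNNInference` may differ.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, idx2char, blank=0):
    """Collapse consecutive duplicate labels, remove blanks, map indices to text."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(idx2char[i] for i in collapsed if i != blank)


# Toy vocabulary standing in for the checkpoint's idx2char (hypothetical values).
idx2char = {1: "a", 2: "b"}
frames = [0, 1, 1, 0, 1, 2, 2, 0]  # argmax class per time step
print(ctc_greedy_decode(frames, idx2char))  # aab
```

The blank between the two runs of label 1 is what lets the decoder emit a doubled character instead of merging the runs into one.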
Citation
Paper forthcoming. Please cite the repository meanwhile:
```bibtex
@software{manchu_ocr_2026,
  author = {Chung, H.-M. and collaborators},
  title = {Vision-language-model OCR for Manchu script},
  year = {2026},
  url = {https://huggingface.co/mic7ch/manchu-ocr-crnn-final}
}
```