manchu-ocr-crnn-final

CRNN baseline for Manchu script OCR (Manchu graph only; it does not produce romanization). This is the step-2 model: it resumes from the real_val-peak checkpoint of mic7ch/manchu-ocr-crnn-step1-syn and is fine-tuned on real data only.

Architecture: ResNet-style CNN backbone → adaptive pool to fixed height → 4-layer BiLSTM (hidden=256) → CTC head.

Training data: two stages. The model is pre-trained on 60k synthetic images, then fine-tuned on 20k real images.
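
To illustrate what the CTC head's output looks like downstream, here is a minimal sketch of best-path (greedy) CTC decoding in plain Python. It is not taken from the repository: the blank index (0), the function name `ctc_greedy_decode`, and the toy vocabulary are assumptions for illustration, and the actual inference code may decode differently (for example with beam search).

```python
from typing import Dict, List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank; the real vocabulary may differ


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      idx2char: Dict[int, str],
                      blank: int = BLANK) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_path: List[int] = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    chars: List[str] = []
    previous = blank
    for idx in best_path:
        # Emit a symbol only when it is not the blank and differs from the
        # previous frame; a blank between two equal symbols keeps both.
        if idx != blank and idx != previous:
            chars.append(idx2char[idx])
        previous = idx
    return "".join(chars)


# Toy example with a hypothetical 3-symbol vocabulary (blank, "a", "b").
toy_vocab = {1: "a", 2: "b"}
toy_scores = [
    [0.1, 0.8, 0.1],    # "a"
    [0.1, 0.8, 0.1],    # "a" again (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # "a" (new symbol after the blank)
    [0.1, 0.1, 0.8],    # "b"
]
decoded = ctc_greedy_decode(toy_scores, toy_vocab)
print(decoded)  # aab
```

The blank frame is what lets CTC output a doubled character: without it, the two runs of "a" would collapse into one.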

The released checkpoint is the one with the highest manchu_word_accuracy on a held-out 1000-sample real validation split (the real_val peak). This is the same selection rule used for the VLM models in the paper.

Best checkpoint

Step: checkpoint-277500.pth (uploaded as best_model.pth)

Evaluation metrics

Word accuracy and character error rate at the selected step:

Split	manchu_word_accuracy	manchu_cer
synthetic-val (1000)	99.80%	0.015%
real-val (1000, held-out)	80.20%	5.966%
real-test (753)	61.09%	14.514%

(Roman transliteration is N/A: the CRNN was trained on Manchu graph only.)
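
For readers who want to compute comparable numbers on their own predictions, the sketch below shows one common way to define the two metrics. It assumes that word accuracy is the exact-match rate over whole samples and that CER is total edit distance divided by total reference length. The evaluation code behind the table may define them differently (for example, averaging CER per sample), and the function names and the Latin placeholder strings are illustrative only.

```python
from typing import List, Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance (insert, delete, substitute) using a two-row DP table."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def word_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of samples whose prediction equals the reference exactly."""
    exact = sum(p == r for p, r in zip(predictions, references))
    return exact / len(references)


def cer(predictions: List[str], references: List[str]) -> float:
    """Character error rate, pooled over all samples (an assumed convention)."""
    edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    return edits / total_chars


# Latin placeholders stand in for Manchu-script labels.
refs = ["abcd", "efg", "hi"]
preds = ["abcd", "efx", "hi"]
print(f"word accuracy: {word_accuracy(preds, refs):.2%}")  # 66.67%
print(f"CER: {cer(preds, refs):.2%}")                      # 11.11%
```

Under the exact-match assumption, 61.09% on the 753-sample real-test split would correspond to 460 fully correct predictions.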

Training recipe

Architecture: CRNN with ResNet-style backbone + 4-layer BiLSTM + CTC
Hidden size: 256
Input: 480×64 grayscale
Optimizer: AdamW, lr=1e-3, weight_decay=0.05, betas=(0.9, 0.999)
Scheduler: CosineAnnealingWarmRestarts (T_0=10, T_mult=2)
Batch size: 16
Epochs: 100
Mixed precision: enabled
Gradient clipping: max_norm=1.0
Selection metric: manchu_word_accuracy on held-out real_val

Hyperparameters reproduce the prior baseline in mic7ch/manchu-ocr-crnn-base-3m (itself based on https://github.com/mic7ch1/ManchuAI-OCR), differing only in training data composition.
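
To make the scheduler setting concrete, the following plain-Python sketch reproduces the cosine-with-warm-restarts formula for T_0=10 and T_mult=2. It assumes the scheduler is stepped once per epoch and that eta_min is 0 (the PyTorch default); neither detail is stated in the recipe above, and `warm_restart_lr` is a hypothetical helper, not part of the training code.

```python
import math
from typing import Tuple


def warm_restart_lr(epoch: int, base_lr: float = 1e-3, t_0: int = 10,
                    t_mult: int = 2, eta_min: float = 0.0) -> Tuple[float, bool]:
    """Return (learning rate, is_restart) at an integer epoch."""
    t_i, t_cur = t_0, epoch
    # Walk through completed cycles; each cycle is t_mult times longer.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    lr = eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i)) / 2
    return lr, (t_cur == 0 and epoch > 0)


restart_epochs = [e for e in range(100) if warm_restart_lr(e)[1]]
print(restart_epochs)  # [10, 30, 70]
```

Under these assumptions, a 100-epoch run contains cycles of 10, 20, and 40 epochs, and ends 30 epochs into an 80-epoch cycle.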

Usage

import torch
from huggingface_hub import hf_hub_download
# Requires the CRNN code at https://github.com/<your-fork>/hongtaiji_parallel
# (or use the standalone bundle in `crnn_standalone/`).
from src.CRNN.inference import CRNNInference

ckpt_path = hf_hub_download(repo_id="mic7ch/manchu-ocr-crnn-final", filename="best_model.pth")
ocr = CRNNInference(ckpt_path)
ocr.load_model()
text = ocr.predict("path/to/image.png")
print(text)

The .pth file is self-contained: it stores the model state_dict alongside char2idx, idx2char, and architectural hyperparameters (hidden_size, etc.), so no separate config is required for inference.
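
As a quick sanity check on such a checkpoint, the sketch below verifies that the two vocabulary maps are consistent with each other. It operates on a plain dict so it runs without downloading anything. The assumption that `char2idx`, `idx2char`, and `hidden_size` sit at the top level of the loaded dict, and the helper name `check_vocab`, are illustrative and not confirmed by the repository.

```python
from typing import Any, Dict, List


def check_vocab(checkpoint: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the checkpoint's vocabulary maps."""
    problems: List[str] = []
    for key in ("char2idx", "idx2char", "hidden_size"):
        if key not in checkpoint:
            problems.append(f"missing key: {key}")
    if problems:
        return problems
    char2idx, idx2char = checkpoint["char2idx"], checkpoint["idx2char"]
    # Every character should map to an index that maps back to it.
    for char, idx in char2idx.items():
        if idx2char.get(idx) != char:
            problems.append(f"idx2char[{idx!r}] does not map back to {char!r}")
    return problems


# Stand-in dict using the key names listed above; real values come from the file.
toy_checkpoint = {
    "char2idx": {"a": 1, "b": 2},
    "idx2char": {1: "a", 2: "b"},
    "hidden_size": 256,
}
print(check_vocab(toy_checkpoint))  # []
```

On the real file, the dict would typically come from `torch.load(ckpt_path, map_location="cpu")`; an empty list then means the maps round-trip cleanly.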

Citation

The paper is forthcoming. In the meantime, please cite the repository:

@software{manchu_ocr_2026,
  author = {Chung, H.-M. and collaborators},
  title  = {Vision-language-model OCR for Manchu script},
  year   = {2026},
  url    = {https://huggingface.co/mic7ch/manchu-ocr-crnn-final}
}
