MIT 48px CTC OCR ONNX
This repository provides an ONNX conversion of the 48px CTC OCR model used by manga-image-translator.
The ONNX artifact is derived from the upstream PyTorch checkpoint
ocr-ctc.ckpt from the beta-0.3 release asset ocr-ctc.zip.
Files
- mit48pxctc_ocr.onnx
- alphabet-all-v5.txt
- metadata.json
- LICENSE
- NOTICE
Source
- Upstream project: https://github.com/zyddnys/manga-image-translator
- Upstream release: https://github.com/zyddnys/manga-image-translator/releases/tag/beta-0.3
- Source archive: ocr-ctc.zip
- Source checkpoint: ocr-ctc.ckpt
- Source alphabet: alphabet-all-v5.txt
Model Contract
Input:
- name: image
- dtype: float32
- shape: [batch, 3, 48, width]
- color order: BGR
- normalization: (uint8_pixel - 127.5) / 127.5
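To illustrate the input contract, here is a minimal preprocessing sketch in Python with NumPy. It assumes each text-line crop arrives as an HxWx3 uint8 array already in BGR order (OpenCV's cv2.imread returns BGR; an RGB array from PIL would need image[..., ::-1] first). The helper names, the aspect-preserving nearest-neighbour resize, and right-padding narrower crops with 0.0 (mid-grey after normalization) are assumptions for illustration, not the upstream pipeline.

```python
import numpy as np

TARGET_HEIGHT = 48  # fixed input height from the model contract


def resize_to_height(image: np.ndarray, height: int = TARGET_HEIGHT) -> np.ndarray:
    """Nearest-neighbour resize of an HxWx3 image to `height`, keeping the aspect ratio.

    Kept dependency-free for the sketch; a real pipeline would likely use cv2.resize.
    """
    h, w = image.shape[:2]
    new_w = max(1, round(w * height / h))
    rows = np.arange(height) * h // height  # source row for each output row
    cols = np.arange(new_w) * w // new_w    # source column for each output column
    return image[rows][:, cols]


def normalize(image_bgr: np.ndarray) -> np.ndarray:
    """HxWx3 uint8 BGR -> 3xHxW float32 via (pixel - 127.5) / 127.5, i.e. [0, 255] -> [-1, 1]."""
    x = (image_bgr.astype(np.float32) - 127.5) / 127.5
    return np.transpose(x, (2, 0, 1))  # HWC -> CHW; channel order stays B, G, R


def make_batch(images_bgr: list) -> np.ndarray:
    """Stack BGR uint8 crops into one [batch, 3, 48, width] float32 tensor.

    Narrower crops are right-padded with 0.0 (mid-grey after normalization);
    this padding choice is an assumption, not part of the published contract.
    """
    tensors = [normalize(resize_to_height(img)) for img in images_bgr]
    width = max(t.shape[2] for t in tensors)
    batch = np.zeros((len(tensors), 3, TARGET_HEIGHT, width), dtype=np.float32)
    for i, t in enumerate(tensors):
        batch[i, :, :, : t.shape[2]] = t
    return batch
```

Every crop in a batch must share one width, which is why the sketch pads to the widest crop rather than resizing all crops to a common width.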
Outputs:
- char_logits: [batch, time, vocab_size]
- color_values: [batch, time, 6]
char_logits contains raw scores; no softmax is applied. color_values is not
clamped. Index 0 of the dictionary is the CTC blank token, and the special token
<SP> represents a normal space character.
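Because char_logits is raw and index 0 is the blank, text can be recovered with standard greedy (best-path) CTC decoding: take the argmax at each time step, collapse consecutive repeats, drop blanks, and map <SP> to a space. The sketch below assumes `dictionary` is a Python list whose position i matches vocabulary index i (for example, the lines of alphabet-all-v5.txt). The function names and the mean-probability confidence score are hypothetical, this is a generic decoder rather than necessarily what manga-image-translator does, and color_values is left untouched.

```python
import numpy as np

BLANK_INDEX = 0       # first dictionary entry is the CTC blank (model contract)
SPACE_TOKEN = "<SP>"  # special token that stands for a normal space (model contract)


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax; char_logits are raw scores, not probabilities."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def ctc_greedy_decode(char_logits: np.ndarray, dictionary: list) -> list:
    """Greedy (best-path) CTC decoding of a [batch, time, vocab_size] array.

    Returns one (text, confidence) pair per batch item, where confidence is the
    mean softmax probability of the emitted characters (a hypothetical score).
    """
    probs = softmax(char_logits)
    results = []
    for seq in probs:                  # seq: [time, vocab_size]
        best = seq.argmax(axis=-1)     # most likely token per time step
        chars, confs = [], []
        prev = BLANK_INDEX
        for t, idx in enumerate(best):
            idx = int(idx)
            # Emit only non-blank tokens that differ from the previous step;
            # a blank between two equal tokens lets a genuine repeat through.
            if idx != BLANK_INDEX and idx != prev:
                token = dictionary[idx]
                chars.append(" " if token == SPACE_TOKEN else token)
                confs.append(float(seq[t, idx]))
            prev = idx
        confidence = float(np.mean(confs)) if confs else 0.0
        results.append(("".join(chars), confidence))
    return results
```

Softmax does not change the argmax, so it is only needed here for the confidence score; a decoder that only wants text can take the argmax of char_logits directly.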
Validation
The ONNX export was checked with onnx.checker and compared against the
PyTorch checkpoint using ONNX Runtime with the CPU execution provider. Output
differences between PyTorch and ONNX Runtime at each tested input width:
width=512: logits diff=0.000839233; colors diff=7.86781e-05
width=1024: logits diff=0.000980377; colors diff=6.19292e-05
width=1536: logits diff=0.000984192; colors diff=2.74777e-05
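The validation script itself is not included here. Assuming each reported "diff" is the maximum absolute elementwise difference between the PyTorch and ONNX Runtime outputs, the comparison could be sketched as follows. `max_abs_diff` and `report` are hypothetical names; their inputs would be NumPy arrays, such as PyTorch tensors converted with `.detach().cpu().numpy()` and the arrays returned by ONNX Runtime.

```python
import numpy as np


def max_abs_diff(reference, candidate) -> float:
    """Largest elementwise |reference - candidate|, computed in float64."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    return float(np.max(np.abs(ref - cand)))


def report(width: int, torch_outputs, onnx_outputs) -> str:
    """Format one line per width; outputs are (char_logits, color_values) pairs."""
    logits_diff = max_abs_diff(torch_outputs[0], onnx_outputs[0])
    colors_diff = max_abs_diff(torch_outputs[1], onnx_outputs[1])
    # :g keeps six significant digits, matching the style of the figures above
    return f"width={width}: logits diff={logits_diff:g}; colors diff={colors_diff:g}"
```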
Export
The model was exported with:
uv run --extra export python scripts/export.py \
--checkpoint origin_model/ocr-ctc.ckpt \
--alphabet origin_model/alphabet-all-v5.txt \
--output dist/mit48pxctc_ocr.onnx
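Once exported, the model can be run with ONNX Runtime. The sketch below wraps `InferenceSession.run(output_names, input_feed)` using the input and output names from the model contract. `run_ocr` is a hypothetical helper; it accepts any object with that `run` method, so it can be exercised without the model file.

```python
import numpy as np

INPUT_NAME = "image"                              # input name from the model contract
OUTPUT_NAMES = ["char_logits", "color_values"]    # output names from the model contract


def run_ocr(session, batch: np.ndarray):
    """Run one [batch, 3, 48, width] float32 batch through an ONNX Runtime session.

    `session` is expected to be an onnxruntime.InferenceSession, or any object
    exposing the same run(output_names, input_feed) method.
    """
    if batch.dtype != np.float32 or batch.ndim != 4 or batch.shape[1:3] != (3, 48):
        raise ValueError(f"expected float32 [batch, 3, 48, width], got {batch.dtype} {batch.shape}")
    char_logits, color_values = session.run(OUTPUT_NAMES, {INPUT_NAME: batch})
    return char_logits, color_values  # raw logits and unclamped colors
```

With the real model, `session = onnxruntime.InferenceSession("mit48pxctc_ocr.onnx", providers=["CPUExecutionProvider"])` creates a CPU session like the one used for validation, and the arrays returned by `run_ocr(session, batch)` can then be passed to a CTC decoder.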
License
This ONNX conversion and the accompanying files are distributed under
GPL-3.0-only. See LICENSE.
The upstream project is GPL-3.0 licensed. Upstream authorship and copyright
remain with the original authors and contributors of manga-image-translator and
the model authors. See NOTICE for source attribution and redistribution
authorization details.