---
license: mit
language:
- en
- zh
tags:
- ocr
- captcha
- crnn
- ctc
- onnx
library_name: onnx
pipeline_tag: image-to-text
---

# cpdaily-ocr

A lightweight CRNN+CTC OCR model for recognizing 5-character alphanumeric captchas with randomly colored, italic, rotated glyphs on a clean white background and no interference lines. It was trained from scratch on real captcha samples and exported to pure ONNX, so inference has no Python dependency and runs in `tract`, `onnxruntime`, and similar engines.

## Files

| File | Size | Description |
|---|---|---|
| `cpdaily_captcha_ocr.onnx` | 2.24 MB | fp32 full-precision master copy |
| `cpdaily_captcha_ocr_fp16.onnx` | 1.07 MB | fp16 storage, fp32 compute (no accuracy loss); recommended for deployment |
| `charset.json` | - | Character table; index 0 is the CTC blank |
| `config.json` | - | Input size, preprocessing, and decoding info |

> The fp16 file stores weights as fp16 followed by `Cast(fp16→fp32)` nodes. Inference engines
> constant-fold these casts during graph optimization, so **computation stays in fp32 (no accuracy
> loss)** while the file is half the size; the only approximation is rounding the stored weights to
> fp16 precision. This format also works around engines that lack fp16 or quantized implementations
> of some ops (e.g. GRU and Conv). Standard fp16 conversion and int8 quantization were both tested and
> failed to load in `tract`; storing fp16 weights behind `Cast` nodes is the compatible way to
> compress the model.

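
To make the storage trick concrete, here is a minimal stdlib-only sketch of what happens numerically: fp32 weights are packed as IEEE 754 half-precision values (2 bytes each) and widened back to fp32, which is what a constant-folded `Cast(fp16→fp32)` produces. It is an illustration under simple assumptions (random weights in `[-1, 1]`, hypothetical helper names such as `store_fp16`), not the script used to export these files.

```python
import random
import struct

def to_fp32(values):
    """Round Python floats (float64) to float32, like weights in the fp32 master model."""
    n = len(values)
    return list(struct.unpack(f"<{n}f", struct.pack(f"<{n}f", *values)))

def store_fp16(weights):
    """Hypothetical helper: pack weights as IEEE 754 half precision ('e'), 2 bytes each."""
    return struct.pack(f"<{len(weights)}e", *weights)

def cast_fp16_to_fp32(blob):
    """Unpack fp16 bytes and widen to fp32; every fp16 value is exactly representable in fp32."""
    halves = struct.unpack(f"<{len(blob) // 2}e", blob)
    return to_fp32(list(halves))

random.seed(0)
# Hypothetical weights, kept in [-1, 1] so the error bound below applies.
weights = to_fp32([random.uniform(-1.0, 1.0) for _ in range(10_000)])

fp32_blob = struct.pack(f"<{len(weights)}f", *weights)
fp16_blob = store_fp16(weights)
restored = cast_fp16_to_fp32(fp16_blob)

max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(len(fp32_blob), len(fp16_blob))  # 40000 20000
print(f"max abs rounding error: {max_err:.2e}")
```

The bound comes from fp16's 10-bit mantissa: for `|w| < 1` the spacing between adjacent fp16 values is at most 2^-11, so rounding moves a weight by at most 2^-12. All arithmetic after the cast then runs on fp32 values.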

## Specs

| Property | Value |
|---|---|
| Architecture | Depthwise-separable CNN + 2-layer BiGRU + FC, CTC decoding |
| Charset | 62 classes: `A-Z` + `a-z` + `0-9`, plus 1 CTC blank = 63 outputs |
| Input | Grayscale, resized to 32 × 160 (height × width), scaled to `[0, 1]` |
| Output | `[T, 63]` log-softmax scores per image (T time steps), CTC greedy decoding |
| Accuracy | 99.37% full-string (99.7% character-level) on a hand-verified validation set |
| Size | fp32 2.24 MB / fp16 1.07 MB (fp16-stored weights, fp32 compute) |

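
The Charset and Output rows can be checked without the model. This stdlib-only sketch rebuilds a 63-entry table in the order the Charset row lists (blank at index 0, then `A-Z`, `a-z`, `0-9`) and applies CTC greedy decoding to a hand-made per-frame argmax path. The exact ordering is an assumption (`charset.json` is authoritative), and `build_charset` and `ctc_greedy_decode` are hypothetical helper names.

```python
import string

def build_charset():
    """Index 0 is the CTC blank; the remaining order is assumed from the Specs table."""
    return ["<blank>"] + list(string.ascii_uppercase + string.ascii_lowercase + string.digits)

def ctc_greedy_decode(path, charset, blank=0):
    """Best-path CTC decoding: collapse consecutive repeats, then drop blanks."""
    out, prev = [], None
    for idx in path:
        if idx != prev and idx != blank:
            out.append(charset[idx])
        prev = idx
    return "".join(out)

charset = build_charset()
print(len(charset))  # 63

# Hypothetical per-frame argmax indices: a class repeats while a glyph spans several frames.
lower_a, upper_b = charset.index("a"), charset.index("B")
seven, lower_x = charset.index("7"), charset.index("x")
path = [0, lower_a, lower_a, 0, lower_a, upper_b, upper_b, 0, seven, lower_x, 0]
print(ctc_greedy_decode(path, charset))  # aaB7x
print(ctc_greedy_decode([lower_a, lower_a, lower_a], charset))  # a
```

Because repeats are merged before blanks are dropped, the model has to emit a blank between two identical adjacent characters: decoding `aa` needs at least three frames (`a`, blank, `a`), while three consecutive `a` frames collapse to a single `a`.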

## Usage (onnxruntime)

```python
import json

import numpy as np
import onnxruntime as ort
from PIL import Image

with open("charset.json", encoding="utf-8") as f:
    chars = json.load(f)  # ["<blank>", "A", "B", ...]; index 0 is the CTC blank

sess = ort.InferenceSession("cpdaily_captcha_ocr_fp16.onnx",
                            providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0].name

def recognize(path):
    # Grayscale, resize to width 160 x height 32, scale to [0, 1], add batch and channel dims.
    img = Image.open(path).convert("L").resize((160, 32), Image.BILINEAR)
    x = (np.asarray(img, dtype=np.float32) / 255.0)[None, None, :, :]  # [1, 1, 32, 160]
    logits = sess.run(None, {inp: x})[0][0]  # first output, first batch item -> [T, 63]
    idx = logits.argmax(-1)  # best class per time step
    out, prev = [], -1
    for p in idx:  # CTC greedy: collapse repeats, then drop blanks
        if p != prev and p != 0:
            out.append(chars[p])
        prev = p
    return "".join(out)

print(recognize("captcha.png"))
```
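
The Accuracy row reports two numbers, but the card does not say exactly how they were computed. The sketch below is one plausible reading, stated as an assumption: full-string accuracy is the fraction of exact matches, and character-level accuracy is one minus the character error rate (total Levenshtein edits divided by total label characters). `evaluate` and `levenshtein` are hypothetical helpers, and the prediction/label pairs are made up; in practice the predictions would come from `recognize()` run over a labeled set.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def evaluate(pairs):
    """pairs: list of (prediction, label). Returns (full_string_acc, char_acc)."""
    exact = sum(pred == label for pred, label in pairs)
    edits = sum(levenshtein(pred, label) for pred, label in pairs)
    total_chars = sum(len(label) for _, label in pairs)
    return exact / len(pairs), 1 - edits / total_chars

# Hypothetical results: one wrong-case substitution, one dropped character.
pairs = [("aB3xY", "aB3xY"), ("Qw7eR", "Qw7eR"), ("k9Lm2", "k9Lm2"),
         ("ab8cd", "aB8cd"), ("Zx4t", "Zx4tt")]
full, char = evaluate(pairs)
print(f"full-string {full:.2%}, char-level {char:.2%}")  # full-string 60.00%, char-level 92.00%
```

Edit distance handles length mismatches (a dropped or duplicated glyph, which CTC decoding can produce), where a position-by-position comparison would count every character after the gap as wrong.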

## Usage (Rust / tract)

```rust
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    let model = tract_onnx::onnx()
        .model_for_path("cpdaily_captcha_ocr_fp16.onnx")?
        .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 1, 32, 160)))?
        .into_optimized()?
        .into_runnable()?;
    // Preprocess to a [1, 1, 32, 160] f32 tensor in [0, 1], run the model,
    // then CTC-greedy decode the [T, 63] output (collapse repeats, drop index 0).
    Ok(())
}
```

## License

MIT. Trained from scratch on self-collected data.
|