cpdaily-ocr

A lightweight CRNN+CTC OCR model for recognizing 5-character alphanumeric captchas: randomly colored, italic, rotated glyphs on a clean white background with no interference lines. It was trained from scratch on real captcha samples and exported to pure ONNX, so inference needs no Python dependencies and runs on engines such as tract and onnxruntime.

Files

| File | Size | Description |
|---|---|---|
| cpdaily_captcha_ocr.onnx | 2.24 MB | fp32 full-precision master |
| cpdaily_captcha_ocr_fp16.onnx | 1.07 MB | fp16 storage, fp32 compute (lossless, recommended for deployment) |
| charset.json | | Character table; index 0 is the CTC blank |
| config.json | | Input size, preprocessing, and decoding info |
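
As a sanity check before inference, the character table can be validated against the class count given under Specs. The sketch below assumes charset.json is a flat JSON list with the CTC blank at index 0; the helper name `load_charset` and its `expected_size` parameter are hypothetical, not part of this repository.

```python
import json

def load_charset(path, expected_size=63):
    """Load the character table and check its basic shape.

    Assumption: the file is a flat JSON list, blank at index 0,
    followed by the 62 alphanumeric classes.
    """
    with open(path, encoding="utf-8") as f:
        chars = json.load(f)
    if not isinstance(chars, list):
        raise ValueError("expected a JSON list of characters")
    if len(chars) != expected_size:
        raise ValueError(f"expected {expected_size} entries, got {len(chars)}")
    if len(set(chars)) != len(chars):
        raise ValueError("duplicate entries in character table")
    return chars
```

Calling `load_charset("charset.json")` before building the session fails fast if the table does not match the model's 63 output classes.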

The fp16 file stores the weights as fp16 and adds Cast(fp16→fp32) nodes. Inference engines constant-fold these casts at optimization time, so computation stays in fp32 (no accuracy loss) while the file is about half the size. This also sidesteps the limitation that some engines do not support fp16 compute ops (GRU/Conv) or quantized ops. Standard fp16 conversion and int8 quantization were both tested and fail to load in tract, so this fp16-cast format is the compatible compression path.
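
The storage idea can be illustrated without any ONNX tooling. The following is a minimal standard-library sketch, not the export script used for this model: it packs a few made-up weights as IEEE 754 half precision, widens them back to fp32, and checks that the widening step (the counterpart of the Cast node) is exact. The function names and sample values are illustrative.

```python
import struct

def store_as_fp16(weights):
    """Pack floats as IEEE 754 half precision (2 bytes per value)."""
    return struct.pack(f"<{len(weights)}e", *weights)

def load_as_fp32(blob):
    """Unpack fp16 storage and widen every value to fp32 for compute."""
    n = len(blob) // 2
    halves = struct.unpack(f"<{n}e", blob)
    # Round-trip through the 'f' format so each value is a true fp32 number.
    return list(struct.unpack(f"<{n}f", struct.pack(f"<{n}f", *halves)))

weights = [0.1, -1.337, 0.02, 3.14159]           # made-up sample weights
fp32_blob = struct.pack(f"<{len(weights)}f", *weights)
fp16_blob = store_as_fp16(weights)               # what the file would store
restored = load_as_fp32(fp16_blob)               # what the engine computes with

print(len(fp32_blob), len(fp16_blob))            # 16 8
# Narrowing the widened values again reproduces the stored bytes,
# so the fp16 -> fp32 cast itself loses nothing.
print(store_as_fp16(restored) == fp16_blob)      # True
```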

Specs

| | |
|---|---|
| Architecture | Depthwise-separable CNN + 2-layer BiGRU + FC, CTC decode |
| Charset | 62 classes: A-Z + a-z + 0-9 (+1 CTC blank = 63) |
| Input | Grayscale, resized to 32 × 160 (H × W), normalized to [0,1] |
| Output | [T, 63] log-softmax, CTC greedy decode |
| Accuracy | 99.37% full-string (99.7% char-level) on a hand-verified validation set |
| Size | fp32 2.24 MB / fp16 1.07 MB (lossless compression) |
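
For readers who want to reproduce the two accuracy figures on their own data, the metrics could be computed along these lines. This is a sketch under an assumption: the model card does not say how char-level accuracy was measured, so a position-wise comparison is used here, where positions missing from a too-short prediction count as errors and extra predicted characters are ignored. Function names and sample strings are made up.

```python
def full_string_accuracy(preds, labels):
    """Fraction of samples whose prediction matches the label exactly."""
    hits = sum(p == t for p, t in zip(preds, labels))
    return hits / len(labels)

def char_accuracy(preds, labels):
    """Position-wise character accuracy (an assumed definition)."""
    correct = total = 0
    for p, t in zip(preds, labels):
        total += len(t)
        # zip stops at the shorter string; unmatched label positions stay wrong.
        correct += sum(a == b for a, b in zip(p, t))
    return correct / total

labels = ["aB3dE", "Xy9Zq", "12abC", "QwErT"]   # made-up ground truth
preds  = ["aB3dE", "Xy9Zq", "12abc", "QwErT"]   # one wrong character

print(full_string_accuracy(preds, labels))      # 0.75
print(char_accuracy(preds, labels))             # 0.95
```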

Usage (onnxruntime)

import json, numpy as np, onnxruntime as ort
from PIL import Image

with open("charset.json", encoding="utf-8") as f:
    chars = json.load(f)                         # ["<blank>", "A", "B", ...]
sess = ort.InferenceSession("cpdaily_captcha_ocr_fp16.onnx",
                            providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0].name

def recognize(path):
    img = Image.open(path).convert("L").resize((160, 32), Image.BILINEAR)
    x = (np.asarray(img, dtype=np.float32) / 255.0)[None, None, :, :]
    logits = sess.run(None, {inp: x})[0][0]       # [T, 63]
    idx = logits.argmax(-1)
    out, prev = [], -1
    for p in idx:                                  # CTC greedy: dedup + drop blank
        if p != prev and p != 0:
            out.append(chars[p])
        prev = p
    return "".join(out)

print(recognize("captcha.png"))
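
Since every captcha has 5 characters and the output is log-softmax, a caller can cheaply reject doubtful results before submitting them. The sketch below is one possible approach, not part of the published code: it repeats the greedy decode on plain nested lists (for example `logits.tolist()` from `recognize`), tracks the weakest per-timestep probability, and checks the length. The function names and the 0.5 threshold are hypothetical and would need tuning on real data.

```python
import math

def decode_with_confidence(log_probs, chars, blank=0):
    """CTC greedy decode that also returns the weakest per-step probability.

    log_probs: T rows of C log-softmax scores (nested lists).
    """
    out, prev, confidence = [], -1, 1.0
    for row in log_probs:
        best = max(range(len(row)), key=row.__getitem__)
        confidence = min(confidence, math.exp(row[best]))
        if best != prev and best != blank:       # collapse repeats, drop blank
            out.append(chars[best])
        prev = best
    return "".join(out), confidence

def is_plausible(text, confidence, length=5, min_confidence=0.5):
    """Accept only results of the expected length above an assumed threshold."""
    return len(text) == length and confidence >= min_confidence

toy_chars = ["<blank>", "A", "B"]                # toy table; the real one has 63 entries
toy_rows = [[math.log(p) for p in step] for step in [
    [0.10, 0.80, 0.10],   # A
    [0.10, 0.80, 0.10],   # A again, collapsed into the previous one
    [0.90, 0.05, 0.05],   # blank
    [0.20, 0.10, 0.70],   # B
]]
text, conf = decode_with_confidence(toy_rows, toy_chars)
print(text, round(conf, 2))                      # AB 0.7
print(is_plausible(text, conf, length=2))        # True
```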

Usage (Rust / tract)

use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    // Load the model and pin the input to one 1x1x32x160 f32 tensor.
    let model = tract_onnx::onnx()
        .model_for_path("cpdaily_captcha_ocr_fp16.onnx")?
        .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 1, 32, 160)))?
        .into_optimized()?
        .into_runnable()?;
    // Preprocess to [1,1,32,160] f32 in [0,1], run, then CTC-greedy decode the [T,63] output.
    Ok(())
}

License

MIT. Trained from scratch on self-collected data.
