Boos4721
/

cpdaily-ocr


	---
	license: mit
	language:
	- en
	- zh
	tags:
	- ocr
	- captcha
	- crnn
	- ctc
	- onnx
	library_name: onnx
	pipeline_tag: image-to-text
	---

	# cpdaily-ocr

	A lightweight CRNN+CTC OCR model for recognizing 5-character alphanumeric captchas
	(randomly colored, italic, rotated glyphs on a clean white background with no
	interference lines). Trained from scratch on real captcha samples and exported to plain
	ONNX, so inference needs no Python dependencies and works with `tract`, `onnxruntime`, etc.

	一个轻量级 CRNN+CTC 验证码识别模型, 识别 5 位字母数字验证码。从真实样本自训, 导出为
	纯 ONNX, 推理无任何 Python 依赖。

	## Files / 文件

	| File | Size | Description |
	|---|---|---|
	| `cpdaily_captcha_ocr.onnx` | 2.24 MB | fp32 full-precision master / fp32 全精度母本 |
	| `cpdaily_captcha_ocr_fp16.onnx` | 1.07 MB | fp16-stored, fp32-compute (lossless, recommended) / fp16 存储 fp32 计算, 无损, 推荐部署 |
	| `charset.json` | - | Character table, index 0 = CTC blank / 字符表, index 0 为 CTC blank |
	| `config.json` | - | Input size, preprocessing, decode info / 输入尺寸、预处理、解码信息 |

	> The fp16 file stores weights as fp16 followed by `Cast(fp16→fp32)` nodes. Inference
	> engines constant-fold these casts at optimization time, so computation stays in fp32
	> (no accuracy loss) while the file is about half the size. This sidesteps engines that
	> lack fp16 kernels for compute ops such as GRU and Conv. Standard fp16 conversion and
	> int8 quantization were both tested and fail to load in `tract`, so this fp16-cast
	> format is the compatible compression path.
	>
	> fp16 版以 fp16 存权重 + `Cast` 节点, 推理时常量折叠回 fp32 计算(精度无损), 体积砍半,
	> 且规避了部分引擎不支持 fp16/量化算子的限制。
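
The storage trick described in the note can be illustrated without any ONNX tooling. The following is only a minimal sketch using Python's standard `struct` module, not the export script used for this model; the weight values and helper names are made up for illustration. It shows that half-precision storage takes half the bytes and that the up-cast from fp16 to fp32 is exact. The one-time rounding of the original weights to fp16 does shift them slightly; the card reports that this does not affect accuracy.

```python
import struct

def to_fp16_bytes(weights):
    """Serialize floats as IEEE 754 half precision (2 bytes each)."""
    return struct.pack(f"<{len(weights)}e", *weights)

def to_fp32_bytes(weights):
    """Serialize floats as IEEE 754 single precision (4 bytes each)."""
    return struct.pack(f"<{len(weights)}f", *weights)

def cast_fp16_to_fp32(blob):
    """Mimic a Cast(fp16 -> fp32) step: decode halves, round-trip through fp32."""
    n = len(blob) // 2
    halves = struct.unpack(f"<{n}e", blob)
    return struct.unpack(f"<{n}f", struct.pack(f"<{n}f", *halves))

# Hypothetical weights, not taken from the real model.
weights = [0.1, -0.25, 1.5, 3.14159, -0.001]

fp16_blob = to_fp16_bytes(weights)       # what the fp16 file would store
fp32_blob = to_fp32_bytes(weights)       # what the fp32 master would store
restored = cast_fp16_to_fp32(fp16_blob)  # what the engine computes with

print(len(fp32_blob), len(fp16_blob))
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Because every fp16 value is exactly representable in fp32, the cast itself adds no error; the only deviation comes from the initial rounding at export time.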

	## Specs / 规格

	| Property | Value |
	|---|---|
	| Architecture | Depthwise-separable CNN + 2-layer BiGRU + FC, CTC decode |
	| Charset | 62 classes: `A-Z` + `a-z` + `0-9` (+1 CTC blank = 63) |
	| Input | grayscale, resized to `32 × 160` (H × W), normalized to `[0,1]` |
	| Output | `[T, 63]` log-softmax, CTC greedy decode |
	| Accuracy | 99.37% full-string (99.7% char-level) on a hand-verified validation set |
	| Size | fp32 2.24 MB / fp16 1.07 MB (lossless compression) |
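
To make the Charset and Output rows concrete, here is a small worked sketch of CTC greedy decoding over per-timestep class indices. The character ordering below (blank, then `A-Z`, `a-z`, `0-9`) is an assumption based on the table; the authoritative ordering is whatever `charset.json` contains, and the function name is illustrative.

```python
import string

BLANK = 0
# Assumed ordering; load charset.json for the real table.
CHARSET = ["<blank>"] + list(
    string.ascii_uppercase + string.ascii_lowercase + string.digits
)

def ctc_greedy_decode(indices, charset=CHARSET, blank=BLANK):
    """Collapse consecutive repeats, then drop blanks."""
    out, prev = [], None
    for i in indices:
        if i != prev and i != blank:
            out.append(charset[i])
        prev = i
    return "".join(out)

# Index 1 is "A" and index 2 is "B" under the assumed ordering.
# The blank between the two runs of 1 is what lets "AA" survive the collapse.
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0]))
```

Without the blank at position 3, the three `1`s would merge into a single `A`, which is why CTC needs the extra 63rd class.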

	## Usage (onnxruntime) / 用法

	```python
	import json

	import numpy as np
	import onnxruntime as ort
	from PIL import Image

	with open("charset.json", encoding="utf-8") as f:
	    chars = json.load(f)  # ["<blank>", "A", "B", ...]

	sess = ort.InferenceSession("cpdaily_captcha_ocr_fp16.onnx",
	                            providers=["CPUExecutionProvider"])
	inp = sess.get_inputs()[0].name

	def recognize(path):
	    # Grayscale, resize to W=160, H=32 (PIL takes (width, height)), scale to [0, 1].
	    img = Image.open(path).convert("L").resize((160, 32), Image.BILINEAR)
	    x = (np.asarray(img, dtype=np.float32) / 255.0)[None, None, :, :]  # [1, 1, 32, 160]
	    logits = sess.run(None, {inp: x})[0][0]  # [T, 63]
	    idx = logits.argmax(-1)
	    out, prev = [], -1
	    for p in idx:  # CTC greedy: dedup + drop blank
	        if p != prev and p != 0:
	            out.append(chars[p])
	        prev = p
	    return "".join(out)

	print(recognize("captcha.png"))
	```
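
Since the model targets 5-character alphanumeric captchas, a caller may want a cheap sanity check on the decoded string before submitting it. The helper below is a hypothetical addition, not part of the published files; it only encodes the length and alphabet stated in this card.

```python
import string

ALPHABET = set(string.ascii_letters + string.digits)  # the 62 classes from Specs
EXPECTED_LEN = 5  # captcha length stated in this card

def is_plausible(text: str) -> bool:
    """Return True if the decoded text has 5 characters, all in the 62-class alphabet."""
    return len(text) == EXPECTED_LEN and all(c in ALPHABET for c in text)

print(is_plausible("aB3xZ"), is_plausible("aB3x"))
```

One possible use is to request a fresh captcha whenever `is_plausible(recognize(path))` is false, since a wrong-length decode cannot be correct.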

	## Usage (Rust / tract) / 用法

	```rust
	use tract_onnx::prelude::*;

	fn main() -> TractResult<()> {
	    // Load the ONNX file, pin the input to [1, 1, 32, 160] f32, then optimize.
	    let model = tract_onnx::onnx()
	        .model_for_path("cpdaily_captcha_ocr_fp16.onnx")?
	        .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 1, 32, 160)))?
	        .into_optimized()?
	        .into_runnable()?;
	    // Preprocess to [1,1,32,160] f32 in [0,1], call `model.run(...)`,
	    // then CTC-greedy decode the [T,63] output.
	    Ok(())
	}
	```

	## License

	MIT. Trained from scratch on self-collected data.