Per-strip ink model for offline-translator: given a dewarped text-detection box (RGB strip, 48px tall), it predicts two aligned channels at full resolution:
- ink coverage of the glyphs
- stroke-width estimate
InkUNet, base=16, levels=4, ~1.94M params. Trained purely on synthetic strips (degraded
renders over procedural backgrounds) across a broad multi-weight font set.
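The model takes strips as float tensors of shape (N, 3, 48, W) with RGB values in 0..1. A minimal sketch of getting one detection crop into that layout follows. It assumes the crop arrives as an H x W x 3 uint8 tensor, and that a crop which is not already 48 px tall may be resized bilinearly with its aspect ratio kept. `strip_to_tensor` is a hypothetical helper name, not something shipped in model.py, and the resize policy is an assumption rather than the pipeline's documented behaviour.

```python
import torch
import torch.nn.functional as F

STRIP_HEIGHT = 48  # strip height stated in this README


def strip_to_tensor(strip_u8: torch.Tensor) -> torch.Tensor:
    """Convert one H x W x 3 uint8 RGB strip to a (1, 3, 48, W') float tensor in 0..1.

    Strips that are not already 48 px tall are resized with bilinear
    interpolation, keeping the aspect ratio (assumed policy).
    """
    # HWC uint8 -> NCHW float in 0..1
    x = strip_u8.permute(2, 0, 1).unsqueeze(0).float() / 255.0
    h, w = x.shape[-2:]
    if h != STRIP_HEIGHT:
        new_w = max(1, round(w * STRIP_HEIGHT / h))
        x = F.interpolate(x, size=(STRIP_HEIGHT, new_w),
                          mode="bilinear", align_corners=False)
    return x.clamp(0.0, 1.0)
```

Because the width is left free, strips of different lengths would need to be run one at a time or padded to a common width before batching.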
Load
import torch
from model import InkUNet

st = torch.load("ink_b16_l4_8k.pt", map_location="cpu")
m = InkUNet(base=st["base"], levels=st["levels"],
            bold_from=st["bold_from"], bold_head=st["bold_head"])
m.load_state_dict(st["model"])
m.eval()

# x: (N, 3, 48, W) RGB in 0..1. out: (N, 2, 48, W) logits.
with torch.no_grad():  # inference only, no autograd graph needed
    out = torch.sigmoid(m(x))
matte, bold = out[:, 0], out[:, 1]  # both 0..1, full resolution
ink = matte > 0.5
# Pool over the line's ink, then threshold. This treats x as a single line (N == 1);
# with no ink pixels the mean is NaN and the comparison is False.
line_is_bold = bold[ink].mean() > 0.55
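The pooling line above averages over every ink pixel in `out`, so with N > 1 it mixes all strips into one decision, and a strip with no ink above threshold produces a NaN mean. A minimal sketch of a per-strip variant, assuming the same 0.5 and 0.55 thresholds; `bold_per_strip` is a hypothetical helper name, and reporting ink-free strips as not bold is an assumption:

```python
import torch


def bold_per_strip(matte: torch.Tensor, bold: torch.Tensor,
                   ink_thr: float = 0.5, bold_thr: float = 0.55) -> torch.Tensor:
    """Per-strip bold decision from (N, H, W) matte/bold maps in 0..1.

    Returns a bool tensor of shape (N,). Strips with no ink are reported
    as not bold (an assumption; callers may prefer to skip them).
    """
    ink = (matte > ink_thr).to(bold.dtype)        # (N, H, W) 0/1 ink mask
    ink_px = ink.sum(dim=(1, 2))                  # ink pixel count per strip
    bold_sum = (bold * ink).sum(dim=(1, 2))       # bold score summed over ink only
    mean_bold = bold_sum / ink_px.clamp(min=1.0)  # avoid 0/0 on ink-free strips
    return (mean_bold > bold_thr) & (ink_px > 0)
```

With `matte, bold = out[:, 0], out[:, 1]` from the snippet above, `bold_per_strip(matte, bold)` yields one flag per strip in the batch.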
Files
- ink_b16_l4_8k.pt: fp32 checkpoint (load with model.py).
- ink_int8.mnn: deployable int8 (weight-quantised) MNN, dynamic width, height 48.
- model.py: architecture.
- convert_ink_mnn.py: checkpoint → ONNX → MNN (--int8 / --fp16).
Training code
scripts/ink_model/ in translator-rs.