Per-strip ink model for offline-translator. Given a dewarped text-detection box (an RGB strip 48 px tall), it predicts two pixel-aligned channels at the input's full resolution:

channel 0: ink coverage (matte) of the glyphs
channel 1: stroke-width (boldness) estimate
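For intuition about what the second channel measures, here is a classical, hand-crafted stroke-width proxy you could compare against on clean renders. It is not how the model works (the model learns its estimate), and the function names and the run-length heuristic are assumptions for illustration: each ink pixel gets the shorter of the horizontal and vertical ink runs passing through it, and the median over all ink pixels is reported in pixels.

```python
import numpy as np

def _run_lengths(mask: np.ndarray) -> np.ndarray:
    """For each True pixel, the length of the horizontal run of True it belongs to."""
    out = np.zeros(mask.shape, dtype=np.int32)
    for i, row in enumerate(mask):
        # Pad with False on both sides so every run has a start and an end edge.
        padded = np.concatenate(([False], row, [False]))
        edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
        for s, e in zip(edges[0::2], edges[1::2]):
            out[i, s:e] = e - s
    return out

def stroke_width_proxy(ink) -> float:
    """Median over ink pixels of min(horizontal run, vertical run), in pixels."""
    ink = np.asarray(ink, dtype=bool)
    if not ink.any():
        return 0.0
    h = _run_lengths(ink)          # runs along rows
    v = _run_lengths(ink.T).T      # runs along columns
    return float(np.median(np.minimum(h, v)[ink]))
```

Taking the shorter of the two runs keeps a horizontal bar from being measured by its length instead of its thickness.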

InkUNet, base=16, levels=4, ~1.94M parameters. Trained only on synthetic strips (degraded text renders over procedural backgrounds) covering a broad set of multi-weight fonts.
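The card names the architecture but does not describe it. As a rough mental model only, a generic U-Net with the same knobs might look like the sketch below, assuming `base` is the first stage's channel count, that channels double per level, and that `levels=4` means three 2x downsamplings. `TinyUNet` is hypothetical: it is not the `InkUNet` in model.py, it has no equivalent of the `bold_from` / `bold_head` options, and its parameter count does not match ~1.94M.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin: int, cout: int) -> nn.Sequential:
    """Two 3x3 conv + BatchNorm + ReLU layers, spatial size preserved."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Hypothetical 2-output U-Net for 48 px strips; NOT the InkUNet in model.py."""

    def __init__(self, base: int = 16, levels: int = 4, in_ch: int = 3, out_ch: int = 2):
        super().__init__()
        chs = [base * 2 ** i for i in range(levels)]  # e.g. 16, 32, 64, 128
        self.down = nn.ModuleList()
        cin = in_ch
        for c in chs:
            self.down.append(conv_block(cin, c))
            cin = c
        self.up, self.dec = nn.ModuleList(), nn.ModuleList()
        for c in reversed(chs[:-1]):
            self.up.append(nn.ConvTranspose2d(cin, c, 2, stride=2))
            self.dec.append(conv_block(2 * c, c))  # upsampled + skip channels
            cin = c
        self.head = nn.Conv2d(cin, out_ch, 1)
        self.stride = 2 ** (levels - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        # Pad up to a multiple of the total stride so any strip width works;
        # the padding is cropped off again at the end.
        x = F.pad(x, (0, (-w) % self.stride, 0, (-h) % self.stride), mode="replicate")
        skips = []
        for i, block in enumerate(self.down):
            x = block(x)
            if i < len(self.down) - 1:
                skips.append(x)
                x = F.max_pool2d(x, 2)
        for up, dec in zip(self.up, self.dec):
            x = dec(torch.cat([up(x), skips.pop()], dim=1))
        return self.head(x)[..., :h, :w]  # logits, same H and W as the input
```

Padding inside `forward` is one simple way to accept arbitrary widths, which is the behaviour a dynamic-width export needs.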

Load

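```python
import torch
from model import InkUNet

st = torch.load("ink_b16_l4_8k.pt", map_location="cpu")
m = InkUNet(base=st["base"], levels=st["levels"],
            bold_from=st["bold_from"], bold_head=st["bold_head"])
m.load_state_dict(st["model"])
m.eval()

# x: (N, 3, 48, W) RGB in 0..1; m(x) returns (N, 2, 48, W) logits.
x = torch.rand(1, 3, 48, 256)  # placeholder input: substitute your dewarped strips
with torch.no_grad():
    out = torch.sigmoid(m(x))                 # logits -> probabilities
matte, bold = out[:, 0], out[:, 1]            # both (N, 48, W) in 0..1, full resolution
ink = matte > 0.5
# Pool the bold channel over each line's own ink pixels, then threshold.
# A strip with no ink gives NaN, which compares False (not bold).
line_is_bold = [bool(b[k].mean() > 0.55) for b, k in zip(bold, ink)]
```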
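The card fixes the input layout, (N, 3, 48, W) RGB in 0..1, but not how a detection crop gets there or how strips of different widths share a batch. A minimal sketch, assuming an aspect-preserving bilinear resize to height 48 and right padding by edge replication; `strip_to_tensor` and `batch_strips` are hypothetical helpers, and the resize used in scripts/ink_model/ may differ.

```python
import torch
import torch.nn.functional as F

def strip_to_tensor(crop: torch.Tensor, height: int = 48) -> torch.Tensor:
    """uint8 (H, W, 3) RGB crop -> float (3, 48, W') in 0..1, aspect ratio preserved."""
    x = crop.permute(2, 0, 1).float().div(255.0).unsqueeze(0)  # (1, 3, H, W)
    h, w = x.shape[-2:]
    new_w = max(1, round(w * height / h))
    x = F.interpolate(x, size=(height, new_w), mode="bilinear", align_corners=False)
    return x[0].clamp(0.0, 1.0)

def batch_strips(strips):
    """Right-pad (3, 48, W_i) strips to a common width by repeating the last column.

    Returns the (N, 3, 48, max W) batch and the original widths for cropping.
    """
    widths = [s.shape[-1] for s in strips]
    max_w = max(widths)
    padded = [F.pad(s, (0, max_w - s.shape[-1]), mode="replicate") for s in strips]
    return torch.stack(padded), widths
```

After `out = torch.sigmoid(m(batch))`, crop each strip back to its own width with `out[i, :, :, :widths[i]]` before pooling, so padded columns never count as ink.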

Files

ink_b16_l4_8k.pt — fp32 checkpoint (load with model.py).
ink_int8.mnn — deployable MNN model with int8 weight-only quantisation; input height fixed at 48, width dynamic.
model.py — architecture.
convert_ink_mnn.py — checkpoint → ONNX → MNN (--int8 / --fp16).
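The card describes ink_int8.mnn as weight-quantised int8. MNN's exact scheme is not documented here, so the sketch below only illustrates the general idea of weight-only int8 quantisation, assuming symmetric per-output-channel scales: each weight tensor is stored as int8 plus one float scale per output channel and dequantised back to float for compute, while activations stay in floating point. The function names are hypothetical.

```python
import torch

def quantize_per_channel_int8(w: torch.Tensor):
    """Symmetric per-output-channel int8 quantisation of a conv/linear weight."""
    flat = w.reshape(w.shape[0], -1)
    # One scale per output channel: the largest |weight| maps to 127.
    scale = flat.abs().amax(dim=1).clamp(min=1e-12) / 127.0
    q = torch.round(flat / scale[:, None]).clamp(-127, 127).to(torch.int8)
    return q.reshape(w.shape), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover approximate float weights from int8 values and per-channel scales."""
    shape = (-1,) + (1,) * (q.dim() - 1)
    return q.float() * scale.reshape(shape)
```

The reconstruction error per weight is at most half a quantisation step (`scale / 2`), while the stored weights shrink roughly fourfold compared with fp32.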

Training code

scripts/ink_model/ in translator-rs.
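The card says training used only synthetic strips: degraded renders over procedural backgrounds. The real pipeline lives in scripts/ink_model/ and is not reproduced here; the sketch below is a hypothetical minimal version that assumes a glyph matte has already been rendered from a font at height 48. Its point is that alpha compositing makes the ground-truth ink coverage exact, because the target is the same matte used to blend ink over the background. The blur and noise steps are placeholders for whatever degradations the real script applies.

```python
import torch
import torch.nn.functional as F

def procedural_background(h: int, w: int, gen: torch.Generator) -> torch.Tensor:
    """Smooth random (3, h, w) background: coarse noise upsampled, mixed with a tint."""
    low = torch.rand(1, 3, max(1, h // 8), max(1, w // 8), generator=gen)
    bg = F.interpolate(low, size=(h, w), mode="bilinear", align_corners=False)[0]
    tint = torch.rand(3, 1, 1, generator=gen)
    return (0.5 * bg + 0.5 * tint).clamp(0, 1)

def synth_strip(matte: torch.Tensor, gen: torch.Generator):
    """Composite an ink matte (48, W) in 0..1 over a background, then degrade it.

    Returns (rgb strip (3, 48, W), target matte (48, W)).
    """
    h, w = matte.shape
    bg = procedural_background(h, w, gen)
    ink = torch.rand(3, 1, 1, generator=gen) * 0.3   # dark-ish random ink colour
    rgb = bg * (1 - matte) + ink * matte              # alpha compositing
    # Placeholder degradations: 3x3 box blur, then additive Gaussian noise.
    rgb = F.avg_pool2d(rgb.unsqueeze(0), 3, stride=1, padding=1,
                       count_include_pad=False)[0]
    rgb = rgb + 0.03 * torch.randn(rgb.shape, generator=gen)
    return rgb.clamp(0, 1), matte
```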
