Per-strip ink model for offline-translator. Given a dewarped text-detection box (an RGB strip 48 px tall), it predicts two aligned channels at the full resolution of the input strip:

  • ink coverage of the glyphs (channel 0, matte)
  • stroke-width estimate (channel 1, bold)

InkUNet, base=16, levels=4, ~1.94M params. Trained purely on synthetic strips (degraded renders over procedural backgrounds) across a broad multi-weight font set.

Load

import torch
from model import InkUNet

st = torch.load("ink_b16_l4_8k.pt", map_location="cpu")
m = InkUNet(base=st["base"], levels=st["levels"],
            bold_from=st["bold_from"], bold_head=st["bold_head"])
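m.load_state_dict(st["model"])
m.eval()

# x: (N, 3, 48, W) float RGB in 0..1, supplied by the caller.  out: (N, 2, 48, W) logits.
with torch.no_grad():                         # inference only, no autograd graph
    out = torch.sigmoid(m(x))
matte, bold = out[:, 0], out[:, 1]            # both 0..1, full resolution
ink = matte > 0.5
# Pool over the line's ink, then threshold. With N > 1 this pools over the whole
# batch, so index one strip at a time. An empty mask would give NaN, so it counts as not bold.
line_is_bold = bool(ink.any()) and bold[ink].mean().item() > 0.55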
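
The snippet above pools over a single line. For a batch of strips with different widths, the following is a minimal sketch of one way to pad the inputs and make the bold decision per strip. It uses NumPy only. The helper names (pad_batch, bold_per_strip), the uint8 input format, and the zero right-padding are assumptions made for illustration; this card does not say how padded regions were handled during training. The 0.5 and 0.55 thresholds are the ones used above.

```python
import numpy as np

STRIP_H = 48  # strip height stated in this card


def pad_batch(strips):
    """Stack (48, W, 3) uint8 strips into (N, 3, 48, Wmax) float32 in 0..1.

    Narrower strips are right-padded with zeros (an assumption, see lead-in).
    Returns the batch and the original widths.
    """
    for s in strips:
        if s.ndim != 3 or s.shape[0] != STRIP_H or s.shape[2] != 3:
            raise ValueError(f"expected (48, W, 3) strip, got {s.shape}")
    widths = [s.shape[1] for s in strips]
    batch = np.zeros((len(strips), 3, STRIP_H, max(widths)), dtype=np.float32)
    for i, s in enumerate(strips):
        # HWC uint8 -> CHW float in 0..1
        batch[i, :, :, : s.shape[1]] = s.transpose(2, 0, 1).astype(np.float32) / 255.0
    return batch, widths


def bold_per_strip(matte, bold, widths, ink_thr=0.5, bold_thr=0.55):
    """Per-strip bold flag from (N, 48, Wmax) probability maps.

    Only the first widths[i] columns of strip i are used, so padding never
    contributes ink. A strip with no ink pixels is reported as not bold.
    """
    flags = []
    for i, w in enumerate(widths):
        ink = matte[i, :, :w] > ink_thr
        if not ink.any():
            flags.append(False)
        else:
            flags.append(bool(bold[i, :, :w][ink].mean() > bold_thr))
    return flags
```

To use this with the model, wrap the padded batch with torch.from_numpy(batch), run it as shown above, and pass matte.numpy(), bold.numpy() and the returned widths to bold_per_strip.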

Files

  • ink_b16_l4_8k.pt: fp32 checkpoint (load with model.py).
  • ink_int8.mnn: deployable int8 (weight-quantised) MNN, dynamic width, height 48.
  • model.py: architecture.
  • convert_ink_mnn.py: checkpoint → ONNX → MNN (--int8 / --fp16).
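
Since ink_int8.mnn is weight-quantised, its outputs may differ slightly from those of the fp32 checkpoint. The following is a rough sketch of a sanity check, assuming both sets of logits for the same strips have already been obtained as NumPy arrays of shape (N, 2, 48, W) from whichever runtimes are in use. The function name compare_outputs and the choice of metrics are illustrative and not part of this repository.

```python
import numpy as np


def compare_outputs(ref_logits, test_logits, ink_thr=0.5):
    """Compare two (N, 2, 48, W) logit arrays produced for the same input.

    Returns (max_abs_diff, ink_iou):
      max_abs_diff: largest absolute difference between the sigmoid outputs
      ink_iou:      intersection-over-union of the thresholded ink masks
                    (channel 0); 1.0 if both masks are empty
    """
    if ref_logits.shape != test_logits.shape:
        raise ValueError("outputs must have identical shapes")
    # Sigmoid in float64 to keep the comparison itself numerically clean.
    ref = 1.0 / (1.0 + np.exp(-ref_logits.astype(np.float64)))
    test = 1.0 / (1.0 + np.exp(-test_logits.astype(np.float64)))
    max_abs_diff = float(np.abs(ref - test).max())
    ref_ink = ref[:, 0] > ink_thr
    test_ink = test[:, 0] > ink_thr
    union = np.logical_or(ref_ink, test_ink).sum()
    inter = np.logical_and(ref_ink, test_ink).sum()
    ink_iou = 1.0 if union == 0 else float(inter) / float(union)
    return max_abs_diff, ink_iou
```

What counts as an acceptable difference depends on the application; this card does not state a tolerance.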

Training code

scripts/ink_model/ in translator-rs.
