screenpipe-pii-image-redactor
A screenpipe project. The image-modality companion to
screenpipe/pii-redactor.
A fine-tuned image PII detector for the three surfaces through which an AI agent sees a user's machine:
- Screen captures — JPGs / PNGs of the user's screen, rendered text and structured chrome (Slack, Outlook, Cursor, Terminal, Confluence, GitHub, 1Password, calendars, browsers).
- Computer-use traces — the visual frames an agentic model (Claude Computer Use, GPT operator, etc.) reads when it controls a desktop.
- Accessibility-tree visualizations — when an agent screenshots what it inferred from the AX tree to debug a tool call.
These surfaces are dense, multi-PII, and semi-structured in ways no prose-trained PII detector handles well. The model returns pixel-space bounding boxes for 12 canonical PII categories.
ONNX, ~108 MB. Same .onnx ships across macOS / Windows / Linux —
the user's ONNX Runtime selects the Execution Provider at load time
(CoreML, DirectML, CUDA, or CPU baseline).
License: CC BY-NC 4.0 (non-commercial). For commercial use — production redaction, SaaS / API embedding, AI-agent privacy middleware, custom fine-tunes — contact louis@screenpi.pe. See
LICENSE.
Headline numbers
rfdetr_v8 on a held-out 221-image validation split (190 PII-bearing,
31 hard negatives) of the screenpipe-pii-bench-image
corpus, IoU ≥ 0.30:
| metric | this model | regex+OCR floor | Microsoft Presidio (published OSS) |
|---|---|---|---|
| zero-leak (every gold span caught) | 95.3% | 2.6% | 0.5% |
| oversmash (false-fire on negatives) | 0.0% | 3.2% | 48.4% |
| micro-precision | 99% | 87% | 47% |
| micro-recall | 97% | 26% | 42% |
| macro-F1 | 0.871 | 0.318 | 0.190 |
Per-label recall (a few highlights): private_person 0.99 ·
private_company 1.00 · private_repo 1.00 · private_url 1.00 ·
secret 0.99 · private_email 0.98 · private_phone 0.92 ·
private_address 0.92.
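Zero-leak is scored per image: a frame passes only if every gold span is covered by at least one detection at IoU ≥ 0.30, while oversmash is the false-fire rate on the hard negatives. A minimal sketch of the pass criterion, assuming xyxy pixel boxes (the bench's exact matcher may differ):

def iou(a, b):
    # a, b: (x1, y1, x2, y2) boxes in pixels
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def passes_zero_leak(gold, preds, thr=0.30):
    # every gold span must be caught by some prediction
    return all(any(iou(g, p) >= thr for p in preds) for g in gold)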
Latency (rfdetr_v8, 320×320 input, FP32)
| platform | EP | p50 |
|---|---|---|
| macOS Apple Silicon (M-series) | CoreML | 66 ms (real-screen sample) |
| macOS Apple Silicon (M-series) | CPU | 163 ms |
| Windows + DX12 GPU | DirectML | ~30-60 ms (estimated) |
| Linux + NVIDIA | CUDA | ~10-20 ms (estimated) |
| Linux/Windows CPU-only | CPU | ~140 ms |
Same .onnx everywhere — Execution Provider is selected at load time
by the user's ONNX Runtime build. No CUDA / Vulkan / GPU vendor SDKs
are required on the consumer's machine.
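A minimal sketch of that load-time selection plus a p50 timing loop (the preference order below is illustrative, not shipped code; the identifiers are the standard ONNX Runtime provider names):

import time
import numpy as np
import onnxruntime as ort

preferred = ["CoreMLExecutionProvider", "DmlExecutionProvider",
             "CUDAExecutionProvider", "CPUExecutionProvider"]
providers = [p for p in preferred if p in ort.get_available_providers()]
sess = ort.InferenceSession("rfdetr_v8.onnx", providers=providers)

x = np.zeros((1, 3, 320, 320), dtype=np.float32)
name = sess.get_inputs()[0].name
sess.run(None, {name: x})  # warm-up: EP graph compilation happens on first run
times = []
for _ in range(50):
    t0 = time.perf_counter()
    sess.run(None, {name: x})
    times.append(time.perf_counter() - t0)
print(f"p50 = {sorted(times)[len(times) // 2] * 1000:.0f} ms "
      f"via {sess.get_providers()[0]}")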
Why this exists (vs Presidio Image Redactor and friends)
The published baselines are trained on prose / generic-document imagery. A typical screenpipe frame looks nothing like that:
- A Slack channel sidebar with 8 names, 12 channel mentions, 3 emails, and 1 pasted AWS key — all in 1440×900 px at 14 px font.
- A 1Password vault entry with structured [Username | Password | Server | One-time password] rows, half of which are masked dots.
- A Cursor workspace open on .env.production with five secret-shaped values stacked top-to-bottom.
These images are dense (10-20 PII spans per frame), structured (rows / columns / aligned chrome), and layout-cued (a thing in the "Username" cell is a username regardless of its surface text). A generic NER-on-OCR pipeline misfires by over-redacting UI chrome (48% false-fire on negatives in our bench, vs. 0% for this model).
If you're building an agentic system that reads screen state — a desktop-control agent, a memory layer for browsing, anything that streams screen captures into an LLM — this is the redactor designed for that pipe.
What it does
Per-image object detection. Given a JPG or PNG, returns
[(bbox, label, score)] where each detection is a region the model
thinks is PII, classified into one of the 12 canonical categories
shared with screenpipe/pii-redactor:
private_person, private_email, private_phone, private_address,
private_url, private_company, private_repo, private_handle,
private_channel, private_id, private_date, secret
secret covers passwords, API keys, JWTs, DB connection strings,
PRIVATE-KEY block markers, etc. — same coverage as the text model.
Inference
# pip install onnxruntime pillow numpy
import numpy as np
import onnxruntime as ort
from PIL import Image

CLASSES = [
    "private_person", "private_email", "private_phone",
    "private_address", "private_url", "private_company",
    "private_repo", "private_handle", "private_channel",
    "private_id", "private_date", "secret",
]
INPUT_SIZE = 320  # rfdetr_v8 was exported at 320x320
THRESHOLD = 0.30

sess = ort.InferenceSession(
    "rfdetr_v8.onnx",
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)

img = Image.open("screenshot.png").convert("RGB")
W, H = img.size
resized = img.resize((INPUT_SIZE, INPUT_SIZE), Image.BILINEAR)

# ImageNet-style normalization, then NCHW layout
arr = np.asarray(resized, dtype=np.float32) / 255.0
arr = (arr - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
arr = arr.transpose(2, 0, 1)[None].astype(np.float32)

boxes, logits = sess.run(None, {sess.get_inputs()[0].name: arr})
boxes = boxes[0]    # (300, 4)  cx, cy, w, h, normalized to [0, 1]
logits = logits[0]  # (300, 13) — last channel is "no-object"

probs = 1.0 / (1.0 + np.exp(-logits[:, :12]))  # per-class sigmoid
best_class = probs.argmax(axis=1)
best_score = probs[np.arange(len(probs)), best_class]
keep = best_score >= THRESHOLD

detections = []  # (label, score, (x, y, w, h)) in pixel space
for q in np.where(keep)[0]:
    cx, cy, bw, bh = boxes[q]
    x1 = (cx - bw / 2) * W
    y1 = (cy - bh / 2) * H
    detections.append((CLASSES[best_class[q]], float(best_score[q]),
                       (x1, y1, bw * W, bh * H)))
    print(f"  {CLASSES[best_class[q]]:18} score={best_score[q]:.2f} "
          f"bbox=[{int(x1)}, {int(y1)}, {int(bw * W)}, {int(bh * H)}]")
Full example with image overlay → examples/inference.py.
For Rust integration via the ort crate, see the
rust_smoke/
prototype and the production wiring in PR
screenpipe/screenpipe#3188.
Redacting the image (vs. just detecting)
This model detects. To actually remove the PII, draw a solid rectangle over each detected bbox. Solid black, not blur — blur is reversible by super-resolution attacks; opaque rectangles aren't.
from PIL import ImageDraw

draw = ImageDraw.Draw(img)
for label, score, (x, y, w, h) in detections:  # from the snippet above
    draw.rectangle([x, y, x + w, y + h], fill=(0, 0, 0))
img.save("screenshot_redacted.png")
That's the entire redactor wrapper. ~5 lines.
Architecture
- Base: RF-DETR-Nano (Roboflow, ICLR 2026) — a DINOv2-backbone real-time detection transformer, ~25 M params, claimed by its authors to be the first real-time model to break 60 mAP on COCO.
- Fine-tuned at 320×320 input on a 2,833-image synthetic + WebPII union (synthetic via DOM-truth bbox extraction; WebPII via the arXiv:2603.17357 release).
- Output head: 300 detection queries × 13 channels (12 PII classes + no-object). Per-class sigmoid (not softmax — RF-DETR classifies each query independently).
- Trained on a single A100 80 GB; ~100 minutes wall-clock for the best-EMA epoch.
Training data
| source | size | labels | notes |
|---|---|---|---|
| synthetic bench | 2,206 imgs | DOM-truth bboxes (pixel-perfect) | 9 templates rendered via headless Chromium with data-span attributes — labels come from the same DOM tree the browser laid out. |
| WebPII | 500 imgs (balanced sample) | bbox-labeled by the original authors | March 2026 release, e-commerce screenshots. Class-imbalance capped at 2× our synthetic frequency. |
| cascade auto-labels | 100 imgs | OCR + text-PII model alignment | Old screenshots from this project's own bench, weakly labeled. |
No real user data was used during fine-tuning. Membership inference attacks recover no real-user content because no real-user content was in the training set. If you discover a failure mode on your real screens, the project's recipe is to add a new SYNTHETIC template that reproduces it — the screenshot becomes a bug report, never a training row.
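A minimal sketch of what DOM-truth labeling looks like, assuming Playwright drives the headless Chromium render and that templates tag each PII span with a data-span attribute holding its label (the bench's actual harness may differ):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={"width": 1440, "height": 900})
    page.goto("file:///path/to/template.html")
    page.screenshot(path="frame.png")
    rows = []
    for el in page.query_selector_all("[data-span]"):
        box = el.bounding_box()  # pixel rect from the same layout engine that drew the PNG
        if box:
            rows.append((el.get_attribute("data-span"),
                         (box["x"], box["y"], box["width"], box["height"])))
    browser.close()
# rows now holds (label, bbox) pairs that are pixel-perfect by construction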
Limitations
- Hand-curated gold set is small — the bench's data/ has 5 manually built cases. Larger-scale held-out evaluation depends on the synthetic corpus, which is in-distribution by construction. private_handle and private_id recall are 0% in the reference numbers because the val split has only 2 and 1 examples respectively. Don't deploy without a domain-specific eval pass.
- Synthetic-template ceiling. 95.3% zero-leak is the bench's stable ceiling at this corpus size. Gains beyond that come from training on more real-screen failure modes (tracked in the bench's backlog).
- WebPII is e-commerce-heavy. Adding the full WebPII split actually hurt dev-app accuracy in our experiments (rfdetr_v4 at 90.5% zero-leak vs. v8's 95.3%). The 500-image balanced sample is our best-of-both compromise.
- CPU-only floors at ~140 ms p50. INT8 quantization (planned) should bring that under 100 ms, but the FP32 release is what's on this page today.
- English-only. Synthetic templates render Latin-script text; the WebPII supplement is English. CJK / Arabic / Cyrillic not evaluated — don't deploy without a locale-specific eval.
- Adversarial robustness not tested. A user who knows the detector exists could craft layouts that confuse it (handwritten PII, embedded-image PII, partial occlusion). Use this for honest-user privacy, not as a security boundary.
Files
rfdetr_v8.onnx 108 MB · the model · sha256 below
README.md this file
LICENSE CC BY-NC 4.0
NOTICE attribution to base model + datasets
examples/
inference.py the snippet above, runnable
SHA-256 of rfdetr_v8.onnx:
431acc0f0beb22a39572b7a50af4fc446e799840fb71320dc124fbd79a121eb3
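To verify the download before loading it (shasum -a 256 works too; hashlib shown for portability):

import hashlib

with open("rfdetr_v8.onnx", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
assert digest == "431acc0f0beb22a39572b7a50af4fc446e799840fb71320dc124fbd79a121eb3"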
Reproducing inference
git clone https://huggingface.co/screenpipe/pii-image-redactor
cd pii-image-redactor
git lfs pull
pip install onnxruntime pillow numpy
python examples/inference.py path/to/your_screenshot.png
Reproducing the eval scores requires the screenpipe-pii-bench-image benchmark, which is not redistributed (it's the training corpus). Contact louis@screenpi.pe for benchmark access or commercial licensing.
License
CC BY-NC 4.0 — non-commercial use only. The base model
(RF-DETR) is Apache-2.0; obligations are preserved (see
NOTICE).
For commercial licensing (production deployment, redistribution rights, SaaS / API embedding, custom fine-tunes for your domain): louis@screenpi.pe.
Citation
@misc{screenpipe-pii-image-redactor-2026,
  title  = {screenpipe-pii-image-redactor: a screen-PII detector for
            accessibility-aware agents},
  author = {{screenpipe}},
  year   = {2026},
  url    = {https://huggingface.co/screenpipe/pii-image-redactor}
}