screenpipe-pii-redactor

A screenpipe project.

screenpipe's own PII redactor, purpose-built for the three surfaces an AI agent actually sees a user's machine through:

Accessibility-tree dumps — the structured AX hierarchy macOS / Windows expose to assistive tech. Short, structured, full of labels like AXButton[Send to marcus@helios-ai.io].
OCR'd screen text — what tools like screenpipe extract from screen recordings: window-title-shaped artifacts, app chrome, and the occasional long-form email or doc.
Computer-use traces — what an agentic model (Claude Computer Use, GPT operators, etc.) reads when it drives a desktop.

These surfaces are short, sparse-context, and full of identifiers that slip past redactors trained on chat-style prose. This is a compact, multilingual token classifier trained in-house specifically for them. It is not OpenAI's Privacy Filter or a fine-tune of it; it is screenpipe's own model. 278 MB INT8 ONNX, ~9 ms p50 on CPU, runs fully offline.

License: CC BY-NC 4.0 (non-commercial). For commercial use — production redaction, SaaS / API embedding, AI-agent privacy middleware, custom fine-tunes — contact louis@screenpi.pe. See LICENSE.

Headline numbers

On ScreenLeak, our open benchmark for PII redaction on screen telemetry — n=422 hand-labelled desktop-telemetry strings, 13 categories, strict per-string zero-leak (every PII span in the string must be caught):

Model	Zero-leak	Deployment
Gemini 3.1 Pro	91.0%	cloud API
GPT-5.5	90.7%	cloud API
Claude Opus 4.7	87.8%	cloud API
this model ⭐	86.7%	local · 278 MB · ~9 ms CPU · $0/call
Google Cloud DLP	37.7%	cloud API
Microsoft Presidio	35.4%	local OSS

Within a few points of the frontier APIs, ~50 points above the flagship commercial PII products — at zero per-call cost, fully offline. Full methodology, confidence intervals, and per-framework (HIPAA / GDPR / PCI DSS / …) breakdowns: github.com/screenpipe/screenleak. Try it in your browser: screenpipe.github.io/screenleak/demo.
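
The strict per-string zero-leak score can be sketched as follows. This is a minimal illustration, not the ScreenLeak evaluator: it assumes spans are (start, end) character offsets with an exclusive end, and that a gold span counts as caught only when every one of its characters falls inside some predicted span. The function names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def is_caught(gold: Span, predicted: List[Span]) -> bool:
    """True if every character of `gold` lies inside some predicted span."""
    covered = set()
    for start, end in predicted:
        covered.update(range(start, end))
    return all(i in covered for i in range(gold[0], gold[1]))


def zero_leak_rate(examples: List[Tuple[List[Span], List[Span]]]) -> float:
    """Fraction of strings in which all gold PII spans were caught.

    Each example is (gold_spans, predicted_spans) for one string.
    A string with no gold spans trivially counts as leak-free here.
    """
    if not examples:
        return 0.0
    clean = sum(
        all(is_caught(g, pred) for g in gold) for gold, pred in examples
    )
    return clean / len(examples)


examples = [
    ([(0, 5), (10, 15)], [(0, 5), (9, 16)]),  # both spans caught
    ([(0, 5), (10, 15)], [(0, 5)]),           # second span leaks
    ([(3, 8)], [(3, 6)]),                     # partial coverage counts as a leak
    ([], []),                                 # nothing to leak
]
print(zero_leak_rate(examples))  # 0.5
```

Under this definition a single missed span fails the whole string, which is why the metric is stricter than token-level scores.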

What it does

Span-level redaction. Given a string, returns the regions it thinks are PII, each classified into one of 12 canonical categories:

private_person, private_email, private_phone, private_address,
private_url, private_company, private_repo, private_handle,
private_channel, private_id, private_date, secret

secret covers passwords, API keys, JWTs, DB connection strings, PRIVATE-KEY block markers, etc.

Tip: screenpipe redacts each captured fragment independently (one AX node / OCR line / window title at a time) — that's the distribution the model is tuned for. Feeding a giant multi-entity blob in a single call degrades recall; split on natural boundaries first.
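
The per-fragment approach in the tip can be sketched as below. It assumes newlines are the natural boundary (one OCR line or AX node per line), which may not hold for your capture format. `redact_fn` stands for any function that returns (start, end, label, text) tuples, such as the `redact` helper in the Python example under Inference; `fake_redact` is a hypothetical regex stand-in so the sketch runs without the model.

```python
import re
from typing import Callable, Iterator, List, Tuple

Span = Tuple[int, int, str, str]  # (start, end, label, matched_text)


def iter_fragments(blob: str) -> Iterator[Tuple[int, str]]:
    """Yield (offset, fragment) for each non-blank line of `blob`."""
    for m in re.finditer(r"[^\n]+", blob):
        if m.group().strip():
            yield m.start(), m.group()


def redact_per_fragment(
    blob: str, redact_fn: Callable[[str], List[Span]]
) -> List[Span]:
    """Run `redact_fn` on each fragment and shift spans back to blob offsets."""
    spans: List[Span] = []
    for offset, fragment in iter_fragments(blob):
        for start, end, label, text in redact_fn(fragment):
            spans.append((start + offset, end + offset, label, text))
    return spans


def fake_redact(fragment: str) -> List[Span]:
    """Hypothetical stand-in for the model: tags anything shaped like an email."""
    return [
        (m.start(), m.end(), "private_email", m.group())
        for m in re.finditer(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", fragment)
    ]


blob = "AXButton[Send to marcus@helios-ai.io]\n\nInbox - louis@screenpi.pe"
for start, end, label, text in redact_per_fragment(blob, fake_redact):
    assert blob[start:end] == text  # offsets refer to the original blob
    print(start, end, label, text)
```

Keeping the blob-level offsets means downstream masking can still operate on the original string even though the model only ever saw one fragment at a time.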

Inference

Browser (transformers.js):

import { pipeline } from "@huggingface/transformers";
const pii = await pipeline("token-classification", "screenpipe/pii-redactor", { dtype: "q8" });
const out = await pii("export OPENAI_API_KEY=sk-proj-Ab12Cd34Ef56Gh78");
// out: per-token tags; group consecutive B-/I- tags into spans

Python (transformers):

# pip install transformers torch
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tok   = AutoTokenizer.from_pretrained("screenpipe/pii-redactor")
model = AutoModelForTokenClassification.from_pretrained("screenpipe/pii-redactor").eval()
id2label = model.config.id2label

def redact(text):
    enc = tok(text, return_offsets_mapping=True, return_tensors="pt", truncation=True)
    offsets = enc.pop("offset_mapping")[0].tolist()
    with torch.no_grad():
        pred = model(**enc).logits.argmax(-1)[0].tolist()
    # decode BIO from offsets + argmax (aggregation_strategy is unreliable
    # for this tokenizer — walk the offsets yourself)
    spans, cur = [], None
    for (s, e), p in zip(offsets, pred):
        if s == e:                       # special token
            cur = None; continue
        lab = id2label[p]
        if lab == "O":
            cur = None; continue
        base = lab.split("-", 1)[-1]
        if cur and cur["label"] == base and not lab.startswith("B-"):
            cur["end"] = e
        else:
            cur = {"start": s, "end": e, "label": base}; spans.append(cur)
    return [(d["start"], d["end"], d["label"], text[d["start"]:d["end"]]) for d in spans]

print(redact("export OPENAI_API_KEY=sk-proj-Ab12Cd34Ef56Gh78"))
# -> [(..., ..., 'secret', 'sk-proj-Ab12Cd34Ef56Gh78')]
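
The `redact` helper above returns spans rather than redacted text. One way to turn those spans into a masked string is sketched below; the `[LABEL]` placeholder format and the function name `mask` are illustrative choices, not something this model card prescribes, and the sketch assumes the spans do not overlap.

```python
from typing import List, Tuple

# (start, end, label, matched_text), the tuple shape returned by redact()
Span = Tuple[int, int, str, str]


def mask(text: str, spans: List[Span]) -> str:
    """Replace each span with a [LABEL] placeholder.

    Spans are applied right to left so that earlier offsets stay valid
    while the string changes length.
    """
    out = text
    for start, end, label, _ in sorted(spans, key=lambda s: s[0], reverse=True):
        out = out[:start] + f"[{label.upper()}]" + out[end:]
    return out


text = "export OPENAI_API_KEY=sk-proj-Ab12Cd34Ef56Gh78"
# Span built by hand for illustration; in practice it comes from redact(text).
start = text.index("sk-proj")
spans = [(start, len(text), "secret", text[start:])]
print(mask(text, spans))  # export OPENAI_API_KEY=[SECRET]
```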

Production INT8 ONNX weights are in onnx/ — load with onnxruntime on any platform (CoreML / DirectML / CUDA / CPU baseline); the same file ships everywhere.
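
Loading the ONNX weights with onnxruntime might look like the sketch below. The path `onnx/model_quantized.onnx` and the provider preference order are assumptions, not taken from the repository; check the actual filename in onnx/ and the model's input names before relying on it. The provider names themselves are the standard onnxruntime identifiers.

```python
from typing import List, Sequence

# onnxruntime execution providers; the preference order is an assumption.
PREFERRED = [
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "DmlExecutionProvider",  # DirectML
    "CPUExecutionProvider",  # baseline
]


def pick_providers(available: Sequence[str]) -> List[str]:
    """Keep the preferred providers that this onnxruntime build offers."""
    chosen = [p for p in PREFERRED if p in available]
    return chosen or ["CPUExecutionProvider"]


def make_session(model_path: str = "onnx/model_quantized.onnx"):
    """Create an InferenceSession; `model_path` is a hypothetical filename."""
    import onnxruntime as ort  # pip install onnxruntime

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)


print(pick_providers(["CPUExecutionProvider", "CoreMLExecutionProvider"]))
```

Tokenization and BIO decoding stay the same as in the Python example; only the forward pass moves to the ONNX session.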

Multilingual

Handles 6 languages (en, fr, de, it, es, nl). English is strongest; Dutch is the weakest and is flagged as a known gap. Validate on your locale before deploying.

Limitations

Sudo / login password prompts can leak. For [sudo] password for alice: hunter2, the model may redact the username but leave the password intact. Pair with an OS-level keystroke-suppression policy.
Multi-entity blobs degrade recall — redact per captured fragment (see tip above), not one giant concatenated string.
Synthetic training data only. No real user data was used. Validate on YOUR data before deploying.
Over-redaction (oversmash). The model errs toward redacting, which suits privacy-first deployments but can remove non-PII text; account for this if you need clean text downstream.
Strict zero-leak metric. Absolute numbers depend on the evaluator's taxonomy and metric; macro-F1 is a more lenient lens.
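
For the password-prompt limitation, a pattern-based backstop can run alongside the model. The sketch below is an assumption about what such a backstop could look like, not part of screenpipe: the regex only covers prompts shaped like `[sudo] password for <user>: <value>` or `Password: <value>`, and it does not replace OS-level keystroke suppression.

```python
import re

# Matches "[sudo] password for alice: hunter2" and "Password: hunter2".
# Group 1 is the prompt, group 2 the text typed after it.
PROMPT_RE = re.compile(
    r"((?:\[sudo\]\s+)?password(?:\s+for\s+\S+)?\s*:\s*)(\S+)",
    re.IGNORECASE,
)


def mask_password_prompts(text: str) -> str:
    """Replace whatever follows a password prompt with [SECRET]."""
    return PROMPT_RE.sub(lambda m: m.group(1) + "[SECRET]", text)


print(mask_password_prompts("[sudo] password for alice: hunter2"))
# [sudo] password for alice: [SECRET]
```

The username is deliberately left alone here, since the model is reported to handle it; the backstop targets only the value after the prompt.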

License

CC BY-NC 4.0 — non-commercial use only. See NOTICE for third-party component attributions.

For commercial licensing (production deployment, redistribution, SaaS / API embedding, custom fine-tunes for your domain): louis@screenpi.pe.

Citation

@misc{screenpipe-pii-redactor-2026,
  title  = {screenpipe-pii-redactor: a PII redactor for accessibility
            trees, OCR'd screen text, and computer-use traces},
  author = {{screenpipe}},
  year   = {2026},
  url    = {https://huggingface.co/screenpipe/pii-redactor}
}
