screenpipe-pii-redactor
A screenpipe project.
screenpipe's own PII redactor, purpose-built for the three surfaces an AI agent actually sees a user's machine through:
- Accessibility-tree dumps: the structured AX hierarchy that macOS and Windows expose to assistive tech. Short, structured, and full of labels like AXButton[Send to marcus@helios-ai.io].
- OCR'd screen text: what tools like screenpipe extract from screen recordings, including window-title-shaped artifacts, app chrome, and the occasional long-form email or doc.
- Computer-use traces: what an agentic model (Claude Computer Use, GPT operators, etc.) reads when it drives a desktop.
These surfaces are short, sparse-context, and full of identifiers that slip past redactors trained on chat-style prose. This is a compact, multilingual token classifier trained in-house specifically for them. It is not OpenAI's Privacy Filter, and it is not a fine-tune of one — it's screenpipe's own model. 278 MB INT8 ONNX, ~9 ms p50 on CPU, runs fully offline.
License: CC BY-NC 4.0 (non-commercial). For commercial use (production redaction, SaaS / API embedding, AI-agent privacy middleware, custom fine-tunes), contact louis@screenpi.pe. See LICENSE.
Headline numbers
On ScreenLeak, our open benchmark for PII redaction on screen telemetry — n=422 hand-labelled desktop-telemetry strings, 13 categories, strict per-string zero-leak (every PII span in the string must be caught):
| Model | Zero-leak | Deployment |
|---|---|---|
| Gemini 3.1 Pro | 91.0% | cloud API |
| GPT-5.5 | 90.7% | cloud API |
| Claude Opus 4.7 | 87.8% | cloud API |
| this model ⭐ | 86.7% | local · 278 MB · ~9 ms CPU · $0/call |
| Google Cloud DLP | 37.7% | cloud API |
| Microsoft Presidio | 35.4% | local OSS |
Within about 4 points of the best frontier API and roughly 50 points above the dedicated PII tools (Google Cloud DLP, Microsoft Presidio), at zero per-call cost and fully offline. Full methodology, confidence intervals, and per-framework (HIPAA / GDPR / PCI DSS / …) breakdowns: github.com/screenpipe/screenleak. Try it in your browser: screenpipe.github.io/screenleak/demo.
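To make the strict per-string metric concrete, here is a minimal sketch of how a zero-leak rate could be computed. It assumes a gold span counts as caught only when every one of its characters is covered by some predicted span, and that strings without PII count as zero-leak. Both are assumptions for illustration; the matching rule ScreenLeak actually uses is defined in its repository and may differ. The names `is_zero_leak` and `zero_leak_rate` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def is_zero_leak(gold: List[Span], predicted: List[Span]) -> bool:
    """True if every character of every gold span is covered by a predicted span."""
    covered = set()
    for start, end in predicted:
        covered.update(range(start, end))
    return all(i in covered for start, end in gold for i in range(start, end))


def zero_leak_rate(examples: List[Tuple[List[Span], List[Span]]]) -> float:
    """Fraction of strings in which no gold character leaks."""
    if not examples:
        return 0.0
    return sum(is_zero_leak(g, p) for g, p in examples) / len(examples)


examples = [
    ([(0, 5)], [(0, 5)]),             # the only span is caught
    ([(0, 5), (10, 15)], [(0, 5)]),   # second span leaks, so the string fails
    ([], []),                         # no PII: counted as zero-leak (assumption)
]
print(zero_leak_rate(examples))
```

A single missed span fails the whole string, which is why this metric reads lower than span-level or token-level scores on the same predictions.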
What it does
Span-level redaction. Given a string, returns the regions it thinks are PII, each classified into one of 12 canonical categories:
private_person, private_email, private_phone, private_address,
private_url, private_company, private_repo, private_handle,
private_channel, private_id, private_date, secret
secret covers passwords, API keys, JWTs, DB connection strings,
PRIVATE-KEY block markers, etc.
Tip: screenpipe redacts each captured fragment independently (one AX node / OCR line / window title at a time) — that's the distribution the model is tuned for. Feeding a giant multi-entity blob in a single call degrades recall; split on natural boundaries first.
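The per-fragment approach from the tip can be sketched as follows. This is an illustration, not screenpipe's own pipeline: it assumes newlines are the natural boundary, and `redact_per_fragment` and `toy_redactor` are hypothetical names. Any callable that returns spans starting with (start, end, label) can be passed in, which includes the `redact` function from the Inference section.

```python
import re
from typing import Callable, List, Sequence, Tuple


def redact_per_fragment(
    blob: str, redact_fn: Callable[[str], Sequence[Sequence]]
) -> List[Tuple[int, int, str]]:
    """Run redact_fn on each non-empty line; return spans in blob coordinates."""
    spans = []
    pos = 0  # offset of the current line within blob
    for line in blob.splitlines(keepends=True):
        fragment = line.rstrip("\r\n")
        if fragment.strip():
            for span in redact_fn(fragment):
                start, end, label = span[0], span[1], span[2]
                spans.append((pos + start, pos + end, label))
        pos += len(line)
    return spans


def toy_redactor(fragment: str):
    """Stand-in for the model: flags anything shaped like an email address."""
    return [(m.start(), m.end(), "private_email") for m in re.finditer(r"\S+@\S+", fragment)]


blob = "AXButton[Send]\nTo: marcus@helios-ai.io\n\nAXStaticText[Inbox]"
for start, end, label in redact_per_fragment(blob, toy_redactor):
    print(label, blob[start:end])
```

Shifting each span by the fragment's offset keeps the results usable against the original blob, so downstream masking does not need to know the text was split.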
Inference
Browser (transformers.js):
import { pipeline } from "@huggingface/transformers";
const pii = await pipeline("token-classification", "screenpipe/pii-redactor", { dtype: "q8" });
const out = await pii("export OPENAI_API_KEY=sk-proj-Ab12Cd34Ef56Gh78");
// out: per-token tags; group consecutive B-/I- tags into spans
Python (transformers):
# pip install transformers torch
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
tok = AutoTokenizer.from_pretrained("screenpipe/pii-redactor")
model = AutoModelForTokenClassification.from_pretrained("screenpipe/pii-redactor").eval()
id2label = model.config.id2label
def redact(text):
    enc = tok(text, return_offsets_mapping=True, return_tensors="pt", truncation=True)
    offsets = enc.pop("offset_mapping")[0].tolist()
    with torch.no_grad():
        pred = model(**enc).logits.argmax(-1)[0].tolist()
    # decode BIO from offsets + argmax (aggregation_strategy is unreliable
    # for this tokenizer, so walk the offsets yourself)
    spans, cur = [], None
    for (s, e), p in zip(offsets, pred):
        if s == e:  # special token
            cur = None
            continue
        lab = id2label[p]
        if lab == "O":
            cur = None
            continue
        base = lab.split("-", 1)[-1]
        if cur and cur["label"] == base and not lab.startswith("B-"):
            cur["end"] = e  # extend the current span
        else:
            cur = {"start": s, "end": e, "label": base}
            spans.append(cur)
    return [(d["start"], d["end"], d["label"], text[d["start"]:d["end"]]) for d in spans]
print(redact("export OPENAI_API_KEY=sk-proj-Ab12Cd34Ef56Gh78"))
# -> [(..., ..., 'secret', 'sk-proj-Ab12Cd34Ef56Gh78')]
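The spans returned above still have to be applied to the text. A minimal sketch of that last step follows, assuming non-overlapping spans shaped like the tuples `redact` returns. The `[LABEL]` placeholder format and the name `apply_redactions` are illustrative choices, not part of the model.

```python
from typing import Iterable, Sequence


def apply_redactions(text: str, spans: Iterable[Sequence]) -> str:
    """Replace each (start, end, label, ...) span with a [LABEL] placeholder."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s[0], reverse=True):
        start, end, label = span[0], span[1], span[2]
        out = out[:start] + f"[{label.upper()}]" + out[end:]
    return out


text = "export OPENAI_API_KEY=sk-proj-Ab12Cd34Ef56Gh78"
start = text.index("sk-proj")
# Hand-written span standing in for the model's output on this string.
spans = [(start, len(text), "secret", text[start:])]
print(apply_redactions(text, spans))
# -> export OPENAI_API_KEY=[SECRET]
```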
Production INT8 ONNX weights are in onnx/ — load with
onnxruntime on any platform (CoreML / DirectML / CUDA / CPU baseline);
the same file ships everywhere.
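One way to pick an execution provider per platform is sketched below. The preference order is an assumption, and `pick_providers` and `load_session` are hypothetical helpers. The model path is left as a parameter because the exact filename under onnx/ is not stated here. `onnxruntime.get_available_providers()` and the `providers` argument of `InferenceSession` are standard onnxruntime API.

```python
from typing import List

# Preference order is an assumption; CPU stays last as the universal fallback.
PREFERRED = [
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "DmlExecutionProvider",  # DirectML
    "CPUExecutionProvider",
]


def pick_providers(available: List[str]) -> List[str]:
    """Return the preferred providers that this onnxruntime build offers."""
    chosen = [p for p in PREFERRED if p in available]
    return chosen or ["CPUExecutionProvider"]


def load_session(model_path: str):
    """Open a session on the file downloaded from onnx/ (path supplied by the caller)."""
    import onnxruntime as ort  # imported lazily so pick_providers works without it

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)


print(pick_providers(["CoreMLExecutionProvider", "CPUExecutionProvider"]))
```

Tokenization and BIO decoding stay the same as in the Python example; only the forward pass moves to the ONNX session.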
Multilingual
Handles 6 languages (en, fr, de, it, es, nl). English is strongest; Dutch is the weakest and is flagged as a known gap. Validate on your locale before deploying.
Limitations
- Sudo / login password prompts can leak. On [sudo] password for alice: hunter2 the model may redact the username but let the password through. Pair with an OS-level keystroke-suppression policy.
- Multi-entity blobs degrade recall. Redact per captured fragment (see the tip above), not one giant concatenated string.
- Synthetic training data only. No real user data was used. Validate on YOUR data before deploying.
- Over-redaction (oversmash). The model errs toward redacting — good for privacy-first deployments; flag it if you need clean text downstream.
- Strict zero-leak metric. Absolute numbers depend on the evaluator's taxonomy and metric; macro-F1 is a more lenient lens.
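For the sudo-prompt limitation, a rule-based pass can complement the model and the OS-level policy. The sketch below is a swapped-in regex technique, not something the model does. It assumes the prompt has exactly the shape shown in the example and masks the rest of that line. `mask_sudo_password` and the `[SECRET]` placeholder are hypothetical.

```python
import re

# Matches "[sudo] password for <user>:" and whatever follows on the same line.
SUDO_PROMPT = re.compile(r"(\[sudo\] password for [^:\n]+:[ \t]*)([^\n]+)")


def mask_sudo_password(text: str) -> str:
    """Replace the text after a sudo password prompt with a placeholder."""
    return SUDO_PROMPT.sub(r"\1[SECRET]", text)


print(mask_sudo_password("[sudo] password for alice: hunter2"))
# -> [sudo] password for alice: [SECRET]
```

This leaves the username untouched; it only covers the case where the password ends up in captured text next to the prompt.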
License
CC BY-NC 4.0: non-commercial use only. See NOTICE for third-party component attributions.
For commercial licensing (production deployment, redistribution, SaaS / API embedding, custom fine-tunes for your domain): louis@screenpi.pe.
Citation
@misc{screenpipe-pii-redactor-2026,
title = {screenpipe-pii-redactor: a PII redactor for accessibility
trees, OCR'd screen text, and computer-use traces},
author = {{screenpipe}},
year = {2026},
url = {https://huggingface.co/screenpipe/pii-redactor}
}