Try openai/privacy-filter live on your own PDFs/DOCX (Maskify — privacy-first)

#20
by Lisarde - opened

Hi! We've built a small open web app called Maskify (https://maskify.es) that uses openai/privacy-filter as its main detector — sharing here in case anyone wants to try the model live on real documents without spinning it up locally.

The pipeline layers your model with extra checks:

  • openai/privacy-filter for the semantic NER pass.
  • A validated-regex layer for patterns the model wasn't trained on (IBANs with mod-97 checksums, Luhn-checked cards, Spanish DNI/NIE, common API-key prefixes).
  • Optional per-user saved patterns.
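
For anyone curious what "validated regex" means in practice, here is a minimal Python sketch of the idea: a loose regex proposes candidates and a checksum decides whether to keep them. This is not Maskify's actual code; the regexes and the names `luhn_ok`, `iban_ok` and `find_validated` are illustrative assumptions.

```python
import re

# Loose candidate patterns (assumed for illustration); the checksums do the real filtering.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b")


def luhn_ok(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_ok(iban: str) -> bool:
    """Return True if `iban` passes the ISO 13616 mod-97 check."""
    s = iban.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", s):
        return False
    # Move country code and check digits to the end, map A..Z to 10..35.
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1


def find_validated(text: str):
    """Return (label, match) pairs for candidates that pass their checksum."""
    hits = []
    for m in IBAN_RE.finditer(text):
        if iban_ok(m.group()):
            hits.append(("IBAN", m.group()))
    for m in CARD_RE.finditer(text):
        if luhn_ok(m.group()):
            hits.append(("CARD", m.group()))
    return hits
```

The checksum step is what keeps precision high: a random 16-digit number or an IBAN-shaped string with a typo is matched by the regex but dropped by the validator.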

It accepts PDF, DOCX, TXT and Markdown. PDF export is rasterised — the original text bytes are physically removed, not just visually covered. DOCX is redacted in-place, preserving styles, tables and images.

Privacy is the point:

  • The original PDF/DOCX never leaves the browser: it stays in IndexedDB while you edit, and only the extracted text is sent to the server for detection.
  • The reversal map (used to un-mask later) is generated locally and held only by the user.
  • No third-party services in the detection path.
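
To make the reversal-map idea concrete, here is a rough Python sketch of how masking with a reversible map can work. It is only an illustration under assumptions: the placeholder format `[LABEL_n]`, the `(start, end, label)` span shape and the function names `mask` / `unmask` are hypothetical, and spans are assumed not to overlap.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), assumed non-overlapping


def mask(text: str, spans: List[Span]) -> Tuple[str, Dict[str, str]]:
    """Replace each span with a numbered placeholder and record the original."""
    reversal: Dict[str, str] = {}
    counters: Dict[str, int] = {}
    pieces: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):
        counters[label] = counters.get(label, 0) + 1
        token = f"[{label}_{counters[label]}]"
        reversal[token] = text[start:end]   # the map never has to leave the client
        pieces.append(text[cursor:start])
        pieces.append(token)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), reversal


def unmask(masked: str, reversal: Dict[str, str]) -> str:
    """Restore the original values using the reversal map."""
    for token, original in reversal.items():
        masked = masked.replace(token, original)
    return masked
```

Whoever holds the map can restore the text; anyone who only sees the masked output cannot.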

Free to try, no signup needed for the basic flow. Feedback very welcome (https://maskify.es/es/contact) — especially on cases where the model misses or over-detects, since that's where the regex / user-pattern layer earns its keep.

One more note on usage: it's completely free, with no paywall and no per-account rate limit, but it runs on a small VPS: a single Ampere A1 with 4 cores and ~12 GB free for the model. Detection is CPU-bound and can take anywhere from a couple of seconds to roughly 30-100 seconds depending on document size (capped at 10k characters for free users and 100k characters for free registered users), so requests are processed one at a time.

If someone else is detecting when you submit, you'll be queued. The UI surfaces this clearly: the upload screen tells you how many requests are ahead of you, and the editor shows live status so you're never left wondering whether something's stuck. You can also opt into regex-only mode at upload time if you'd rather skip the model and get instant results (the high-precision validators still catch IBANs, cards, DNIs, emails, etc.).
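
The one-at-a-time behaviour described above boils down to a FIFO queue with a single worker. A minimal Python sketch, assuming a hypothetical `DetectionQueue` class (this is not the server's real code and it ignores persistence and concurrency):

```python
from collections import deque
from typing import Deque, Optional


class DetectionQueue:
    """Single-worker FIFO queue that can report how many requests are ahead."""

    def __init__(self) -> None:
        self._waiting: Deque[str] = deque()
        self._running: Optional[str] = None

    def submit(self, request_id: str) -> int:
        """Enqueue a request and return how many requests are ahead of it."""
        ahead = len(self._waiting) + (1 if self._running is not None else 0)
        self._waiting.append(request_id)
        return ahead

    def start_next(self) -> Optional[str]:
        """Start the oldest waiting request if the worker is idle."""
        if self._running is None and self._waiting:
            self._running = self._waiting.popleft()
        return self._running

    def finish(self) -> None:
        """Mark the running request as done, freeing the worker."""
        self._running = None

    def ahead_of(self, request_id: str) -> int:
        """Live position: 0 means running now or next in line on an idle worker."""
        if request_id == self._running:
            return 0
        index = list(self._waiting).index(request_id)  # ValueError if unknown
        return index + (1 if self._running is not None else 0)
```

The value returned by `submit` is what an upload screen could show as "requests ahead of you", and `ahead_of` is the kind of number an editor could poll for live status.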

So please be patient if it feels slow at peak times — it's a hobby deployment, not a paid service. Thanks for trying it!
